public inbox for gcc-patches@gcc.gnu.org
* [000/nnn] poly_int: representation of runtime offsets and sizes
@ 2017-10-23 16:57 Richard Sandiford
  2017-10-23 16:58 ` [001/nnn] poly_int: add poly-int.h Richard Sandiford
                   ` (107 more replies)
  0 siblings, 108 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 16:57 UTC (permalink / raw)
  To: gcc-patches

This series adds support for offsets and sizes that are a runtime
invariant rather than a compile time constant.  It's based on the
patch posted here:

  https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00406.html

The rest of the covering note is split into:

- Summary   (from the message linked above)
- Tree representation
- RTL representation
- Compile-time impact
- Typical changes
- Testing


Summary
=======

The size of an SVE register in bits can be any multiple of 128 between
128 and 2048 inclusive.  The way we chose to represent this was to have
a runtime indeterminate that counts the number of 128 bit blocks above
the minimum of 128.  If we call the indeterminate X then:

* an SVE register has 128 + 128 * X bits (16 + 16 * X bytes)
* the last int in an SVE vector is at byte offset 12 + 16 * X
* etc.

Although the maximum value of X is 15, we don't want to take advantage
of that, since there's nothing particularly magical about the value.

So we have two types of target: those for which there are no runtime
indeterminates, and those for which there is one runtime indeterminate.
We decided to generalise the interface slightly by allowing any number
of indeterminates, although some parts of the underlying implementation
are still limited to 0 and 1 for now.

The main class for working with these runtime offsets and sizes is
"poly_int".  It represents a value of the form:

  C0 + C1 * X1 + ... + Cn * Xn

where each coefficient Ci is a compile-time constant and where each
indeterminate Xi is a nonnegative runtime value.  The class takes two
template parameters, one giving the number of coefficients and one
giving the type of the coefficients.  There are then typedefs for the
common cases, with the number of coefficients being controlled by
the target.
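
As a purely illustrative model of the single-indeterminate case (the
real class lives in poly-int.h and is considerably more general), a
two-coefficient value such as the SVE register size in bytes can be
pictured like this:

  /* Illustrative sketch only, not the real poly-int.h interface.  */
  struct poly2
  {
    HOST_WIDE_INT coeffs[2];             /* C0 and C1 */

    /* Evaluate C0 + C1 * X1 once X1 is known at run time.  */
    HOST_WIDE_INT eval (HOST_WIDE_INT x1) const
    {
      return coeffs[0] + coeffs[1] * x1;
    }
  };

  poly2 sve_bytes = { { 16, 16 } };      /* 16 + 16 * X */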

poly_int is used for things like:

- the number of elements in a VECTOR_TYPE
- the size and number of units in a general machine_mode
- the offset of something in the stack frame
- SUBREG_BYTE
- MEM_SIZE and MEM_OFFSET
- mem_ref_offset

(only a selective list).

The patch that adds poly_int has detailed documentation, but the main
points are:

* there's no total ordering between poly_ints, so the best we can do
  when comparing them is to ask whether two values *might* or *must*
  be related in a particular way.  E.g. if mode A has size 2 + 2X
  and mode B has size 4, the condition:

    GET_MODE_SIZE (A) <= GET_MODE_SIZE (B)

  is true for X<=1 and false for X>=2.  This translates to:

    may_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == true
    must_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == false

  Of course, the may/must distinction already exists in things like
  alias analysis.  (A short sketch of how these comparisons read in
  practice follows at the end of this list.)

* some poly_int arithmetic operations (notably division) are only possible
  for certain values.  These operations therefore become conditional.

* target-independent code is exposed to these restrictions even if the
  current target has no indeterminates.  But:

  * we've tried to provide enough operations that poly_ints are easy
    to work with.

  * it means that developers working with non-SVE targets don't need
    to test SVE.  If the code compiles on a non-SVE target, and if it
    doesn't use any asserting operations, it's reasonable to assume
    that it will work on SVE too.

* for target-specific code, poly_int degenerates to a constant if there
  are no runtime invariants for that target.  Only very minor changes
  are needed to non-AArch64 targets.

* poly_int operations should be (and in practice seem to be) as
  efficient as single-coefficient operations on non-AArch64 targets.
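
To make the may/must comparisons and the conditional operations above
more concrete, here is a hedged sketch of how such code tends to read
(the "..." bodies are placeholders, and the division helper follows
the naming used by the poly-int.h patch):

  /* Optimise only when the relation holds for all runtime values of
     the indeterminates; otherwise take the conservative path.  */
  if (must_le (GET_MODE_SIZE (a_mode), GET_MODE_SIZE (b_mode)))
    ...optimise...;
  else
    ...use the conservative fallback...;

  /* Division is conditional: it only succeeds when the size is an
     exact multiple of the element size.  */
  poly_uint64 nunits;
  if (multiple_p (GET_MODE_SIZE (mode), elt_size, &nunits))
    ...use NUNITS as the element count...;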


Tree representation
===================

The series uses a new POLY_INT_CST node to represent a poly_int value
at the tree level.  It is only used on targets with runtime sizes and
offsets; the associated test macro POLY_INT_CST_P is always false for
other targets.

The node has one INTEGER_CST per coefficient, which makes it easier
to refer to the same tree as a poly_wide_int, a poly_offset_int and
a poly_widest_int without copying the representation.

Only low-level routines use the tree node directly.  Most code uses:

- poly_int_tree_p (x)
    Return true if X is an INTEGER_CST or a POLY_INT_CST.

- wi::to_poly_wide (x)
- wi::to_poly_offset (x)
- wi::to_poly_widest (x)
    poly_int versions of the normal wi::to_wide etc. routines.  These
    work on both INTEGER_CSTs and POLY_INT_CSTs.

- poly_int_tree_p (x, &y)
    Test whether X is an INTEGER_CST or POLY_INT_CST and store its value
    in Y if so.  This is defined for Y of type poly_int64 and poly_uint64;
    the wi::to_* routines are more efficient than return-by-pointer for
    wide_int-based types.

- tree_to_poly_int64 (x)
- tree_to_poly_uint64 (x)
    poly_int versions of tree_to_shwi and tree_to_uhwi.  Again they work
    on both INTEGER_CSTs and POLY_INT_CSTs.
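
As a hedged sketch, a caller that previously insisted on a compile-time
constant size might now read (the surrounding code is hypothetical):

  /* Accept both INTEGER_CST and POLY_INT_CST sizes.  */
  poly_uint64 size;
  if (poly_int_tree_p (TYPE_SIZE_UNIT (type), &size))
    ...SIZE covers constant and runtime sizes alike...;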

Many tree routines now accept poly_int operands, such as:

- build_int_cst
- build_int_cstu
- wide_int_to_tree
- force_fit_type


RTL representation
==================

The corresponding RTL representation is CONST_POLY_INT.  Again,
this is only used on targets with runtime sizes and offsets, with
the test macro CONST_POLY_INT_P returning false for other targets.

Since RTL does not have the equivalent of the tree-level distinction
between wi::to_wide, wi::to_offset and wi::to_widest, CONST_POLY_INT
just stores the coefficients directly as wide_ints, using the
trailing_wide_ints class for efficiency.  The main routines are:

- poly_int_rtx_p (x)
    Return true if X is CONST_SCALAR_INT_P or CONST_POLY_INT_P.

- wi::to_poly_wide (x, mode)
    Return the value of CONST_SCALAR_INT_P or CONST_POLY_INT_P X
    as a poly_wide_int.

- poly_int_rtx_p (x, &y)
    Return true if X is a CONST_INT or a CONST_POLY_INT,
    storing its value in Y if so.  This is defined only for Y of
    type poly_int64.  (poly_uint64 isn't much use for RTL,
    since constants have no inherent sign and are stored in sign-
    extended rather than zero-extended form.  wi::to_wide is more
    efficient than return-by-pointer when accessing an rtx as a
    poly_wide_int.)

- rtx_to_poly_int64 (x)
    A poly_int version of INTVAL, which works on both CONST_INT
    and CONST_POLY_INT.

- strip_offset (x, &y)
    If X is a PLUS of X' and a poly_int, store the poly_int in Y
    and return X'.  Otherwise store 0 in Y and return X.

- strip_offset_and_add (x, &y)
    If X is a PLUS of X' and a poly_int, add the poly_int to Y
    and return X'.  Otherwise leave Y alone and return X.
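
As a hedged sketch, a routine that previously only understood
(plus base (const_int N)) addresses might now read:

  /* Pull a (possibly runtime) constant displacement off an address.  */
  poly_int64 offset;
  rtx base = strip_offset (addr, &offset);
  if (REG_P (base))
    ...analyse BASE plus OFFSET...;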

Many RTL routines now accept poly_int operands, such as:

- gen_int_mode
- trunc_int_for_mode
- plus_constant
- immed_wide_int_const


Compile-time impact
===================

The series seems to be compile-time neutral for release builds on
targets without runtime indeterminates, within a margin of about
[-0.1%, 0.1%].  Also, the abstraction of poly_int<1, X> is usually
compiled away.  E.g.:

  poly_wide_int
  foo (poly_wide_int x, tree y)
  {
    return x + wi::to_poly_wide (y);
  }

compiles to the same code as:

  wide_int
  foo (wide_int x, tree y)
  {
    return x + wi::to_wide (y);
  }

in release builds.  (I've tried various other combinations too.)


Typical changes
===============

Here's a table of the most common changes in the series.

----------------------------------------------------------------------
Before                                 After
----------------------------------------------------------------------
wi::to_wide (x)                        wi::to_poly_wide (x)
wi::to_offset (x)                      wi::to_poly_offset (x)
wi::to_widest (x)                      wi::to_poly_widest (x)
----------------------------------------------------------------------
unsigned HOST_WIDE_INT y;              poly_uint64 y;
if (tree_fits_uhwi_p (x))              if (poly_int_tree_p (x, &y))
  {                                      {
    y = tree_to_uhwi (x);
----------------------------------------------------------------------
HOST_WIDE_INT y;                       poly_int64 y;
if (tree_fits_shwi_p (x))              if (poly_int_tree_p (x, &y))
  {                                      {
    y = tree_to_shwi (x);
----------------------------------------------------------------------
HOST_WIDE_INT y;                       poly_int64 y;
if (cst_and_fits_in_hwi (x))           if (ptrdiff_tree_p (x, &y))
  {                                      {
    y = int_cst_value (x);
----------------------------------------------------------------------
HOST_WIDE_INT y;                       poly_int64 y;
if (CONST_INT_P (x))                   if (poly_int_rtx_p (x, &y))
  {                                      {
    y = INTVAL (x);
----------------------------------------------------------------------
if (offset < limit)                    if (must_lt (offset, limit))
  ...optimise...;                        ...optimise...;
----------------------------------------------------------------------
if (offset >= limit)                   if (may_ge (offset, limit))
  ...abort optimisation...;              ...abort optimisation...;
----------------------------------------------------------------------
if (offset >= limit)                   if (must_ge (offset, limit))
  ...treat as undefined...;              ...treat as undefined...;
----------------------------------------------------------------------
if (nunits1 == nunits2)                if (must_eq (nunits1, nunits2))
  ...treat as compatible...;             ...treat as compatible...;
----------------------------------------------------------------------
if (nunits1 != nunits2)                if (may_ne (nunits1, nunits2))
  ...treat as incompatible...;           ...treat as incompatible...;
----------------------------------------------------------------------
// Fold (eq op0 op1)                   // Fold (eq op0 op1)
if (op0 == op1)                        if (must_eq (op0, op1))
  ...fold to true...;                    ...fold to true...;
----------------------------------------------------------------------
// Fold (eq op0 op1)                   // Fold (eq op0 op1)
if (op0 != op1)                        if (must_ne (op0, op1))
  ...fold to false...;                   ...fold to false...;
----------------------------------------------------------------------


Testing
=======

Tested by compiling the testsuite before and after the series on:

    aarch64-linux-gnu aarch64_be-linux-gnu alpha-linux-gnu arc-elf
    arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf c6x-elf
    cr16-elf cris-elf epiphany-elf fr30-elf frv-linux-gnu ft32-elf
    h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
    i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
    m68k-linux-gnu mcore-elf microblaze-elf mipsel-linux-gnu
    mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
    nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnuspe
    powerpc-eabispe powerpc64-linux-gnu powerpc64le-linux-gnu
    powerpc-ibm-aix7.0 riscv32-elf riscv64-elf rl78-elf rx-elf
    s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu
    sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf
    xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin
    x86_64-linux-gnu xtensa-elf

There were no differences in assembly output (except on
powerpc-ibm-aix7.0, where symbol names aren't stable).

Also tested normally on aarch64-linux-gnu, x86_64-linux-gnu and
powerpc64le-linux-gnu.

Thanks,
Richard


* [001/nnn] poly_int: add poly-int.h
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
@ 2017-10-23 16:58 ` Richard Sandiford
  2017-10-25 16:17   ` Martin Sebor
  2017-11-08 10:03   ` Richard Sandiford
  2017-10-23 16:59 ` [002/nnn] poly_int: IN_TARGET_CODE Richard Sandiford
                   ` (106 subsequent siblings)
  107 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 16:58 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3967 bytes --]

This patch adds a new "poly_int" class to represent polynomial integers
of the form:

  C0 + C1*X1 + C2*X2 + ... + Cn*Xn

It also adds poly_int-based typedefs for offsets and sizes of various
precisions.  In these typedefs, the Ci coefficients are compile-time
constants and the Xi indeterminates are run-time invariants.  The number
of coefficients is controlled by the target and is initially 1 for all
ports.

Most routines can handle general coefficient counts, but for now a few
are specific to one or two coefficients.  Support for other coefficient
counts can be added when needed.

The patch also adds a new macro, IN_TARGET_CODE, that can be
set to indicate that a TU contains target-specific rather than
target-independent code.  When this macro is set and the number of
coefficients is 1, the poly-int.h classes define a conversion operator
to a constant.  This allows most existing target code to work without
modification.  The main exceptions are:

- values passed through ..., which need an explicit conversion to a
  constant

- ?: expressions in which one arm ends up being a polynomial and the
  other remains a constant.  In these cases it would be valid to convert
  the constant to a polynomial and the polynomial to a constant, so a
  cast is needed to break the ambiguity.
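
For example, a hedged sketch of the second case (the variables here
are hypothetical):

  /* Without a cast, both arms could be converted to either type,
     so overload resolution would be ambiguous.  */
  poly_int64 vector_bytes = ...;
  HOST_WIDE_INT scalar_bytes = ...;
  poly_int64 bytes = use_vector ? vector_bytes : poly_int64 (scalar_bytes);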

The patch also adds a new target hook to return the estimated
value of a polynomial for costing purposes.

The patch also adds operator<< on wide_ints (it was already defined
for offset_int and widest_int).  I think this was originally excluded
because >> is ambiguous for wide_int, but << is useful for converting
bytes to bits, etc., so is worth defining on its own.  The patch also
adds operator% and operator/ for offset_int and widest_int, since those
types are always signed.  These changes allow the poly_int interface to
be more predictable.
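
A hedged sketch of what the new operators allow (the variables are
placeholders):

  /* operator<< now works on wide_int too, e.g. for converting a byte
     count to a bit count...  */
  wide_int bits = bytes << LOG2_BITS_PER_UNIT;

  /* ...and offset_int/widest_int, being always signed, support
     operator/ and operator% directly.  */
  offset_int quotient = total / per_element;
  offset_int remainder = total % per_element;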

I'd originally tried adding the tests as selftests, but that ended up
bloating cc1 by at least a third.  It also took a while to build them
at -O2.  The patch therefore uses plugin tests instead, where we can
force the tests to be built at -O0.  They still run in negligible time
when built that way.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* poly-int.h: New file.
	* poly-int-types.h: Likewise.
	* coretypes.h: Include them.
	(POLY_INT_CONVERSION): Define.
	* target.def (estimated_poly_value): New hook.
	* doc/tm.texi.in (TARGET_ESTIMATED_POLY_VALUE): New hook.
	* doc/tm.texi: Regenerate.
	* doc/poly-int.texi: New file.
	* doc/gccint.texi: Include it.
	* doc/rtl.texi: Describe restrictions on subreg modes.
	* Makefile.in (TEXI_GCCINT_FILES): Add poly-int.texi.
	* genmodes.c (NUM_POLY_INT_COEFFS): Provide a default definition.
	(emit_insn_modes_h): Emit a definition of NUM_POLY_INT_COEFFS.
	* targhooks.h (default_estimated_poly_value): Declare.
	* targhooks.c (default_estimated_poly_value): New function.
	* target.h (estimated_poly_value): Likewise.
	* wide-int.h (WI_UNARY_RESULT): Use wi::binary_traits.
	(wi::unary_traits): Delete.
	(wi::binary_traits::signed_shift_result_type): Define for
	offset_int << HOST_WIDE_INT, etc.
	(generic_wide_int::operator <<=): Define for all types and use
	wi::lshift instead of <<.
	(wi::hwi_with_prec): Add a default constructor.
	(wi::ints_for): New class.
	(operator <<): Define for all wide-int types.
	(operator /): New function.
	(operator %): Likewise.
	* selftest.h (ASSERT_MUST_EQ, ASSERT_MUST_EQ_AT, ASSERT_MAY_NE)
	(ASSERT_MAY_NE_AT): New macros.

gcc/testsuite/
	* gcc.dg/plugin/poly-int-tests.h,
	gcc.dg/plugin/poly-int-test-1.c,
	gcc.dg/plugin/poly-int-01_plugin.c,
	gcc.dg/plugin/poly-int-02_plugin.c,
	gcc.dg/plugin/poly-int-03_plugin.c,
	gcc.dg/plugin/poly-int-04_plugin.c,
	gcc.dg/plugin/poly-int-05_plugin.c,
	gcc.dg/plugin/poly-int-06_plugin.c,
	gcc.dg/plugin/poly-int-07_plugin.c: New tests.
	* gcc.dg/plugin/plugin.exp: Run them.


[-- Attachment #2: poly-int.diff.bz2 --]
[-- Type: application/x-bzip2, Size: 39587 bytes --]


* [002/nnn] poly_int: IN_TARGET_CODE
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
  2017-10-23 16:58 ` [001/nnn] poly_int: add poly-int.h Richard Sandiford
@ 2017-10-23 16:59 ` Richard Sandiford
  2017-11-17  3:35   ` Jeff Law
  2017-10-23 17:00 ` [003/nnn] poly_int: MACRO_MODE Richard Sandiford
                   ` (105 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 16:59 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 8110 bytes --]

This patch makes each target-specific TU define an IN_TARGET_CODE macro,
which is used to decide whether poly_int<1, C> should convert to C.
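
Concretely, each affected file gains a definition like the following
before its first include (a sketch of the pattern; the include list
is abbreviated):

  /* This TU contains target-specific code, so allow poly_int<1, C>
     to convert implicitly to C.  */
  #define IN_TARGET_CODE 1

  #include "config.h"
  #include "system.h"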


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* genattrtab.c (write_header): Define IN_TARGET_CODE to 1 in the
	target C file.
	* genautomata.c (main): Likewise.
	* genconditions.c (write_header): Likewise.
	* genemit.c (main): Likewise.
	* genextract.c (print_header): Likewise.
	* genopinit.c (main): Likewise.
	* genoutput.c (output_prologue): Likewise.
	* genpeep.c (main): Likewise.
	* genpreds.c (write_insn_preds_c): Likewise.
	* genrecog.c (write_header): Likewise.
	* config/aarch64/aarch64-builtins.c (IN_TARGET_CODE): Define.
	* config/aarch64/aarch64-c.c (IN_TARGET_CODE): Likewise.
	* config/aarch64/aarch64.c (IN_TARGET_CODE): Likewise.
	* config/aarch64/cortex-a57-fma-steering.c (IN_TARGET_CODE): Likewise.
	* config/aarch64/driver-aarch64.c (IN_TARGET_CODE): Likewise.
	* config/alpha/alpha.c (IN_TARGET_CODE): Likewise.
	* config/alpha/driver-alpha.c (IN_TARGET_CODE): Likewise.
	* config/arc/arc-c.c (IN_TARGET_CODE): Likewise.
	* config/arc/arc.c (IN_TARGET_CODE): Likewise.
	* config/arc/driver-arc.c (IN_TARGET_CODE): Likewise.
	* config/arm/aarch-common.c (IN_TARGET_CODE): Likewise.
	* config/arm/arm-builtins.c (IN_TARGET_CODE): Likewise.
	* config/arm/arm-c.c (IN_TARGET_CODE): Likewise.
	* config/arm/arm.c (IN_TARGET_CODE): Likewise.
	* config/arm/driver-arm.c (IN_TARGET_CODE): Likewise.
	* config/avr/avr-c.c (IN_TARGET_CODE): Likewise.
	* config/avr/avr-devices.c (IN_TARGET_CODE): Likewise.
	* config/avr/avr-log.c (IN_TARGET_CODE): Likewise.
	* config/avr/avr.c (IN_TARGET_CODE): Likewise.
	* config/avr/driver-avr.c (IN_TARGET_CODE): Likewise.
	* config/avr/gen-avr-mmcu-specs.c (IN_TARGET_CODE): Likewise.
	* config/bfin/bfin.c (IN_TARGET_CODE): Likewise.
	* config/c6x/c6x.c (IN_TARGET_CODE): Likewise.
	* config/cr16/cr16.c (IN_TARGET_CODE): Likewise.
	* config/cris/cris.c (IN_TARGET_CODE): Likewise.
	* config/darwin.c (IN_TARGET_CODE): Likewise.
	* config/epiphany/epiphany.c (IN_TARGET_CODE): Likewise.
	* config/epiphany/mode-switch-use.c (IN_TARGET_CODE): Likewise.
	* config/epiphany/resolve-sw-modes.c (IN_TARGET_CODE): Likewise.
	* config/fr30/fr30.c (IN_TARGET_CODE): Likewise.
	* config/frv/frv.c (IN_TARGET_CODE): Likewise.
	* config/ft32/ft32.c (IN_TARGET_CODE): Likewise.
	* config/h8300/h8300.c (IN_TARGET_CODE): Likewise.
	* config/i386/djgpp.c (IN_TARGET_CODE): Likewise.
	* config/i386/driver-i386.c (IN_TARGET_CODE): Likewise.
	* config/i386/driver-mingw32.c (IN_TARGET_CODE): Likewise.
	* config/i386/host-cygwin.c (IN_TARGET_CODE): Likewise.
	* config/i386/host-i386-darwin.c (IN_TARGET_CODE): Likewise.
	* config/i386/host-mingw32.c (IN_TARGET_CODE): Likewise.
	* config/i386/i386-c.c (IN_TARGET_CODE): Likewise.
	* config/i386/i386.c (IN_TARGET_CODE): Likewise.
	* config/i386/intelmic-mkoffload.c (IN_TARGET_CODE): Likewise.
	* config/i386/msformat-c.c (IN_TARGET_CODE): Likewise.
	* config/i386/winnt-cxx.c (IN_TARGET_CODE): Likewise.
	* config/i386/winnt-stubs.c (IN_TARGET_CODE): Likewise.
	* config/i386/winnt.c (IN_TARGET_CODE): Likewise.
	* config/i386/x86-tune-sched-atom.c (IN_TARGET_CODE): Likewise.
	* config/i386/x86-tune-sched-bd.c (IN_TARGET_CODE): Likewise.
	* config/i386/x86-tune-sched-core.c (IN_TARGET_CODE): Likewise.
	* config/i386/x86-tune-sched.c (IN_TARGET_CODE): Likewise.
	* config/ia64/ia64-c.c (IN_TARGET_CODE): Likewise.
	* config/ia64/ia64.c (IN_TARGET_CODE): Likewise.
	* config/iq2000/iq2000.c (IN_TARGET_CODE): Likewise.
	* config/lm32/lm32.c (IN_TARGET_CODE): Likewise.
	* config/m32c/m32c-pragma.c (IN_TARGET_CODE): Likewise.
	* config/m32c/m32c.c (IN_TARGET_CODE): Likewise.
	* config/m32r/m32r.c (IN_TARGET_CODE): Likewise.
	* config/m68k/m68k.c (IN_TARGET_CODE): Likewise.
	* config/mcore/mcore.c (IN_TARGET_CODE): Likewise.
	* config/microblaze/microblaze-c.c (IN_TARGET_CODE): Likewise.
	* config/microblaze/microblaze.c (IN_TARGET_CODE): Likewise.
	* config/mips/driver-native.c (IN_TARGET_CODE): Likewise.
	* config/mips/frame-header-opt.c (IN_TARGET_CODE): Likewise.
	* config/mips/mips.c (IN_TARGET_CODE): Likewise.
	* config/mmix/mmix.c (IN_TARGET_CODE): Likewise.
	* config/mn10300/mn10300.c (IN_TARGET_CODE): Likewise.
	* config/moxie/moxie.c (IN_TARGET_CODE): Likewise.
	* config/msp430/driver-msp430.c (IN_TARGET_CODE): Likewise.
	* config/msp430/msp430-c.c (IN_TARGET_CODE): Likewise.
	* config/msp430/msp430.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32-cost.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32-fp-as-gp.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32-intrinsic.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32-isr.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32-md-auxiliary.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32-memory-manipulation.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32-pipelines-auxiliary.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32-predicates.c (IN_TARGET_CODE): Likewise.
	* config/nds32/nds32.c (IN_TARGET_CODE): Likewise.
	* config/nios2/nios2.c (IN_TARGET_CODE): Likewise.
	* config/nvptx/mkoffload.c (IN_TARGET_CODE): Likewise.
	* config/nvptx/nvptx.c (IN_TARGET_CODE): Likewise.
	* config/pa/pa.c (IN_TARGET_CODE): Likewise.
	* config/pdp11/pdp11.c (IN_TARGET_CODE): Likewise.
	* config/powerpcspe/driver-powerpcspe.c (IN_TARGET_CODE): Likewise.
	* config/powerpcspe/host-darwin.c (IN_TARGET_CODE): Likewise.
	* config/powerpcspe/host-ppc64-darwin.c (IN_TARGET_CODE): Likewise.
	* config/powerpcspe/powerpcspe-c.c (IN_TARGET_CODE): Likewise.
	* config/powerpcspe/powerpcspe-linux.c (IN_TARGET_CODE): Likewise.
	* config/powerpcspe/powerpcspe.c (IN_TARGET_CODE): Likewise.
	* config/riscv/riscv-builtins.c (IN_TARGET_CODE): Likewise.
	* config/riscv/riscv-c.c (IN_TARGET_CODE): Likewise.
	* config/riscv/riscv.c (IN_TARGET_CODE): Likewise.
	* config/rl78/rl78-c.c (IN_TARGET_CODE): Likewise.
	* config/rl78/rl78.c (IN_TARGET_CODE): Likewise.
	* config/rs6000/driver-rs6000.c (IN_TARGET_CODE): Likewise.
	* config/rs6000/host-darwin.c (IN_TARGET_CODE): Likewise.
	* config/rs6000/host-ppc64-darwin.c (IN_TARGET_CODE): Likewise.
	* config/rs6000/rs6000-c.c (IN_TARGET_CODE): Likewise.
	* config/rs6000/rs6000-linux.c (IN_TARGET_CODE): Likewise.
	* config/rs6000/rs6000-p8swap.c (IN_TARGET_CODE): Likewise.
	* config/rs6000/rs6000-string.c (IN_TARGET_CODE): Likewise.
	* config/rs6000/rs6000.c (IN_TARGET_CODE): Likewise.
	* config/rx/rx.c (IN_TARGET_CODE): Likewise.
	* config/s390/driver-native.c (IN_TARGET_CODE): Likewise.
	* config/s390/s390-c.c (IN_TARGET_CODE): Likewise.
	* config/s390/s390.c (IN_TARGET_CODE): Likewise.
	* config/sh/sh-c.c (IN_TARGET_CODE): Likewise.
	* config/sh/sh-mem.cc (IN_TARGET_CODE): Likewise.
	* config/sh/sh.c (IN_TARGET_CODE): Likewise.
	* config/sh/sh_optimize_sett_clrt.cc (IN_TARGET_CODE): Likewise.
	* config/sh/sh_treg_combine.cc (IN_TARGET_CODE): Likewise.
	* config/sparc/driver-sparc.c (IN_TARGET_CODE): Likewise.
	* config/sparc/sparc-c.c (IN_TARGET_CODE): Likewise.
	* config/sparc/sparc.c (IN_TARGET_CODE): Likewise.
	* config/spu/spu-c.c (IN_TARGET_CODE): Likewise.
	* config/spu/spu.c (IN_TARGET_CODE): Likewise.
	* config/stormy16/stormy16.c (IN_TARGET_CODE): Likewise.
	* config/tilegx/mul-tables.c (IN_TARGET_CODE): Likewise.
	* config/tilegx/tilegx-c.c (IN_TARGET_CODE): Likewise.
	* config/tilegx/tilegx.c (IN_TARGET_CODE): Likewise.
	* config/tilepro/mul-tables.c (IN_TARGET_CODE): Likewise.
	* config/tilepro/tilepro-c.c (IN_TARGET_CODE): Likewise.
	* config/tilepro/tilepro.c (IN_TARGET_CODE): Likewise.
	* config/v850/v850-c.c (IN_TARGET_CODE): Likewise.
	* config/v850/v850.c (IN_TARGET_CODE): Likewise.
	* config/vax/vax.c (IN_TARGET_CODE): Likewise.
	* config/visium/visium.c (IN_TARGET_CODE): Likewise.
	* config/vms/vms-c.c (IN_TARGET_CODE): Likewise.
	* config/vms/vms-f.c (IN_TARGET_CODE): Likewise.
	* config/vms/vms.c (IN_TARGET_CODE): Likewise.
	* config/xtensa/xtensa.c (IN_TARGET_CODE): Likewise.


[-- Attachment #2: in-target-code.diff.bz2 --]
[-- Type: application/x-bzip2, Size: 4083 bytes --]


* [003/nnn] poly_int: MACRO_MODE
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
  2017-10-23 16:58 ` [001/nnn] poly_int: add poly-int.h Richard Sandiford
  2017-10-23 16:59 ` [002/nnn] poly_int: IN_TARGET_CODE Richard Sandiford
@ 2017-10-23 17:00 ` Richard Sandiford
  2017-11-17  3:36   ` Jeff Law
  2017-10-23 17:00 ` [004/nnn] poly_int: mode query functions Richard Sandiford
                   ` (104 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:00 UTC (permalink / raw)
  To: gcc-patches

This patch uses a MACRO_MODE wrapper for the target macro invocations
in targhooks.c and addresses.h, so that macros for non-AArch64 targets
can continue to treat modes as fixed-size.

It didn't seem worth converting the address macros to hooks since
(a) they're heavily used, (b) they should be probably be replaced
with a different interface rather than converted to hooks as-is,
and most importantly (c) addresses.h already localises the problem.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* machmode.h (MACRO_MODE): New macro.
	* addresses.h (base_reg_class, ok_for_base_p_1): Use it.
	* targhooks.c (default_libcall_value, default_secondary_reload)
	(default_memory_move_cost, default_register_move_cost)
	(default_class_max_nregs): Likewise.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-10-23 16:52:20.675923636 +0100
+++ gcc/machmode.h	2017-10-23 17:00:49.664349224 +0100
@@ -685,6 +685,17 @@ fixed_size_mode::includes_p (machine_mod
   return true;
 }
 
+/* Wrapper for mode arguments to target macros, so that if a target
+   doesn't need polynomial-sized modes, its header file can continue
+   to treat everything as fixed_size_mode.  This should go away once
+   macros are moved to target hooks.  It shouldn't be used in other
+   contexts.  */
+#if NUM_POLY_INT_COEFFS == 1
+#define MACRO_MODE(MODE) (as_a <fixed_size_mode> (MODE))
+#else
+#define MACRO_MODE(MODE) (MODE)
+#endif
+
 extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
 
 /* Return the machine mode to use for a MODE_INT of SIZE bits, if one
Index: gcc/addresses.h
===================================================================
--- gcc/addresses.h	2017-10-23 16:52:20.675923636 +0100
+++ gcc/addresses.h	2017-10-23 17:00:49.663350133 +0100
@@ -31,14 +31,15 @@ base_reg_class (machine_mode mode ATTRIB
 		enum rtx_code index_code ATTRIBUTE_UNUSED)
 {
 #ifdef MODE_CODE_BASE_REG_CLASS
-  return MODE_CODE_BASE_REG_CLASS (mode, as, outer_code, index_code);
+  return MODE_CODE_BASE_REG_CLASS (MACRO_MODE (mode), as, outer_code,
+				   index_code);
 #else
 #ifdef MODE_BASE_REG_REG_CLASS
   if (index_code == REG)
-    return MODE_BASE_REG_REG_CLASS (mode);
+    return MODE_BASE_REG_REG_CLASS (MACRO_MODE (mode));
 #endif
 #ifdef MODE_BASE_REG_CLASS
-  return MODE_BASE_REG_CLASS (mode);
+  return MODE_BASE_REG_CLASS (MACRO_MODE (mode));
 #else
   return BASE_REG_CLASS;
 #endif
@@ -58,15 +59,15 @@ ok_for_base_p_1 (unsigned regno ATTRIBUT
 		 enum rtx_code index_code ATTRIBUTE_UNUSED)
 {
 #ifdef REGNO_MODE_CODE_OK_FOR_BASE_P
-  return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, mode, as,
+  return REGNO_MODE_CODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode), as,
 					outer_code, index_code);
 #else
 #ifdef REGNO_MODE_OK_FOR_REG_BASE_P
   if (index_code == REG)
-    return REGNO_MODE_OK_FOR_REG_BASE_P (regno, mode);
+    return REGNO_MODE_OK_FOR_REG_BASE_P (regno, MACRO_MODE (mode));
 #endif
 #ifdef REGNO_MODE_OK_FOR_BASE_P
-  return REGNO_MODE_OK_FOR_BASE_P (regno, mode);
+  return REGNO_MODE_OK_FOR_BASE_P (regno, MACRO_MODE (mode));
 #else
   return REGNO_OK_FOR_BASE_P (regno);
 #endif
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 17:00:20.920834919 +0100
+++ gcc/targhooks.c	2017-10-23 17:00:49.664349224 +0100
@@ -941,7 +941,7 @@ default_libcall_value (machine_mode mode
 		       const_rtx fun ATTRIBUTE_UNUSED)
 {
 #ifdef LIBCALL_VALUE
-  return LIBCALL_VALUE (mode);
+  return LIBCALL_VALUE (MACRO_MODE (mode));
 #else
   gcc_unreachable ();
 #endif
@@ -1071,11 +1071,13 @@ default_secondary_reload (bool in_p ATTR
     }
 #ifdef SECONDARY_INPUT_RELOAD_CLASS
   if (in_p)
-    rclass = SECONDARY_INPUT_RELOAD_CLASS (reload_class, reload_mode, x);
+    rclass = SECONDARY_INPUT_RELOAD_CLASS (reload_class,
+					   MACRO_MODE (reload_mode), x);
 #endif
 #ifdef SECONDARY_OUTPUT_RELOAD_CLASS
   if (! in_p)
-    rclass = SECONDARY_OUTPUT_RELOAD_CLASS (reload_class, reload_mode, x);
+    rclass = SECONDARY_OUTPUT_RELOAD_CLASS (reload_class,
+					    MACRO_MODE (reload_mode), x);
 #endif
   if (rclass != NO_REGS)
     {
@@ -1603,7 +1605,7 @@ default_memory_move_cost (machine_mode m
 #ifndef MEMORY_MOVE_COST
     return (4 + memory_move_secondary_cost (mode, (enum reg_class) rclass, in));
 #else
-    return MEMORY_MOVE_COST (mode, (enum reg_class) rclass, in);
+    return MEMORY_MOVE_COST (MACRO_MODE (mode), (enum reg_class) rclass, in);
 #endif
 }
 
@@ -1618,7 +1620,8 @@ default_register_move_cost (machine_mode
 #ifndef REGISTER_MOVE_COST
   return 2;
 #else
-  return REGISTER_MOVE_COST (mode, (enum reg_class) from, (enum reg_class) to);
+  return REGISTER_MOVE_COST (MACRO_MODE (mode),
+			     (enum reg_class) from, (enum reg_class) to);
 #endif
 }
 
@@ -1807,7 +1810,8 @@ default_class_max_nregs (reg_class_t rcl
 			 machine_mode mode ATTRIBUTE_UNUSED)
 {
 #ifdef CLASS_MAX_NREGS
-  return (unsigned char) CLASS_MAX_NREGS ((enum reg_class) rclass, mode);
+  return (unsigned char) CLASS_MAX_NREGS ((enum reg_class) rclass,
+					  MACRO_MODE (mode));
 #else
   return ((GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD);
 #endif


* [004/nnn] poly_int: mode query functions
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (2 preceding siblings ...)
  2017-10-23 17:00 ` [003/nnn] poly_int: MACRO_MODE Richard Sandiford
@ 2017-10-23 17:00 ` Richard Sandiford
  2017-11-17  3:37   ` Jeff Law
  2017-10-23 17:01 ` [005/nnn] poly_int: rtx constants Richard Sandiford
                   ` (103 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:00 UTC (permalink / raw)
  To: gcc-patches

This patch changes the bit size and vector count arguments to the
machmode.h functions from unsigned int to poly_uint64.
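
As a hedged sketch, a caller after this change might look like this
(the variables are hypothetical):

  /* The element count can now be a poly_uint64, so the same query
     handles fixed-length and variable-length vectors.  */
  poly_uint64 nunits = ...;
  machine_mode vmode;
  if (mode_for_vector (elt_mode, nunits).exists (&vmode))
    ...use VMODE...;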


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* machmode.h (mode_for_size, int_mode_for_size, float_mode_for_size)
	(smallest_mode_for_size, smallest_int_mode_for_size): Take the mode
	size as a poly_uint64.
	(mode_for_vector, mode_for_int_vector): Take the number of vector
	elements as a poly_uint64.
	* stor-layout.c (mode_for_size, smallest_mode_for_size): Take the mode
	size as a poly_uint64.
	(mode_for_vector, mode_for_int_vector): Take the number of vector
	elements as a poly_uint64.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-10-23 17:00:49.664349224 +0100
+++ gcc/machmode.h	2017-10-23 17:00:52.669615373 +0100
@@ -696,14 +696,14 @@ #define MACRO_MODE(MODE) (as_a <fixed_si
 #define MACRO_MODE(MODE) (MODE)
 #endif
 
-extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
+extern opt_machine_mode mode_for_size (poly_uint64, enum mode_class, int);
 
 /* Return the machine mode to use for a MODE_INT of SIZE bits, if one
    exists.  If LIMIT is nonzero, modes wider than MAX_FIXED_MODE_SIZE
    will not be used.  */
 
 inline opt_scalar_int_mode
-int_mode_for_size (unsigned int size, int limit)
+int_mode_for_size (poly_uint64 size, int limit)
 {
   return dyn_cast <scalar_int_mode> (mode_for_size (size, MODE_INT, limit));
 }
@@ -712,7 +712,7 @@ int_mode_for_size (unsigned int size, in
    exists.  */
 
 inline opt_scalar_float_mode
-float_mode_for_size (unsigned int size)
+float_mode_for_size (poly_uint64 size)
 {
   return dyn_cast <scalar_float_mode> (mode_for_size (size, MODE_FLOAT, 0));
 }
@@ -726,21 +726,21 @@ decimal_float_mode_for_size (unsigned in
     (mode_for_size (size, MODE_DECIMAL_FLOAT, 0));
 }
 
-extern machine_mode smallest_mode_for_size (unsigned int, enum mode_class);
+extern machine_mode smallest_mode_for_size (poly_uint64, enum mode_class);
 
 /* Find the narrowest integer mode that contains at least SIZE bits.
    Such a mode must exist.  */
 
 inline scalar_int_mode
-smallest_int_mode_for_size (unsigned int size)
+smallest_int_mode_for_size (poly_uint64 size)
 {
   return as_a <scalar_int_mode> (smallest_mode_for_size (size, MODE_INT));
 }
 
 extern opt_scalar_int_mode int_mode_for_mode (machine_mode);
 extern opt_machine_mode bitwise_mode_for_mode (machine_mode);
-extern opt_machine_mode mode_for_vector (scalar_mode, unsigned);
-extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
+extern opt_machine_mode mode_for_vector (scalar_mode, poly_uint64);
+extern opt_machine_mode mode_for_int_vector (unsigned int, poly_uint64);
 
 /* Return the integer vector equivalent of MODE, if one exists.  In other
    words, return the mode for an integer vector that has the same number
Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c	2017-10-23 16:52:20.627879504 +0100
+++ gcc/stor-layout.c	2017-10-23 17:00:52.669615373 +0100
@@ -297,22 +297,22 @@ finalize_size_functions (void)
    MAX_FIXED_MODE_SIZE.  */
 
 opt_machine_mode
-mode_for_size (unsigned int size, enum mode_class mclass, int limit)
+mode_for_size (poly_uint64 size, enum mode_class mclass, int limit)
 {
   machine_mode mode;
   int i;
 
-  if (limit && size > MAX_FIXED_MODE_SIZE)
+  if (limit && may_gt (size, (unsigned int) MAX_FIXED_MODE_SIZE))
     return opt_machine_mode ();
 
   /* Get the first mode which has this size, in the specified class.  */
   FOR_EACH_MODE_IN_CLASS (mode, mclass)
-    if (GET_MODE_PRECISION (mode) == size)
+    if (must_eq (GET_MODE_PRECISION (mode), size))
       return mode;
 
   if (mclass == MODE_INT || mclass == MODE_PARTIAL_INT)
     for (i = 0; i < NUM_INT_N_ENTS; i ++)
-      if (int_n_data[i].bitsize == size
+      if (must_eq (int_n_data[i].bitsize, size)
 	  && int_n_enabled_p[i])
 	return int_n_data[i].m;
 
@@ -340,7 +340,7 @@ mode_for_size_tree (const_tree size, enu
    SIZE bits.  Abort if no such mode exists.  */
 
 machine_mode
-smallest_mode_for_size (unsigned int size, enum mode_class mclass)
+smallest_mode_for_size (poly_uint64 size, enum mode_class mclass)
 {
   machine_mode mode = VOIDmode;
   int i;
@@ -348,19 +348,18 @@ smallest_mode_for_size (unsigned int siz
   /* Get the first mode which has at least this size, in the
      specified class.  */
   FOR_EACH_MODE_IN_CLASS (mode, mclass)
-    if (GET_MODE_PRECISION (mode) >= size)
+    if (must_ge (GET_MODE_PRECISION (mode), size))
       break;
 
+  gcc_assert (mode != VOIDmode);
+
   if (mclass == MODE_INT || mclass == MODE_PARTIAL_INT)
     for (i = 0; i < NUM_INT_N_ENTS; i ++)
-      if (int_n_data[i].bitsize >= size
-	  && int_n_data[i].bitsize < GET_MODE_PRECISION (mode)
+      if (must_ge (int_n_data[i].bitsize, size)
+	  && must_lt (int_n_data[i].bitsize, GET_MODE_PRECISION (mode))
 	  && int_n_enabled_p[i])
 	mode = int_n_data[i].m;
 
-  if (mode == VOIDmode)
-    gcc_unreachable ();
-
   return mode;
 }
 
@@ -475,7 +474,7 @@ bitwise_type_for_mode (machine_mode mode
    either an integer mode or a vector mode.  */
 
 opt_machine_mode
-mode_for_vector (scalar_mode innermode, unsigned nunits)
+mode_for_vector (scalar_mode innermode, poly_uint64 nunits)
 {
   machine_mode mode;
 
@@ -496,14 +495,14 @@ mode_for_vector (scalar_mode innermode,
   /* Do not check vector_mode_supported_p here.  We'll do that
      later in vector_type_mode.  */
   FOR_EACH_MODE_FROM (mode, mode)
-    if (GET_MODE_NUNITS (mode) == nunits
+    if (must_eq (GET_MODE_NUNITS (mode), nunits)
 	&& GET_MODE_INNER (mode) == innermode)
       return mode;
 
   /* For integers, try mapping it to a same-sized scalar mode.  */
   if (GET_MODE_CLASS (innermode) == MODE_INT)
     {
-      unsigned int nbits = nunits * GET_MODE_BITSIZE (innermode);
+      poly_uint64 nbits = nunits * GET_MODE_BITSIZE (innermode);
       if (int_mode_for_size (nbits, 0).exists (&mode)
 	  && have_regs_of_mode[mode])
 	return mode;
@@ -517,7 +516,7 @@ mode_for_vector (scalar_mode innermode,
    an integer mode or a vector mode.  */
 
 opt_machine_mode
-mode_for_int_vector (unsigned int int_bits, unsigned int nunits)
+mode_for_int_vector (unsigned int int_bits, poly_uint64 nunits)
 {
   scalar_int_mode int_mode;
   machine_mode vec_mode;


* [005/nnn] poly_int: rtx constants
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (3 preceding siblings ...)
  2017-10-23 17:00 ` [004/nnn] poly_int: mode query functions Richard Sandiford
@ 2017-10-23 17:01 ` Richard Sandiford
  2017-11-17  4:17   ` Jeff Law
  2017-10-23 17:02 ` [006/nnn] poly_int: tree constants Richard Sandiford
                   ` (102 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:01 UTC (permalink / raw)
  To: gcc-patches

This patch adds an rtl representation of poly_int values.
There were three possible ways of doing this:

(1) Add a new rtl code for the poly_ints themselves and store the
    coefficients as trailing wide_ints.  This would give constants like:

      (const_poly_int [c0 c1 ... cn])

    The runtime value would be:

      c0 + c1 * x1 + ... + cn * xn

(2) Like (1), but use rtxes for the coefficients.  This would give
    constants like:

      (const_poly_int [(const_int c0)
                       (const_int c1)
                       ...
                       (const_int cn)])

    although the coefficients could be const_wide_ints instead
    of const_ints where appropriate.

(3) Add a new rtl code for the polynomial indeterminates,
    then use them in const wrappers.  A constant like c0 + c1 * x1
    would then look like:

      (const:M (plus:M (mult:M (const_param:M x1)
                               (const_int c1))
                       (const_int c0)))

There didn't seem to be that much to choose between them.  The main
advantage of (1) is that it's a more efficient representation and
that we can refer to the coefficients directly as wide_int_storage.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/rtl.texi (const_poly_int): Document.
	* gengenrtl.c (excluded_rtx): Return true for CONST_POLY_INT.
	* rtl.h (const_poly_int_def): New struct.
	(rtx_def::u): Add a cpi field.
	(CASE_CONST_UNIQUE, CASE_CONST_ANY): Add CONST_POLY_INT.
	(CONST_POLY_INT_P, CONST_POLY_INT_COEFFS): New macros.
	(wi::rtx_to_poly_wide_ref): New typedef.
	(const_poly_int_value, wi::to_poly_wide, rtx_to_poly_int64)
	(poly_int_rtx_p): New functions.
	(trunc_int_for_mode): Declare a poly_int64 version.
	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
	(immed_wide_int_const): Take a poly_wide_int_ref rather than
	a wide_int_ref.
	(strip_offset): Declare.
	(strip_offset_and_add): New function.
	* rtl.def (CONST_POLY_INT): New rtx code.
	* rtl.c (rtx_size): Handle CONST_POLY_INT.
	(shared_const_p): Use poly_int_rtx_p.
	* emit-rtl.h (gen_int_mode): Take a poly_int64 instead of a
	HOST_WIDE_INT.
	(gen_int_shift_amount): Likewise.
	* emit-rtl.c (const_poly_int_hasher): New class.
	(const_poly_int_htab): New variable.
	(init_emit_once): Initialize it when NUM_POLY_INT_COEFFS > 1.
	(const_poly_int_hasher::hash): New function.
	(const_poly_int_hasher::equal): Likewise.
	(gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT.
	(immed_wide_int_const): Rename to...
	(immed_wide_int_const_1): ...this and make static.
	(immed_wide_int_const): New function, taking a poly_wide_int_ref
	instead of a wide_int_ref.
	(gen_int_shift_amount): Take a poly_int64 instead of a HOST_WIDE_INT.
	(gen_lowpart_common): Handle CONST_POLY_INT.
	* cse.c (hash_rtx_cb, equiv_constant): Likewise.
	* cselib.c (cselib_hash_rtx): Likewise.
	* dwarf2out.c (const_ok_for_output_1): Likewise.
	* expr.c (convert_modes): Likewise.
	* print-rtl.c (rtx_writer::print_rtx, print_value): Likewise.
	* rtlhash.c (add_rtx): Likewise.
	* explow.c (trunc_int_for_mode): Add a poly_int64 version.
	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
	Handle existing CONST_POLY_INT rtxes.
	* expmed.h (expand_shift): Take a poly_int64 instead of a
	HOST_WIDE_INT.
	* expmed.c (expand_shift): Likewise.
	* rtlanal.c (strip_offset): New function.
	(commutative_operand_precedence): Give CONST_POLY_INT the same
	precedence as CONST_DOUBLE and put CONST_WIDE_INT between that
	and CONST_INT.
	* rtl-tests.c (const_poly_int_tests): New struct.
	(rtl_tests_c_tests): Use it.
	* simplify-rtx.c (simplify_const_unary_operation): Handle
	CONST_POLY_INT.
	(simplify_const_binary_operation): Likewise.
	(simplify_binary_operation_1): Fold additions of symbolic constants
	and CONST_POLY_INTs.
	(simplify_subreg): Handle extensions and truncations of
	CONST_POLY_INTs.
	(simplify_const_poly_int_tests): New struct.
	(simplify_rtx_c_tests): Use it.
	* wide-int.h (storage_ref): Add default constructor.
	(wide_int_ref_storage): Likewise.
	(trailing_wide_ints): Use GTY((user)).
	(trailing_wide_ints::operator[]): Add a const version.
	(trailing_wide_ints::get_precision): New function.
	(trailing_wide_ints::extra_size): Likewise.

Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	2017-10-23 17:00:20.916834036 +0100
+++ gcc/doc/rtl.texi	2017-10-23 17:00:54.437007600 +0100
@@ -1621,6 +1621,15 @@ is accessed with the macro @code{CONST_F
 data is accessed with @code{CONST_FIXED_VALUE_HIGH}; the low part is
 accessed with @code{CONST_FIXED_VALUE_LOW}.
 
+@findex const_poly_int
+@item (const_poly_int:@var{m} [@var{c0} @var{c1} @dots{}])
+Represents a @code{poly_int}-style polynomial integer with coefficients
+@var{c0}, @var{c1}, @dots{}.  The coefficients are @code{wide_int}-based
+integers rather than rtxes.  @code{CONST_POLY_INT_COEFFS} gives the
+values of individual coefficients (which is mostly only useful in
+low-level routines) and @code{const_poly_int_value} gives the full
+@code{poly_int} value.
+
 @findex const_vector
 @item (const_vector:@var{m} [@var{x0} @var{x1} @dots{}])
 Represents a vector constant.  The square brackets stand for the vector
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/gengenrtl.c	2017-10-23 17:00:54.442003055 +0100
@@ -157,6 +157,7 @@ excluded_rtx (int idx)
   return (strcmp (defs[idx].enumname, "VAR_LOCATION") == 0
 	  || strcmp (defs[idx].enumname, "CONST_DOUBLE") == 0
 	  || strcmp (defs[idx].enumname, "CONST_WIDE_INT") == 0
+	  || strcmp (defs[idx].enumname, "CONST_POLY_INT") == 0
 	  || strcmp (defs[idx].enumname, "CONST_FIXED") == 0);
 }
 
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 16:52:20.579835373 +0100
+++ gcc/rtl.h	2017-10-23 17:00:54.444001238 +0100
@@ -280,6 +280,10 @@ #define CWI_GET_NUM_ELEM(RTX)					\
 #define CWI_PUT_NUM_ELEM(RTX, NUM)					\
   (RTL_FLAG_CHECK1("CWI_PUT_NUM_ELEM", (RTX), CONST_WIDE_INT)->u2.num_elem = (NUM))
 
+struct GTY((variable_size)) const_poly_int_def {
+  trailing_wide_ints<NUM_POLY_INT_COEFFS> coeffs;
+};
+
 /* RTL expression ("rtx").  */
 
 /* The GTY "desc" and "tag" options below are a kludge: we need a desc
@@ -424,6 +428,7 @@ struct GTY((desc("0"), tag("0"),
     struct real_value rv;
     struct fixed_value fv;
     struct hwivec_def hwiv;
+    struct const_poly_int_def cpi;
   } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
 };
 
@@ -734,6 +739,7 @@ #define CASE_CONST_SCALAR_INT \
 #define CASE_CONST_UNIQUE \
    case CONST_INT: \
    case CONST_WIDE_INT: \
+   case CONST_POLY_INT: \
    case CONST_DOUBLE: \
    case CONST_FIXED
 
@@ -741,6 +747,7 @@ #define CASE_CONST_UNIQUE \
 #define CASE_CONST_ANY \
    case CONST_INT: \
    case CONST_WIDE_INT: \
+   case CONST_POLY_INT: \
    case CONST_DOUBLE: \
    case CONST_FIXED: \
    case CONST_VECTOR
@@ -773,6 +780,11 @@ #define CONST_INT_P(X) (GET_CODE (X) ==
 /* Predicate yielding nonzero iff X is an rtx for a constant integer.  */
 #define CONST_WIDE_INT_P(X) (GET_CODE (X) == CONST_WIDE_INT)
 
+/* Predicate yielding nonzero iff X is an rtx for a polynomial constant
+   integer.  */
+#define CONST_POLY_INT_P(X) \
+  (NUM_POLY_INT_COEFFS > 1 && GET_CODE (X) == CONST_POLY_INT)
+
 /* Predicate yielding nonzero iff X is an rtx for a constant fixed-point.  */
 #define CONST_FIXED_P(X) (GET_CODE (X) == CONST_FIXED)
 
@@ -1871,6 +1883,12 @@ #define CONST_WIDE_INT_VEC(RTX) HWIVEC_C
 #define CONST_WIDE_INT_NUNITS(RTX) CWI_GET_NUM_ELEM (RTX)
 #define CONST_WIDE_INT_ELT(RTX, N) CWI_ELT (RTX, N)
 
+/* For a CONST_POLY_INT, CONST_POLY_INT_COEFFS gives access to the
+   individual coefficients, in the form of a trailing_wide_ints structure.  */
+#define CONST_POLY_INT_COEFFS(RTX) \
+  (RTL_FLAG_CHECK1("CONST_POLY_INT_COEFFS", (RTX), \
+		   CONST_POLY_INT)->u.cpi.coeffs)
+
 /* For a CONST_DOUBLE:
 #if TARGET_SUPPORTS_WIDE_INT == 0
    For a VOIDmode, there are two integers CONST_DOUBLE_LOW is the
@@ -2184,6 +2202,84 @@ wi::max_value (machine_mode mode, signop
   return max_value (GET_MODE_PRECISION (as_a <scalar_mode> (mode)), sgn);
 }
 
+namespace wi
+{
+  typedef poly_int<NUM_POLY_INT_COEFFS,
+		   generic_wide_int <wide_int_ref_storage <false, false> > >
+    rtx_to_poly_wide_ref;
+  rtx_to_poly_wide_ref to_poly_wide (const_rtx, machine_mode);
+}
+
+/* Return the value of a CONST_POLY_INT in its native precision.  */
+
+inline wi::rtx_to_poly_wide_ref
+const_poly_int_value (const_rtx x)
+{
+  poly_int<NUM_POLY_INT_COEFFS, WIDE_INT_REF_FOR (wide_int)> res;
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    res.coeffs[i] = CONST_POLY_INT_COEFFS (x)[i];
+  return res;
+}
+
+/* Return true if X is a scalar integer or a CONST_POLY_INT.  The value
+   can then be extracted using wi::to_poly_wide.  */
+
+inline bool
+poly_int_rtx_p (const_rtx x)
+{
+  return CONST_SCALAR_INT_P (x) || CONST_POLY_INT_P (x);
+}
+
+/* Access X (which satisfies poly_int_rtx_p) as a poly_wide_int.
+   MODE is the mode of X.  */
+
+inline wi::rtx_to_poly_wide_ref
+wi::to_poly_wide (const_rtx x, machine_mode mode)
+{
+  if (CONST_POLY_INT_P (x))
+    return const_poly_int_value (x);
+  return rtx_mode_t (const_cast<rtx> (x), mode);
+}
+
+/* Return the value of X as a poly_int64.  */
+
+inline poly_int64
+rtx_to_poly_int64 (const_rtx x)
+{
+  if (CONST_POLY_INT_P (x))
+    {
+      poly_int64 res;
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	res.coeffs[i] = CONST_POLY_INT_COEFFS (x)[i].to_shwi ();
+      return res;
+    }
+  return INTVAL (x);
+}
+
+/* Return true if arbitrary value X is an integer constant that can
+   be represented as a poly_int64.  Store the value in *RES if so,
+   otherwise leave it unmodified.  */
+
+inline bool
+poly_int_rtx_p (const_rtx x, poly_int64_pod *res)
+{
+  if (CONST_INT_P (x))
+    {
+      *res = INTVAL (x);
+      return true;
+    }
+  if (CONST_POLY_INT_P (x))
+    {
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	if (!wi::fits_shwi_p (CONST_POLY_INT_COEFFS (x)[i]))
+	  return false;
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	res->coeffs[i] = CONST_POLY_INT_COEFFS (x)[i].to_shwi ();
+      return true;
+    }
+  return false;
+}
+
 extern void init_rtlanal (void);
 extern int rtx_cost (rtx, machine_mode, enum rtx_code, int, bool);
 extern int address_cost (rtx, machine_mode, addr_space_t, bool);
@@ -2721,7 +2817,8 @@ #define EXTRACT_ARGS_IN_RANGE(SIZE, POS,
 
 /* In explow.c */
 extern HOST_WIDE_INT trunc_int_for_mode	(HOST_WIDE_INT, machine_mode);
-extern rtx plus_constant (machine_mode, rtx, HOST_WIDE_INT, bool = false);
+extern poly_int64 trunc_int_for_mode (poly_int64, machine_mode);
+extern rtx plus_constant (machine_mode, rtx, poly_int64, bool = false);
 extern HOST_WIDE_INT get_stack_check_protect (void);
 
 /* In rtl.c */
@@ -3032,13 +3129,11 @@ extern void end_sequence (void);
 extern double_int rtx_to_double_int (const_rtx);
 #endif
 extern void cwi_output_hex (FILE *, const_rtx);
-#ifndef GENERATOR_FILE
-extern rtx immed_wide_int_const (const wide_int_ref &, machine_mode);
-#endif
 #if TARGET_SUPPORTS_WIDE_INT == 0
 extern rtx immed_double_const (HOST_WIDE_INT, HOST_WIDE_INT,
 			       machine_mode);
 #endif
+extern rtx immed_wide_int_const (const poly_wide_int_ref &, machine_mode);
 
 /* In varasm.c  */
 extern rtx force_const_mem (machine_mode, rtx);
@@ -3226,6 +3321,7 @@ extern HOST_WIDE_INT get_integer_term (c
 extern rtx get_related_value (const_rtx);
 extern bool offset_within_block_p (const_rtx, HOST_WIDE_INT);
 extern void split_const (rtx, rtx *, rtx *);
+extern rtx strip_offset (rtx, poly_int64_pod *);
 extern bool unsigned_reg_p (rtx);
 extern int reg_mentioned_p (const_rtx, const_rtx);
 extern int count_occurrences (const_rtx, const_rtx, int);
@@ -4160,6 +4256,21 @@ load_extend_op (machine_mode mode)
   return UNKNOWN;
 }
 
+/* If X is a PLUS of a base and a constant offset, add the constant to *OFFSET
+   and return the base.  Return X otherwise.  */
+
+inline rtx
+strip_offset_and_add (rtx x, poly_int64_pod *offset)
+{
+  if (GET_CODE (x) == PLUS)
+    {
+      poly_int64 suboffset;
+      x = strip_offset (x, &suboffset);
+      *offset += suboffset;
+    }
+  return x;
+}
+
 /* gtype-desc.c.  */
 extern void gt_ggc_mx (rtx &);
 extern void gt_pch_nx (rtx &);
Index: gcc/rtl.def
===================================================================
--- gcc/rtl.def	2017-10-23 16:52:20.579835373 +0100
+++ gcc/rtl.def	2017-10-23 17:00:54.443002147 +0100
@@ -348,6 +348,9 @@ DEF_RTL_EXPR(CONST_INT, "const_int", "w"
 /* numeric integer constant */
 DEF_RTL_EXPR(CONST_WIDE_INT, "const_wide_int", "", RTX_CONST_OBJ)
 
+/* An rtx representation of a poly_wide_int.  */
+DEF_RTL_EXPR(CONST_POLY_INT, "const_poly_int", "", RTX_CONST_OBJ)
+
 /* fixed-point constant */
 DEF_RTL_EXPR(CONST_FIXED, "const_fixed", "www", RTX_CONST_OBJ)
 
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/rtl.c	2017-10-23 17:00:54.443002147 +0100
@@ -189,6 +189,10 @@ rtx_size (const_rtx x)
 	    + sizeof (struct hwivec_def)
 	    + ((CONST_WIDE_INT_NUNITS (x) - 1)
 	       * sizeof (HOST_WIDE_INT)));
+  if (CONST_POLY_INT_P (x))
+    return (RTX_HDR_SIZE
+	    + sizeof (struct const_poly_int_def)
+	    + CONST_POLY_INT_COEFFS (x).extra_size ());
   if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_HAS_BLOCK_INFO_P (x))
     return RTX_HDR_SIZE + sizeof (struct block_symbol);
   return RTX_CODE_SIZE (GET_CODE (x));
@@ -257,9 +261,10 @@ shared_const_p (const_rtx orig)
 
   /* CONST can be shared if it contains a SYMBOL_REF.  If it contains
      a LABEL_REF, it isn't sharable.  */
+  poly_int64 offset;
   return (GET_CODE (XEXP (orig, 0)) == PLUS
 	  && GET_CODE (XEXP (XEXP (orig, 0), 0)) == SYMBOL_REF
-	  && CONST_INT_P (XEXP (XEXP (orig, 0), 1)));
+	  && poly_int_rtx_p (XEXP (XEXP (orig, 0), 1), &offset));
 }
 
 
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-10-23 16:52:20.579835373 +0100
+++ gcc/emit-rtl.h	2017-10-23 17:00:54.440004873 +0100
@@ -362,14 +362,14 @@ extern rtvec gen_rtvec (int, ...);
 extern rtx copy_insn_1 (rtx);
 extern rtx copy_insn (rtx);
 extern rtx_insn *copy_delay_slot_insn (rtx_insn *);
-extern rtx gen_int_mode (HOST_WIDE_INT, machine_mode);
+extern rtx gen_int_mode (poly_int64, machine_mode);
 extern rtx_insn *emit_copy_of_insn_after (rtx_insn *, rtx_insn *);
 extern void set_reg_attrs_from_value (rtx, rtx);
 extern void set_reg_attrs_for_parm (rtx, rtx);
 extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
 extern void adjust_reg_mode (rtx, machine_mode);
 extern int mem_expr_equal_p (const_tree, const_tree);
-extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
+extern rtx gen_int_shift_amount (machine_mode, poly_int64);
 
 extern bool need_atomic_barrier_p (enum memmodel, bool);
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:00:54.440004873 +0100
@@ -148,6 +148,16 @@ struct const_wide_int_hasher : ggc_cache
 
 static GTY ((cache)) hash_table<const_wide_int_hasher> *const_wide_int_htab;
 
+struct const_poly_int_hasher : ggc_cache_ptr_hash<rtx_def>
+{
+  typedef std::pair<machine_mode, poly_wide_int_ref> compare_type;
+
+  static hashval_t hash (rtx x);
+  static bool equal (rtx x, const compare_type &y);
+};
+
+static GTY ((cache)) hash_table<const_poly_int_hasher> *const_poly_int_htab;
+
 /* A hash table storing register attribute structures.  */
 struct reg_attr_hasher : ggc_cache_ptr_hash<reg_attrs>
 {
@@ -257,6 +267,31 @@ const_wide_int_hasher::equal (rtx x, rtx
 }
 #endif
 
+/* Returns a hash code for CONST_POLY_INT X.  */
+
+hashval_t
+const_poly_int_hasher::hash (rtx x)
+{
+  inchash::hash h;
+  h.add_int (GET_MODE (x));
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    h.add_wide_int (CONST_POLY_INT_COEFFS (x)[i]);
+  return h.end ();
+}
+
+/* Returns nonzero if CONST_POLY_INT X is an rtx representation of Y.  */
+
+bool
+const_poly_int_hasher::equal (rtx x, const compare_type &y)
+{
+  if (GET_MODE (x) != y.first)
+    return false;
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    if (CONST_POLY_INT_COEFFS (x)[i] != y.second.coeffs[i])
+      return false;
+  return true;
+}
+
 /* Returns a hash code for X (which is really a CONST_DOUBLE).  */
 hashval_t
 const_double_hasher::hash (rtx x)
@@ -520,9 +555,13 @@ gen_rtx_CONST_INT (machine_mode mode ATT
 }
 
 rtx
-gen_int_mode (HOST_WIDE_INT c, machine_mode mode)
+gen_int_mode (poly_int64 c, machine_mode mode)
 {
-  return GEN_INT (trunc_int_for_mode (c, mode));
+  c = trunc_int_for_mode (c, mode);
+  if (c.is_constant ())
+    return GEN_INT (c.coeffs[0]);
+  unsigned int prec = GET_MODE_PRECISION (as_a <scalar_mode> (mode));
+  return immed_wide_int_const (poly_wide_int::from (c, prec, SIGNED), mode);
 }
 
 /* CONST_DOUBLEs might be created from pairs of integers, or from
@@ -626,8 +665,8 @@ lookup_const_wide_int (rtx wint)
    a CONST_DOUBLE (if !TARGET_SUPPORTS_WIDE_INT) or a CONST_WIDE_INT
    (if TARGET_SUPPORTS_WIDE_INT).  */
 
-rtx
-immed_wide_int_const (const wide_int_ref &v, machine_mode mode)
+static rtx
+immed_wide_int_const_1 (const wide_int_ref &v, machine_mode mode)
 {
   unsigned int len = v.get_len ();
   /* Not scalar_int_mode because we also allow pointer bound modes.  */
@@ -714,6 +753,53 @@ immed_double_const (HOST_WIDE_INT i0, HO
 }
 #endif
 
+/* Return an rtx representation of C in mode MODE.  */
+
+rtx
+immed_wide_int_const (const poly_wide_int_ref &c, machine_mode mode)
+{
+  if (c.is_constant ())
+    return immed_wide_int_const_1 (c.coeffs[0], mode);
+
+  /* Not scalar_int_mode because we also allow pointer bound modes.  */
+  unsigned int prec = GET_MODE_PRECISION (as_a <scalar_mode> (mode));
+
+  /* Allow truncation but not extension since we do not know if the
+     number is signed or unsigned.  */
+  gcc_assert (prec <= c.coeffs[0].get_precision ());
+  poly_wide_int newc = poly_wide_int::from (c, prec, SIGNED);
+
+  /* See whether we already have an rtx for this constant.  */
+  inchash::hash h;
+  h.add_int (mode);
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    h.add_wide_int (newc.coeffs[i]);
+  const_poly_int_hasher::compare_type typed_value (mode, newc);
+  rtx *slot = const_poly_int_htab->find_slot_with_hash (typed_value,
+							h.end (), INSERT);
+  rtx x = *slot;
+  if (x)
+    return x;
+
+  /* Create a new rtx.  There's a choice to be made here between installing
+     the actual mode of the rtx or leaving it as VOIDmode (for consistency
+     with CONST_INT).  In practice the handling of the codes is different
+     enough that we get no benefit from using VOIDmode, and various places
+     assume that VOIDmode implies CONST_INT.  Using the real mode seems like
+     the right long-term direction anyway.  */
+  typedef trailing_wide_ints<NUM_POLY_INT_COEFFS> twi;
+  size_t extra_size = twi::extra_size (prec);
+  x = rtx_alloc_v (CONST_POLY_INT,
+		   sizeof (struct const_poly_int_def) + extra_size);
+  PUT_MODE (x, mode);
+  CONST_POLY_INT_COEFFS (x).set_precision (prec);
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    CONST_POLY_INT_COEFFS (x)[i] = newc.coeffs[i];
+
+  *slot = x;
+  return x;
+}
+
 rtx
 gen_rtx_REG (machine_mode mode, unsigned int regno)
 {
@@ -1502,7 +1588,8 @@ gen_lowpart_common (machine_mode mode, r
     }
   else if (GET_CODE (x) == SUBREG || REG_P (x)
 	   || GET_CODE (x) == CONCAT || const_vec_p (x)
-	   || CONST_DOUBLE_AS_FLOAT_P (x) || CONST_SCALAR_INT_P (x))
+	   || CONST_DOUBLE_AS_FLOAT_P (x) || CONST_SCALAR_INT_P (x)
+	   || CONST_POLY_INT_P (x))
     return lowpart_subreg (mode, x, innermode);
 
   /* Otherwise, we can't do this.  */
@@ -6089,6 +6176,9 @@ init_emit_once (void)
 #endif
   const_double_htab = hash_table<const_double_hasher>::create_ggc (37);
 
+  if (NUM_POLY_INT_COEFFS > 1)
+    const_poly_int_htab = hash_table<const_poly_int_hasher>::create_ggc (37);
+
   const_fixed_htab = hash_table<const_fixed_hasher>::create_ggc (37);
 
   reg_attrs_htab = hash_table<reg_attr_hasher>::create_ggc (37);
@@ -6482,7 +6572,7 @@ need_atomic_barrier_p (enum memmodel mod
    by VALUE bits.  */
 
 rtx
-gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
+gen_int_shift_amount (machine_mode mode, poly_int64 value)
 {
   return gen_int_mode (value, get_shift_amount_mode (mode));
 }
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/cse.c	2017-10-23 17:00:54.436008509 +0100
@@ -2323,6 +2323,15 @@ hash_rtx_cb (const_rtx x, machine_mode m
 	hash += CONST_WIDE_INT_ELT (x, i);
       return hash;
 
+    case CONST_POLY_INT:
+      {
+	inchash::hash h;
+	h.add_int (hash);
+	for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	  h.add_wide_int (CONST_POLY_INT_COEFFS (x)[i]);
+	return h.end ();
+      }
+
     case CONST_DOUBLE:
       /* This is like the general case, except that it only counts
 	 the integers representing the constant.  */
@@ -3781,6 +3790,8 @@ equiv_constant (rtx x)
       /* See if we previously assigned a constant value to this SUBREG.  */
       if ((new_rtx = lookup_as_function (x, CONST_INT)) != 0
 	  || (new_rtx = lookup_as_function (x, CONST_WIDE_INT)) != 0
+	  || (NUM_POLY_INT_COEFFS > 1
+	      && (new_rtx = lookup_as_function (x, CONST_POLY_INT)) != 0)
           || (new_rtx = lookup_as_function (x, CONST_DOUBLE)) != 0
           || (new_rtx = lookup_as_function (x, CONST_FIXED)) != 0)
         return new_rtx;
Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/cselib.c	2017-10-23 17:00:54.436008509 +0100
@@ -1128,6 +1128,15 @@ cselib_hash_rtx (rtx x, int create, mach
 	hash += CONST_WIDE_INT_ELT (x, i);
       return hash;
 
+    case CONST_POLY_INT:
+      {
+	inchash::hash h;
+	h.add_int (hash);
+	for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	  h.add_wide_int (CONST_POLY_INT_COEFFS (x)[i]);
+	return h.end ();
+      }
+
     case CONST_DOUBLE:
       /* This is like the general case, except that it only counts
 	 the integers representing the constant.  */
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:00:54.439005782 +0100
@@ -13753,6 +13753,9 @@ const_ok_for_output_1 (rtx rtl)
       return false;
     }
 
+  if (CONST_POLY_INT_P (rtl))
+    return false;
+
   if (targetm.const_not_ok_for_debug_p (rtl))
     {
       expansion_failed (NULL_TREE, rtl,
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/expr.c	2017-10-23 17:00:54.442003055 +0100
@@ -692,6 +692,7 @@ convert_modes (machine_mode mode, machin
       && is_int_mode (oldmode, &int_oldmode)
       && GET_MODE_PRECISION (int_mode) <= GET_MODE_PRECISION (int_oldmode)
       && ((MEM_P (x) && !MEM_VOLATILE_P (x) && direct_load[(int) int_mode])
+	  || CONST_POLY_INT_P (x)
           || (REG_P (x)
               && (!HARD_REGISTER_P (x)
 		  || targetm.hard_regno_mode_ok (REGNO (x), int_mode))
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/print-rtl.c	2017-10-23 17:00:54.443002147 +0100
@@ -898,6 +898,17 @@ rtx_writer::print_rtx (const_rtx in_rtx)
       fprintf (m_outfile, " ");
       cwi_output_hex (m_outfile, in_rtx);
       break;
+
+    case CONST_POLY_INT:
+      fprintf (m_outfile, " [");
+      print_dec (CONST_POLY_INT_COEFFS (in_rtx)[0], m_outfile, SIGNED);
+      for (unsigned int i = 1; i < NUM_POLY_INT_COEFFS; ++i)
+	{
+	  fprintf (m_outfile, ", ");
+	  print_dec (CONST_POLY_INT_COEFFS (in_rtx)[i], m_outfile, SIGNED);
+	}
+      fprintf (m_outfile, "]");
+      break;
 #endif
 
     case CODE_LABEL:
@@ -1568,6 +1579,17 @@ print_value (pretty_printer *pp, const_r
       }
       break;
 
+    case CONST_POLY_INT:
+      pp_left_bracket (pp);
+      pp_wide_int (pp, CONST_POLY_INT_COEFFS (x)[0], SIGNED);
+      for (unsigned int i = 1; i < NUM_POLY_INT_COEFFS; ++i)
+	{
+	  pp_string (pp, ", ");
+	  pp_wide_int (pp, CONST_POLY_INT_COEFFS (x)[i], SIGNED);
+	}
+      pp_right_bracket (pp);
+      break;
+
     case CONST_DOUBLE:
       if (FLOAT_MODE_P (GET_MODE (x)))
 	{
Index: gcc/rtlhash.c
===================================================================
--- gcc/rtlhash.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/rtlhash.c	2017-10-23 17:00:54.444001238 +0100
@@ -55,6 +55,10 @@ add_rtx (const_rtx x, hash &hstate)
       for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++)
 	hstate.add_object (CONST_WIDE_INT_ELT (x, i));
       return;
+    case CONST_POLY_INT:
+      for (i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	hstate.add_wide_int (CONST_POLY_INT_COEFFS (x)[i]);
+      break;
     case SYMBOL_REF:
       if (XSTR (x, 0))
 	hstate.add (XSTR (x, 0), strlen (XSTR (x, 0)) + 1);
Index: gcc/explow.c
===================================================================
--- gcc/explow.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/explow.c	2017-10-23 17:00:54.440004873 +0100
@@ -77,13 +77,23 @@ trunc_int_for_mode (HOST_WIDE_INT c, mac
   return c;
 }
 
+/* Likewise for polynomial values, using the sign-extended representation
+   for each individual coefficient.  */
+
+poly_int64
+trunc_int_for_mode (poly_int64 x, machine_mode mode)
+{
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    x.coeffs[i] = trunc_int_for_mode (x.coeffs[i], mode);
+  return x;
+}
+
 /* Return an rtx for the sum of X and the integer C, given that X has
    mode MODE.  INPLACE is true if X can be modified inplace or false
    if it must be treated as immutable.  */
 
 rtx
-plus_constant (machine_mode mode, rtx x, HOST_WIDE_INT c,
-	       bool inplace)
+plus_constant (machine_mode mode, rtx x, poly_int64 c, bool inplace)
 {
   RTX_CODE code;
   rtx y;
@@ -92,7 +102,7 @@ plus_constant (machine_mode mode, rtx x,
 
   gcc_assert (GET_MODE (x) == VOIDmode || GET_MODE (x) == mode);
 
-  if (c == 0)
+  if (known_zero (c))
     return x;
 
  restart:
@@ -180,10 +190,12 @@ plus_constant (machine_mode mode, rtx x,
       break;
 
     default:
+      if (CONST_POLY_INT_P (x))
+	return immed_wide_int_const (const_poly_int_value (x) + c, mode);
       break;
     }
 
-  if (c != 0)
+  if (maybe_nonzero (c))
     x = gen_rtx_PLUS (mode, x, gen_int_mode (c, mode));
 
   if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF)
Index: gcc/expmed.h
===================================================================
--- gcc/expmed.h	2017-10-23 16:52:20.579835373 +0100
+++ gcc/expmed.h	2017-10-23 17:00:54.441003964 +0100
@@ -712,8 +712,8 @@ extern unsigned HOST_WIDE_INT choose_mul
 #ifdef TREE_CODE
 extern rtx expand_variable_shift (enum tree_code, machine_mode,
 				  rtx, tree, rtx, int);
-extern rtx expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
-			     int);
+extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
+			 int);
 extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 			  rtx, int);
 #endif
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/expmed.c	2017-10-23 17:00:54.441003964 +0100
@@ -2541,7 +2541,7 @@ expand_shift_1 (enum tree_code code, mac
 
 rtx
 expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
-	      int amount, rtx target, int unsignedp)
+	      poly_int64 amount, rtx target, int unsignedp)
 {
   return expand_shift_1 (code, mode, shifted,
 			 gen_int_shift_amount (mode, amount),
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/rtlanal.c	2017-10-23 17:00:54.444001238 +0100
@@ -915,6 +915,28 @@ split_const (rtx x, rtx *base_out, rtx *
   *base_out = x;
   *offset_out = const0_rtx;
 }
+
+/* Express integer value X as some value Y plus a polynomial offset,
+   where Y is either const0_rtx, X or something within X (as opposed
+   to a new rtx).  Return the Y and store the offset in *OFFSET_OUT.  */
+
+rtx
+strip_offset (rtx x, poly_int64_pod *offset_out)
+{
+  rtx base = const0_rtx;
+  rtx test = x;
+  if (GET_CODE (test) == CONST)
+    test = XEXP (test, 0);
+  if (GET_CODE (test) == PLUS)
+    {
+      base = XEXP (test, 0);
+      test = XEXP (test, 1);
+    }
+  if (poly_int_rtx_p (test, offset_out))
+    return base;
+  *offset_out = 0;
+  return x;
+}
 \f
 /* Return the number of places FIND appears within X.  If COUNT_DEST is
    zero, we do not count occurrences inside the destination of a SET.  */
@@ -3406,13 +3428,15 @@ commutative_operand_precedence (rtx op)
 
   /* Constants always become the second operand.  Prefer "nice" constants.  */
   if (code == CONST_INT)
-    return -8;
+    return -10;
   if (code == CONST_WIDE_INT)
-    return -7;
+    return -9;
+  if (code == CONST_POLY_INT)
+    return -8;
   if (code == CONST_DOUBLE)
-    return -7;
+    return -8;
   if (code == CONST_FIXED)
-    return -7;
+    return -8;
   op = avoid_constant_pool_reference (op);
   code = GET_CODE (op);
 
@@ -3420,13 +3444,15 @@ commutative_operand_precedence (rtx op)
     {
     case RTX_CONST_OBJ:
       if (code == CONST_INT)
-        return -6;
+	return -7;
       if (code == CONST_WIDE_INT)
-        return -6;
+	return -6;
+      if (code == CONST_POLY_INT)
+	return -5;
       if (code == CONST_DOUBLE)
-        return -5;
+	return -5;
       if (code == CONST_FIXED)
-        return -5;
+	return -5;
       return -4;
 
     case RTX_EXTRA:
Index: gcc/rtl-tests.c
===================================================================
--- gcc/rtl-tests.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/rtl-tests.c	2017-10-23 17:00:54.443002147 +0100
@@ -228,6 +228,62 @@ test_uncond_jump ()
 		      jump_insn);
 }
 
+template<unsigned int N>
+struct const_poly_int_tests
+{
+  static void run ();
+};
+
+template<>
+struct const_poly_int_tests<1>
+{
+  static void run () {}
+};
+
+/* Test various CONST_POLY_INT properties.  */
+
+template<unsigned int N>
+void
+const_poly_int_tests<N>::run ()
+{
+  rtx x1 = gen_int_mode (poly_int64 (1, 1), QImode);
+  rtx x255 = gen_int_mode (poly_int64 (1, 255), QImode);
+
+  /* Test that constants are unique.  */
+  ASSERT_EQ (x1, gen_int_mode (poly_int64 (1, 1), QImode));
+  ASSERT_NE (x1, gen_int_mode (poly_int64 (1, 1), HImode));
+  ASSERT_NE (x1, x255);
+
+  /* Test const_poly_int_value.  */
+  ASSERT_MUST_EQ (const_poly_int_value (x1), poly_int64 (1, 1));
+  ASSERT_MUST_EQ (const_poly_int_value (x255), poly_int64 (1, -1));
+
+  /* Test rtx_to_poly_int64.  */
+  ASSERT_MUST_EQ (rtx_to_poly_int64 (x1), poly_int64 (1, 1));
+  ASSERT_MUST_EQ (rtx_to_poly_int64 (x255), poly_int64 (1, -1));
+  ASSERT_MAY_NE (rtx_to_poly_int64 (x255), poly_int64 (1, 255));
+
+  /* Test plus_constant of a symbol.  */
+  rtx symbol = gen_rtx_SYMBOL_REF (Pmode, "foo");
+  rtx offset1 = gen_int_mode (poly_int64 (9, 11), Pmode);
+  rtx sum1 = gen_rtx_CONST (Pmode, gen_rtx_PLUS (Pmode, symbol, offset1));
+  ASSERT_RTX_EQ (plus_constant (Pmode, symbol, poly_int64 (9, 11)), sum1);
+
+  /* Test plus_constant of a CONST.  */
+  rtx offset2 = gen_int_mode (poly_int64 (12, 20), Pmode);
+  rtx sum2 = gen_rtx_CONST (Pmode, gen_rtx_PLUS (Pmode, symbol, offset2));
+  ASSERT_RTX_EQ (plus_constant (Pmode, sum1, poly_int64 (3, 9)), sum2);
+
+  /* Test a cancelling plus_constant.  */
+  ASSERT_EQ (plus_constant (Pmode, sum2, poly_int64 (-12, -20)), symbol);
+
+  /* Test plus_constant on integer constants.  */
+  ASSERT_EQ (plus_constant (QImode, const1_rtx, poly_int64 (4, -2)),
+	     gen_int_mode (poly_int64 (5, -2), QImode));
+  ASSERT_EQ (plus_constant (QImode, x1, poly_int64 (4, -2)),
+	     gen_int_mode (poly_int64 (5, -1), QImode));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -238,6 +294,7 @@ rtl_tests_c_tests ()
   test_dumping_rtx_reuse ();
   test_single_set ();
   test_uncond_jump ();
+  const_poly_int_tests<NUM_POLY_INT_COEFFS>::run ();
 
   /* Purge state.  */
   set_first_insn (NULL);
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 16:52:20.579835373 +0100
+++ gcc/simplify-rtx.c	2017-10-23 17:00:54.445000329 +0100
@@ -2039,6 +2039,26 @@ simplify_const_unary_operation (enum rtx
 	}
     }
 
+  /* Handle polynomial integers.  */
+  else if (CONST_POLY_INT_P (op))
+    {
+      poly_wide_int result;
+      switch (code)
+	{
+	case NEG:
+	  result = -const_poly_int_value (op);
+	  break;
+
+	case NOT:
+	  result = ~const_poly_int_value (op);
+	  break;
+
+	default:
+	  return NULL_RTX;
+	}
+      return immed_wide_int_const (result, mode);
+    }
+
   return NULL_RTX;
 }
 \f
@@ -2219,6 +2239,7 @@ simplify_binary_operation_1 (enum rtx_co
   rtx tem, reversed, opleft, opright, elt0, elt1;
   HOST_WIDE_INT val;
   scalar_int_mode int_mode, inner_mode;
+  poly_int64 offset;
 
   /* Even if we can't compute a constant result,
      there are some cases worth simplifying.  */
@@ -2531,6 +2552,12 @@ simplify_binary_operation_1 (enum rtx_co
 	    return simplify_gen_binary (MINUS, mode, tem, XEXP (op0, 0));
 	}
 
+      if ((GET_CODE (op0) == CONST
+	   || GET_CODE (op0) == SYMBOL_REF
+	   || GET_CODE (op0) == LABEL_REF)
+	  && poly_int_rtx_p (op1, &offset))
+	return plus_constant (mode, op0, trunc_int_for_mode (-offset, mode));
+
       /* Don't let a relocatable value get a negative coeff.  */
       if (CONST_INT_P (op1) && GET_MODE (op0) != VOIDmode)
 	return simplify_gen_binary (PLUS, mode,
@@ -4325,6 +4352,57 @@ simplify_const_binary_operation (enum rt
       return immed_wide_int_const (result, int_mode);
     }
 
+  /* Handle polynomial integers.  */
+  if (NUM_POLY_INT_COEFFS > 1
+      && is_a <scalar_int_mode> (mode, &int_mode)
+      && poly_int_rtx_p (op0)
+      && poly_int_rtx_p (op1))
+    {
+      poly_wide_int result;
+      switch (code)
+	{
+	case PLUS:
+	  result = wi::to_poly_wide (op0, mode) + wi::to_poly_wide (op1, mode);
+	  break;
+
+	case MINUS:
+	  result = wi::to_poly_wide (op0, mode) - wi::to_poly_wide (op1, mode);
+	  break;
+
+	case MULT:
+	  if (CONST_SCALAR_INT_P (op1))
+	    result = wi::to_poly_wide (op0, mode) * rtx_mode_t (op1, mode);
+	  else
+	    return NULL_RTX;
+	  break;
+
+	case ASHIFT:
+	  if (CONST_SCALAR_INT_P (op1))
+	    {
+	      wide_int shift = rtx_mode_t (op1, mode);
+	      if (SHIFT_COUNT_TRUNCATED)
+		shift = wi::umod_trunc (shift, GET_MODE_PRECISION (int_mode));
+	      else if (wi::geu_p (shift, GET_MODE_PRECISION (int_mode)))
+		return NULL_RTX;
+	      result = wi::to_poly_wide (op0, mode) << shift;
+	    }
+	  else
+	    return NULL_RTX;
+	  break;
+
+	case IOR:
+	  if (!CONST_SCALAR_INT_P (op1)
+	      || !can_ior_p (wi::to_poly_wide (op0, mode),
+			     rtx_mode_t (op1, mode), &result))
+	    return NULL_RTX;
+	  break;
+
+	default:
+	  return NULL_RTX;
+	}
+      return immed_wide_int_const (result, int_mode);
+    }
+
   return NULL_RTX;
 }
 
@@ -6317,13 +6395,27 @@ simplify_subreg (machine_mode outermode,
   scalar_int_mode int_outermode, int_innermode;
   if (is_a <scalar_int_mode> (outermode, &int_outermode)
       && is_a <scalar_int_mode> (innermode, &int_innermode)
-      && (GET_MODE_PRECISION (int_outermode)
-	  < GET_MODE_PRECISION (int_innermode))
       && byte == subreg_lowpart_offset (int_outermode, int_innermode))
     {
-      rtx tem = simplify_truncation (int_outermode, op, int_innermode);
-      if (tem)
-	return tem;
+      /* Handle polynomial integers.  The upper bits of a paradoxical
+	 subreg are undefined, so this is safe regardless of whether
+	 we're truncating or extending.  */
+      if (CONST_POLY_INT_P (op))
+	{
+	  poly_wide_int val
+	    = poly_wide_int::from (const_poly_int_value (op),
+				   GET_MODE_PRECISION (int_outermode),
+				   SIGNED);
+	  return immed_wide_int_const (val, int_outermode);
+	}
+
+      if (GET_MODE_PRECISION (int_outermode)
+	  < GET_MODE_PRECISION (int_innermode))
+	{
+	  rtx tem = simplify_truncation (int_outermode, op, int_innermode);
+	  if (tem)
+	    return tem;
+	}
     }
 
   return NULL_RTX;
@@ -6629,12 +6721,60 @@ test_vector_ops ()
     }
 }
 
+template<unsigned int N>
+struct simplify_const_poly_int_tests
+{
+  static void run ();
+};
+
+template<>
+struct simplify_const_poly_int_tests<1>
+{
+  static void run () {}
+};
+
+/* Test various CONST_POLY_INT properties.  */
+
+template<unsigned int N>
+void
+simplify_const_poly_int_tests<N>::run ()
+{
+  rtx x1 = gen_int_mode (poly_int64 (1, 1), QImode);
+  rtx x2 = gen_int_mode (poly_int64 (-80, 127), QImode);
+  rtx x3 = gen_int_mode (poly_int64 (-79, -128), QImode);
+  rtx x4 = gen_int_mode (poly_int64 (5, 4), QImode);
+  rtx x5 = gen_int_mode (poly_int64 (30, 24), QImode);
+  rtx x6 = gen_int_mode (poly_int64 (20, 16), QImode);
+  rtx x7 = gen_int_mode (poly_int64 (7, 4), QImode);
+  rtx x8 = gen_int_mode (poly_int64 (30, 24), HImode);
+  rtx x9 = gen_int_mode (poly_int64 (-30, -24), HImode);
+  rtx x10 = gen_int_mode (poly_int64 (-31, -24), HImode);
+  rtx two = GEN_INT (2);
+  rtx six = GEN_INT (6);
+  HOST_WIDE_INT offset = subreg_lowpart_offset (QImode, HImode);
+
+  /* These tests only try limited operation combinations.  Fuller arithmetic
+     testing is done directly on poly_ints.  */
+  ASSERT_EQ (simplify_unary_operation (NEG, HImode, x8, HImode), x9);
+  ASSERT_EQ (simplify_unary_operation (NOT, HImode, x8, HImode), x10);
+  ASSERT_EQ (simplify_unary_operation (TRUNCATE, QImode, x8, HImode), x5);
+  ASSERT_EQ (simplify_binary_operation (PLUS, QImode, x1, x2), x3);
+  ASSERT_EQ (simplify_binary_operation (MINUS, QImode, x3, x1), x2);
+  ASSERT_EQ (simplify_binary_operation (MULT, QImode, x4, six), x5);
+  ASSERT_EQ (simplify_binary_operation (MULT, QImode, six, x4), x5);
+  ASSERT_EQ (simplify_binary_operation (ASHIFT, QImode, x4, two), x6);
+  ASSERT_EQ (simplify_binary_operation (IOR, QImode, x4, two), x7);
+  ASSERT_EQ (simplify_subreg (HImode, x5, QImode, 0), x8);
+  ASSERT_EQ (simplify_subreg (QImode, x8, HImode, offset), x5);
+}
+
 /* Run all of the selftests within this file.  */
 
 void
 simplify_rtx_c_tests ()
 {
   test_vector_ops ();
+  simplify_const_poly_int_tests<NUM_POLY_INT_COEFFS>::run ();
 }
 
 } // namespace selftest
Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	2017-10-23 17:00:20.923835582 +0100
+++ gcc/wide-int.h	2017-10-23 17:00:54.445999420 +0100
@@ -613,6 +613,7 @@ #define SHIFT_FUNCTION \
      access.  */
   struct storage_ref
   {
+    storage_ref () {}
     storage_ref (const HOST_WIDE_INT *, unsigned int, unsigned int);
 
     const HOST_WIDE_INT *val;
@@ -944,6 +945,8 @@ struct wide_int_ref_storage : public wi:
   HOST_WIDE_INT scratch[2];
 
 public:
+  wide_int_ref_storage () {}
+
   wide_int_ref_storage (const wi::storage_ref &);
 
   template <typename T>
@@ -1323,7 +1326,7 @@ typedef generic_wide_int <trailing_wide_
    bytes beyond the sizeof need to be allocated.  Use set_precision
    to initialize the structure.  */
 template <int N>
-class GTY(()) trailing_wide_ints
+class GTY((user)) trailing_wide_ints
 {
 private:
   /* The shared precision of each number.  */
@@ -1340,9 +1343,14 @@ class GTY(()) trailing_wide_ints
   HOST_WIDE_INT m_val[1];
 
 public:
+  typedef WIDE_INT_REF_FOR (trailing_wide_int_storage) const_reference;
+
   void set_precision (unsigned int);
+  unsigned int get_precision () const { return m_precision; }
   trailing_wide_int operator [] (unsigned int);
+  const_reference operator [] (unsigned int) const;
   static size_t extra_size (unsigned int);
+  size_t extra_size () const { return extra_size (m_precision); }
 };
 
 inline trailing_wide_int_storage::
@@ -1414,6 +1422,14 @@ trailing_wide_ints <N>::operator [] (uns
 				    &m_val[index * m_max_len]);
 }
 
+template <int N>
+inline typename trailing_wide_ints <N>::const_reference
+trailing_wide_ints <N>::operator [] (unsigned int index) const
+{
+  return wi::storage_ref (&m_val[index * m_max_len],
+			  m_len[index], m_precision);
+}
+
 /* Return how many extra bytes need to be added to the end of the structure
    in order to handle N wide_ints of precision PRECISION.  */
 template <int N>

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [006/nnn] poly_int: tree constants
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (4 preceding siblings ...)
  2017-10-23 17:01 ` [005/nnn] poly_int: rtx constants Richard Sandiford
@ 2017-10-23 17:02 ` Richard Sandiford
  2017-10-25 17:14   ` Martin Sebor
  2017-11-17  4:51   ` Jeff Law
  2017-10-23 17:02 ` [007/nnn] poly_int: dump routines Richard Sandiford
                   ` (101 subsequent siblings)
  107 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:02 UTC (permalink / raw)
  To: gcc-patches

This patch adds a tree representation for poly_ints.  Unlike the
rtx version, the coefficients are INTEGER_CSTs rather than plain
integers, so that we can easily access them as poly_widest_ints
and poly_offset_ints.

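To give a flavour of how callers use this (an illustrative snippet
rather than code from the patch), INTEGER_CSTs and POLY_INT_CSTs can
be read through the same accessors:

	/* T is an INTEGER_CST or a POLY_INT_CST.  The INTEGER_CST case
	   simply gives a constant poly_widest_int.  */
	poly_widest_int value = wi::to_poly_widest (t);
	if (POLY_INT_CST_P (t))
	  {
	    /* Each coefficient is itself an INTEGER_CST; coefficient 0
	       is the constant term.  */
	    tree c0 = POLY_INT_CST_COEFF (t, 0);
	    gcc_assert (TREE_CODE (c0) == INTEGER_CST);
	  }
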
The patch also adjusts some places that previously
relied on "constant" meaning "INTEGER_CST".  It also makes
sure that the TYPE_SIZE agrees with the TYPE_SIZE_UNIT for
vector booleans, given the existing:

	/* Several boolean vector elements may fit in a single unit.  */
	if (VECTOR_BOOLEAN_TYPE_P (type)
	    && type->type_common.mode != BLKmode)
	  TYPE_SIZE_UNIT (type)
	    = size_int (GET_MODE_SIZE (type->type_common.mode));
	else
	  TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
						   TYPE_SIZE_UNIT (innertype),
						   size_int (nunits));

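For vector booleans the patch therefore derives the TYPE_SIZE from
whatever TYPE_SIZE_UNIT was chosen above, rather than computing the
two independently.  In outline (a sketch of the intent only; the
actual layout_type change goes through the new bits_from_bytes
helper):

	/* Scale the byte size to bits so that TYPE_SIZE and
	   TYPE_SIZE_UNIT cannot get out of sync.  */
	TYPE_SIZE (type)
	  = size_binop (MULT_EXPR,
			fold_convert (bitsizetype, TYPE_SIZE_UNIT (type)),
			bitsize_int (BITS_PER_UNIT));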

2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/generic.texi (POLY_INT_CST): Document.
	* tree.def (POLY_INT_CST): New tree code.
	* treestruct.def (TS_POLY_INT_CST): New tree layout.
	* tree-core.h (tree_poly_int_cst): New struct.
	(tree_node): Add a poly_int_cst field.
	* tree.h (POLY_INT_CST_P, POLY_INT_CST_COEFF): New macros.
	(wide_int_to_tree, force_fit_type): Take a poly_wide_int_ref
	instead of a wide_int_ref.
	(build_int_cst, build_int_cst_type): Take a poly_int64 instead
	of a HOST_WIDE_INT.
	(build_int_cstu, build_array_type_nelts): Take a poly_uint64
	instead of an unsigned HOST_WIDE_INT.
	(build_poly_int_cst, tree_fits_poly_int64_p, tree_fits_poly_uint64_p)
	(ptrdiff_tree_p): Declare.
	(tree_to_poly_int64, tree_to_poly_uint64): Likewise.  Provide
	extern inline implementations if the target doesn't use POLY_INT_CST.
	(poly_int_tree_p): New function.
	(wi::unextended_tree): New class.
	(wi::int_traits <unextended_tree>): New override.
	(wi::extended_tree): Add a default constructor.
	(wi::extended_tree::get_tree): New function.
	(wi::widest_extended_tree, wi::offset_extended_tree): New typedefs.
	(wi::tree_to_widest_ref, wi::tree_to_offset_ref): Use them.
	(wi::tree_to_poly_widest_ref, wi::tree_to_poly_offset_ref)
	(wi::tree_to_poly_wide_ref): New typedefs.
	(wi::ints_for): Provide overloads for extended_tree and
	unextended_tree.
	(poly_int_cst_value, wi::to_poly_widest, wi::to_poly_offset)
	(wi::to_wide): New functions.
	(wi::fits_to_boolean_p, wi::fits_to_tree_p): Handle poly_ints.
	* tree.c (poly_int_cst_hasher): New struct.
	(poly_int_cst_hash_table): New variable.
	(tree_node_structure_for_code, tree_code_size, simple_cst_equal)
	(valid_constant_size_p, add_expr, drop_tree_overflow): Handle
	POLY_INT_CST.
	(initialize_tree_contains_struct): Handle TS_POLY_INT_CST.
	(init_ttree): Initialize poly_int_cst_hash_table.
	(build_int_cst, build_int_cst_type, build_invariant_address): Take
	a poly_int64 instead of a HOST_WIDE_INT.
	(build_int_cstu, build_array_type_nelts): Take a poly_uint64
	instead of an unsigned HOST_WIDE_INT.
	(wide_int_to_tree): Rename to...
	(wide_int_to_tree_1): ...this.
	(build_new_poly_int_cst, build_poly_int_cst): New functions.
	(force_fit_type): Take a poly_wide_int_ref instead of a wide_int_ref.
	(wide_int_to_tree): New function that takes a poly_wide_int_ref.
	(ptrdiff_tree_p, tree_to_poly_int64, tree_to_poly_uint64)
	(tree_fits_poly_int64_p, tree_fits_poly_uint64_p): New functions.
	* lto-streamer-out.c (DFS::DFS_write_tree_body, hash_tree): Handle
	TS_POLY_INT_CST.
	* tree-streamer-in.c (lto_input_ts_poly_tree_pointers): Likewise.
	(streamer_read_tree_body): Likewise.
	* tree-streamer-out.c (write_ts_poly_tree_pointers): Likewise.
	(streamer_write_tree_body): Likewise.
	* tree-streamer.c (streamer_check_handled_ts_structures): Likewise.
	* asan.c (asan_protect_global): Require the size to be an INTEGER_CST.
	* cfgexpand.c (expand_debug_expr): Handle POLY_INT_CST.
	* expr.c (const_vector_element, expand_expr_real_1): Likewise.
	* gimple-expr.h (is_gimple_constant): Likewise.
	* gimplify.c (maybe_with_size_expr): Likewise.
	* print-tree.c (print_node): Likewise.
	* tree-data-ref.c (data_ref_compare_tree): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* tree-ssa-address.c (addr_for_mem_ref): Likewise.
	* tree-vect-data-refs.c (dr_group_sort_cmp): Likewise.
	* tree-vrp.c (compare_values_warnv): Likewise.
	* tree-ssa-loop-ivopts.c (determine_base_object, constant_multiple_of)
	(get_loop_invariant_expr, add_candidate_1, get_computation_aff_1)
	(force_expr_to_var_cost): Likewise.
	* tree-ssa-loop.c (for_each_index): Likewise.
	* fold-const.h (build_invariant_address, size_int_kind): Take a
	poly_int64 instead of a HOST_WIDE_INT.
	* fold-const.c (fold_negate_expr_1, const_binop, const_unop)
	(fold_convert_const, multiple_of_p, fold_negate_const): Handle
	POLY_INT_CST.
	(size_binop_loc): Likewise.  Allow int_const_binop_1 to fail.
	(int_const_binop_2): New function, split out from...
	(int_const_binop_1): ...here.  Handle POLY_INT_CST.
	(size_int_kind): Take a poly_int64 instead of a HOST_WIDE_INT.
	* expmed.c (make_tree): Handle CONST_POLY_INT_P.
	* gimple-ssa-strength-reduction.c (slsr_process_add)
	(slsr_process_mul): Check for INTEGER_CSTs before using them
	as candidates.
	* stor-layout.c (bits_from_bytes): New function.
	(bit_from_pos): Use it.
	(layout_type): Likewise.  For vectors, multiply the TYPE_SIZE_UNIT
	by BITS_PER_UNIT to get the TYPE_SIZE.
	* tree-cfg.c (verify_expr, verify_types_in_gimple_reference): Allow
	MEM_REF and TARGET_MEM_REF offsets to be a POLY_INT_CST.

Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-10-23 16:52:20.504766418 +0100
+++ gcc/doc/generic.texi	2017-10-23 17:00:57.771973825 +0100
@@ -1039,6 +1039,7 @@ As this example indicates, the operands
 @tindex VEC_DUPLICATE_CST
 @tindex VEC_SERIES_CST
 @tindex STRING_CST
+@tindex POLY_INT_CST
 @findex TREE_STRING_LENGTH
 @findex TREE_STRING_POINTER
 
@@ -1128,6 +1129,16 @@ of the @code{STRING_CST}.
 FIXME: The formats of string constants are not well-defined when the
 target system bytes are not the same width as host system bytes.
 
+@item POLY_INT_CST
+These nodes represent invariants that depend on some target-specific
+runtime parameters.  They consist of @code{NUM_POLY_INT_COEFFS}
+coefficients, with the first coefficient being the constant term and
+the others being multipliers that are applied to the runtime parameters.
+
+@code{POLY_INT_CST_ELT (@var{x}, @var{i})} references coefficient number
+@var{i} of @code{POLY_INT_CST} node @var{x}.  Each coefficient is an
+@code{INTEGER_CST}.
+
 @end table
 
 @node Storage References
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree.def	2017-10-23 17:00:57.783962919 +0100
@@ -291,6 +291,9 @@ DEFTREECODE (VOID_CST, "void_cst", tcc_c
    some circumstances.  */
 DEFTREECODE (INTEGER_CST, "integer_cst", tcc_constant, 0)
 
+/* Contents are given by POLY_INT_CST_COEFF.  */
+DEFTREECODE (POLY_INT_CST, "poly_int_cst", tcc_constant, 0)
+
 /* Contents are in TREE_REAL_CST field.  */
 DEFTREECODE (REAL_CST, "real_cst", tcc_constant, 0)
 
Index: gcc/treestruct.def
===================================================================
--- gcc/treestruct.def	2017-10-23 16:52:20.504766418 +0100
+++ gcc/treestruct.def	2017-10-23 17:00:57.784962010 +0100
@@ -34,6 +34,7 @@ DEFTREESTRUCT(TS_BASE, "base")
 DEFTREESTRUCT(TS_TYPED, "typed")
 DEFTREESTRUCT(TS_COMMON, "common")
 DEFTREESTRUCT(TS_INT_CST, "integer cst")
+DEFTREESTRUCT(TS_POLY_INT_CST, "poly_int_cst")
 DEFTREESTRUCT(TS_REAL_CST, "real cst")
 DEFTREESTRUCT(TS_FIXED_CST, "fixed cst")
 DEFTREESTRUCT(TS_VECTOR, "vector")
Index: gcc/tree-core.h
===================================================================
--- gcc/tree-core.h	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-core.h	2017-10-23 17:00:57.778967463 +0100
@@ -1336,6 +1336,11 @@ struct GTY(()) tree_vector {
   tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
 };
 
+struct GTY(()) tree_poly_int_cst {
+  struct tree_typed typed;
+  tree coeffs[NUM_POLY_INT_COEFFS];
+};
+
 struct GTY(()) tree_identifier {
   struct tree_common common;
   struct ht_identifier id;
@@ -1861,6 +1866,7 @@ union GTY ((ptr_alias (union lang_tree_n
   struct tree_typed GTY ((tag ("TS_TYPED"))) typed;
   struct tree_common GTY ((tag ("TS_COMMON"))) common;
   struct tree_int_cst GTY ((tag ("TS_INT_CST"))) int_cst;
+  struct tree_poly_int_cst GTY ((tag ("TS_POLY_INT_CST"))) poly_int_cst;
   struct tree_real_cst GTY ((tag ("TS_REAL_CST"))) real_cst;
   struct tree_fixed_cst GTY ((tag ("TS_FIXED_CST"))) fixed_cst;
   struct tree_vector GTY ((tag ("TS_VECTOR"))) vector;
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree.h	2017-10-23 17:00:57.784962010 +0100
@@ -1008,6 +1008,15 @@ #define TREE_INT_CST_ELT(NODE, I) TREE_I
 #define TREE_INT_CST_LOW(NODE) \
   ((unsigned HOST_WIDE_INT) TREE_INT_CST_ELT (NODE, 0))
 
+/* Return true if NODE is a POLY_INT_CST.  This is only ever true on
+   targets with variable-sized modes.  */
+#define POLY_INT_CST_P(NODE) \
+  (NUM_POLY_INT_COEFFS > 1 && TREE_CODE (NODE) == POLY_INT_CST)
+
+/* In a POLY_INT_CST node.  */
+#define POLY_INT_CST_COEFF(NODE, I) \
+  (POLY_INT_CST_CHECK (NODE)->poly_int_cst.coeffs[I])
+
 #define TREE_REAL_CST_PTR(NODE) (REAL_CST_CHECK (NODE)->real_cst.real_cst_ptr)
 #define TREE_REAL_CST(NODE) (*TREE_REAL_CST_PTR (NODE))
 
@@ -4025,15 +4034,15 @@ build5_loc (location_t loc, enum tree_co
 
 extern tree double_int_to_tree (tree, double_int);
 
-extern tree wide_int_to_tree (tree type, const wide_int_ref &cst);
-extern tree force_fit_type (tree, const wide_int_ref &, int, bool);
+extern tree wide_int_to_tree (tree type, const poly_wide_int_ref &cst);
+extern tree force_fit_type (tree, const poly_wide_int_ref &, int, bool);
 
 /* Create an INT_CST node with a CST value zero extended.  */
 
 /* static inline */
-extern tree build_int_cst (tree, HOST_WIDE_INT);
-extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst);
-extern tree build_int_cst_type (tree, HOST_WIDE_INT);
+extern tree build_int_cst (tree, poly_int64);
+extern tree build_int_cstu (tree type, poly_uint64);
+extern tree build_int_cst_type (tree, poly_int64);
 extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
 extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
 extern tree build_vec_series_cst (tree, tree, tree CXX_MEM_STAT_INFO);
@@ -4056,6 +4065,7 @@ extern tree build_minus_one_cst (tree);
 extern tree build_all_ones_cst (tree);
 extern tree build_zero_cst (tree);
 extern tree build_string (int, const char *);
+extern tree build_poly_int_cst (tree, const poly_wide_int_ref &);
 extern tree build_tree_list (tree, tree CXX_MEM_STAT_INFO);
 extern tree build_tree_list_vec (const vec<tree, va_gc> * CXX_MEM_STAT_INFO);
 extern tree build_decl (location_t, enum tree_code,
@@ -4104,7 +4114,7 @@ extern tree build_opaque_vector_type (tr
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree, bool = false);
 extern tree build_nonshared_array_type (tree, tree);
-extern tree build_array_type_nelts (tree, unsigned HOST_WIDE_INT);
+extern tree build_array_type_nelts (tree, poly_uint64);
 extern tree build_function_type (tree, tree);
 extern tree build_function_type_list (tree, ...);
 extern tree build_varargs_function_type_list (tree, ...);
@@ -4128,12 +4138,14 @@ extern tree chain_index (int, tree);
 
 extern int tree_int_cst_equal (const_tree, const_tree);
 
-extern bool tree_fits_shwi_p (const_tree)
-  ATTRIBUTE_PURE;
-extern bool tree_fits_uhwi_p (const_tree)
-  ATTRIBUTE_PURE;
+extern bool tree_fits_shwi_p (const_tree) ATTRIBUTE_PURE;
+extern bool tree_fits_poly_int64_p (const_tree) ATTRIBUTE_PURE;
+extern bool tree_fits_uhwi_p (const_tree) ATTRIBUTE_PURE;
+extern bool tree_fits_poly_uint64_p (const_tree) ATTRIBUTE_PURE;
 extern HOST_WIDE_INT tree_to_shwi (const_tree);
+extern poly_int64 tree_to_poly_int64 (const_tree);
 extern unsigned HOST_WIDE_INT tree_to_uhwi (const_tree);
+extern poly_uint64 tree_to_poly_uint64 (const_tree);
 #if !defined ENABLE_TREE_CHECKING && (GCC_VERSION >= 4003)
 extern inline __attribute__ ((__gnu_inline__)) HOST_WIDE_INT
 tree_to_shwi (const_tree t)
@@ -4148,6 +4160,21 @@ tree_to_uhwi (const_tree t)
   gcc_assert (tree_fits_uhwi_p (t));
   return TREE_INT_CST_LOW (t);
 }
+#if NUM_POLY_INT_COEFFS == 1
+extern inline __attribute__ ((__gnu_inline__)) poly_int64
+tree_to_poly_int64 (const_tree t)
+{
+  gcc_assert (tree_fits_poly_int64_p (t));
+  return TREE_INT_CST_LOW (t);
+}
+
+extern inline __attribute__ ((__gnu_inline__)) poly_uint64
+tree_to_poly_uint64 (const_tree t)
+{
+  gcc_assert (tree_fits_poly_uint64_p (t));
+  return TREE_INT_CST_LOW (t);
+}
+#endif
 #endif
 extern int tree_int_cst_sgn (const_tree);
 extern int tree_int_cst_sign_bit (const_tree);
@@ -4156,6 +4183,33 @@ extern tree strip_array_types (tree);
 extern tree excess_precision_type (tree);
 extern bool valid_constant_size_p (const_tree);
 
+/* Return true if T holds a value that can be represented as a poly_int64
+   without loss of precision.  Store the value in *VALUE if so.  */
+
+inline bool
+poly_int_tree_p (const_tree t, poly_int64_pod *value)
+{
+  if (tree_fits_poly_int64_p (t))
+    {
+      *value = tree_to_poly_int64 (t);
+      return true;
+    }
+  return false;
+}
+
+/* Return true if T holds a value that can be represented as a poly_uint64
+   without loss of precision.  Store the value in *VALUE if so.  */
+
+inline bool
+poly_int_tree_p (const_tree t, poly_uint64_pod *value)
+{
+  if (tree_fits_poly_uint64_p (t))
+    {
+      *value = tree_to_poly_uint64 (t);
+      return true;
+    }
+  return false;
+}
 
 /* From expmed.c.  Since rtl.h is included after tree.h, we can't
    put the prototype here.  Rtl.h does declare the prototype if
@@ -4702,8 +4756,17 @@ complete_or_array_type_p (const_tree typ
 	     && COMPLETE_TYPE_P (TREE_TYPE (type)));
 }
 
+/* Return true if the value of T could be represented as a poly_widest_int.  */
+
+inline bool
+poly_int_tree_p (const_tree t)
+{
+  return (TREE_CODE (t) == INTEGER_CST || POLY_INT_CST_P (t));
+}
+
 extern tree strip_float_extensions (tree);
 extern int really_constant_p (const_tree);
+extern bool ptrdiff_tree_p (const_tree, poly_int64_pod *);
 extern bool decl_address_invariant_p (const_tree);
 extern bool decl_address_ip_invariant_p (const_tree);
 extern bool int_fits_type_p (const_tree, const_tree);
@@ -5132,6 +5195,29 @@ extern bool anon_aggrname_p (const_tree)
 /* The tree and const_tree overload templates.   */
 namespace wi
 {
+  class unextended_tree
+  {
+  private:
+    const_tree m_t;
+
+  public:
+    unextended_tree () {}
+    unextended_tree (const_tree t) : m_t (t) {}
+
+    unsigned int get_precision () const;
+    const HOST_WIDE_INT *get_val () const;
+    unsigned int get_len () const;
+    const_tree get_tree () const { return m_t; }
+  };
+
+  template <>
+  struct int_traits <unextended_tree>
+  {
+    static const enum precision_type precision_type = VAR_PRECISION;
+    static const bool host_dependent_precision = false;
+    static const bool is_sign_extended = false;
+  };
+
   template <int N>
   class extended_tree
   {
@@ -5139,11 +5225,13 @@ extern bool anon_aggrname_p (const_tree)
     const_tree m_t;
 
   public:
+    extended_tree () {}
     extended_tree (const_tree);
 
     unsigned int get_precision () const;
     const HOST_WIDE_INT *get_val () const;
     unsigned int get_len () const;
+    const_tree get_tree () const { return m_t; }
   };
 
   template <int N>
@@ -5155,10 +5243,11 @@ extern bool anon_aggrname_p (const_tree)
     static const unsigned int precision = N;
   };
 
-  typedef const generic_wide_int <extended_tree <WIDE_INT_MAX_PRECISION> >
-    tree_to_widest_ref;
-  typedef const generic_wide_int <extended_tree <ADDR_MAX_PRECISION> >
-    tree_to_offset_ref;
+  typedef extended_tree <WIDE_INT_MAX_PRECISION> widest_extended_tree;
+  typedef extended_tree <ADDR_MAX_PRECISION> offset_extended_tree;
+
+  typedef const generic_wide_int <widest_extended_tree> tree_to_widest_ref;
+  typedef const generic_wide_int <offset_extended_tree> tree_to_offset_ref;
   typedef const generic_wide_int<wide_int_ref_storage<false, false> >
     tree_to_wide_ref;
 
@@ -5166,6 +5255,34 @@ extern bool anon_aggrname_p (const_tree)
   tree_to_offset_ref to_offset (const_tree);
   tree_to_wide_ref to_wide (const_tree);
   wide_int to_wide (const_tree, unsigned int);
+
+  typedef const poly_int <NUM_POLY_INT_COEFFS,
+			  generic_wide_int <widest_extended_tree> >
+    tree_to_poly_widest_ref;
+  typedef const poly_int <NUM_POLY_INT_COEFFS,
+			  generic_wide_int <offset_extended_tree> >
+    tree_to_poly_offset_ref;
+  typedef const poly_int <NUM_POLY_INT_COEFFS,
+			  generic_wide_int <unextended_tree> >
+    tree_to_poly_wide_ref;
+
+  tree_to_poly_widest_ref to_poly_widest (const_tree);
+  tree_to_poly_offset_ref to_poly_offset (const_tree);
+  tree_to_poly_wide_ref to_poly_wide (const_tree);
+
+  template <int N>
+  struct ints_for <generic_wide_int <extended_tree <N> >, CONST_PRECISION>
+  {
+    typedef generic_wide_int <extended_tree <N> > extended;
+    static extended zero (const extended &);
+  };
+
+  template <>
+  struct ints_for <generic_wide_int <unextended_tree>, VAR_PRECISION>
+  {
+    typedef generic_wide_int <unextended_tree> unextended;
+    static unextended zero (const unextended &);
+  };
 }
 
 /* Refer to INTEGER_CST T as though it were a widest_int.
@@ -5310,6 +5427,95 @@ wi::extended_tree <N>::get_len () const
     gcc_unreachable ();
 }
 
+inline unsigned int
+wi::unextended_tree::get_precision () const
+{
+  return TYPE_PRECISION (TREE_TYPE (m_t));
+}
+
+inline const HOST_WIDE_INT *
+wi::unextended_tree::get_val () const
+{
+  return &TREE_INT_CST_ELT (m_t, 0);
+}
+
+inline unsigned int
+wi::unextended_tree::get_len () const
+{
+  return TREE_INT_CST_NUNITS (m_t);
+}
+
+/* Return the value of a POLY_INT_CST in its native precision.  */
+
+inline wi::tree_to_poly_wide_ref
+poly_int_cst_value (const_tree x)
+{
+  poly_int <NUM_POLY_INT_COEFFS, generic_wide_int <wi::unextended_tree> > res;
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    res.coeffs[i] = POLY_INT_CST_COEFF (x, i);
+  return res;
+}
+
+/* Access INTEGER_CST or POLY_INT_CST tree T as if it were a
+   poly_widest_int.  See wi::to_widest for more details.  */
+
+inline wi::tree_to_poly_widest_ref
+wi::to_poly_widest (const_tree t)
+{
+  if (POLY_INT_CST_P (t))
+    {
+      poly_int <NUM_POLY_INT_COEFFS,
+		generic_wide_int <widest_extended_tree> > res;
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	res.coeffs[i] = POLY_INT_CST_COEFF (t, i);
+      return res;
+    }
+  return t;
+}
+
+/* Access INTEGER_CST or POLY_INT_CST tree T as if it were a
+   poly_offset_int.  See wi::to_offset for more details.  */
+
+inline wi::tree_to_poly_offset_ref
+wi::to_poly_offset (const_tree t)
+{
+  if (POLY_INT_CST_P (t))
+    {
+      poly_int <NUM_POLY_INT_COEFFS,
+		generic_wide_int <offset_extended_tree> > res;
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	res.coeffs[i] = POLY_INT_CST_COEFF (t, i);
+      return res;
+    }
+  return t;
+}
+
+/* Access INTEGER_CST or POLY_INT_CST tree T as if it were a
+   poly_wide_int.  See wi::to_wide for more details.  */
+
+inline wi::tree_to_poly_wide_ref
+wi::to_poly_wide (const_tree t)
+{
+  if (POLY_INT_CST_P (t))
+    return poly_int_cst_value (t);
+  return t;
+}
+
+template <int N>
+inline generic_wide_int <wi::extended_tree <N> >
+wi::ints_for <generic_wide_int <wi::extended_tree <N> >,
+	      wi::CONST_PRECISION>::zero (const extended &x)
+{
+  return build_zero_cst (TREE_TYPE (x.get_tree ()));
+}
+
+inline generic_wide_int <wi::unextended_tree>
+wi::ints_for <generic_wide_int <wi::unextended_tree>,
+	      wi::VAR_PRECISION>::zero (const unextended &x)
+{
+  return build_zero_cst (TREE_TYPE (x.get_tree ()));
+}
+
 namespace wi
 {
   template <typename T>
@@ -5327,7 +5533,8 @@ wi::extended_tree <N>::get_len () const
 bool
 wi::fits_to_boolean_p (const T &x, const_tree type)
 {
-  return eq_p (x, 0) || eq_p (x, TYPE_UNSIGNED (type) ? 1 : -1);
+  return (known_zero (x)
+	  || (TYPE_UNSIGNED (type) ? known_one (x) : known_all_ones (x)));
 }
 
 template <typename T>
@@ -5340,9 +5547,9 @@ wi::fits_to_tree_p (const T &x, const_tr
     return fits_to_boolean_p (x, type);
 
   if (TYPE_UNSIGNED (type))
-    return eq_p (x, zext (x, TYPE_PRECISION (type)));
+    return must_eq (x, zext (x, TYPE_PRECISION (type)));
   else
-    return eq_p (x, sext (x, TYPE_PRECISION (type)));
+    return must_eq (x, sext (x, TYPE_PRECISION (type)));
 }
 
 /* Produce the smallest number that is represented in TYPE.  The precision
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree.c	2017-10-23 17:00:57.783962919 +0100
@@ -203,6 +203,17 @@ struct int_cst_hasher : ggc_cache_ptr_ha
 
 static GTY ((cache)) hash_table<int_cst_hasher> *int_cst_hash_table;
 
+/* Class and variable for making sure that there is a single POLY_INT_CST
+   for a given value.  */
+struct poly_int_cst_hasher : ggc_cache_ptr_hash<tree_node>
+{
+  typedef std::pair<tree, const poly_wide_int *> compare_type;
+  static hashval_t hash (tree t);
+  static bool equal (tree x, const compare_type &y);
+};
+
+static GTY ((cache)) hash_table<poly_int_cst_hasher> *poly_int_cst_hash_table;
+
 /* Hash table for optimization flags and target option flags.  Use the same
    hash table for both sets of options.  Nodes for building the current
    optimization and target option nodes.  The assumption is most of the time
@@ -460,6 +471,7 @@ tree_node_structure_for_code (enum tree_
       /* tcc_constant cases.  */
     case VOID_CST:		return TS_TYPED;
     case INTEGER_CST:		return TS_INT_CST;
+    case POLY_INT_CST:		return TS_POLY_INT_CST;
     case REAL_CST:		return TS_REAL_CST;
     case FIXED_CST:		return TS_FIXED_CST;
     case COMPLEX_CST:		return TS_COMPLEX;
@@ -519,6 +531,7 @@ initialize_tree_contains_struct (void)
 
 	case TS_COMMON:
 	case TS_INT_CST:
+	case TS_POLY_INT_CST:
 	case TS_REAL_CST:
 	case TS_FIXED_CST:
 	case TS_VECTOR:
@@ -652,6 +665,8 @@ init_ttree (void)
 
   int_cst_hash_table = hash_table<int_cst_hasher>::create_ggc (1024);
 
+  poly_int_cst_hash_table = hash_table<poly_int_cst_hasher>::create_ggc (64);
+
   int_cst_node = make_int_cst (1, 1);
 
   cl_option_hash_table = hash_table<cl_option_hasher>::create_ggc (64);
@@ -814,6 +829,7 @@ tree_code_size (enum tree_code code)
 	{
 	case VOID_CST:		return sizeof (struct tree_typed);
 	case INTEGER_CST:	gcc_unreachable ();
+	case POLY_INT_CST:	return sizeof (struct tree_poly_int_cst);
 	case REAL_CST:		return sizeof (struct tree_real_cst);
 	case FIXED_CST:		return sizeof (struct tree_fixed_cst);
 	case COMPLEX_CST:	return sizeof (struct tree_complex);
@@ -1298,31 +1314,51 @@ build_new_int_cst (tree type, const wide
   return nt;
 }
 
-/* Create an INT_CST node with a LOW value sign extended to TYPE.  */
+/* Return a new POLY_INT_CST with coefficients COEFFS and type TYPE.  */
+
+static tree
+build_new_poly_int_cst (tree type, tree (&coeffs)[NUM_POLY_INT_COEFFS])
+{
+  size_t length = sizeof (struct tree_poly_int_cst);
+  record_node_allocation_statistics (POLY_INT_CST, length);
+
+  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+  TREE_SET_CODE (t, POLY_INT_CST);
+  TREE_CONSTANT (t) = 1;
+  TREE_TYPE (t) = type;
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    POLY_INT_CST_COEFF (t, i) = coeffs[i];
+  return t;
+}
+
+/* Create a constant tree that contains CST sign-extended to TYPE.  */
 
 tree
-build_int_cst (tree type, HOST_WIDE_INT low)
+build_int_cst (tree type, poly_int64 cst)
 {
   /* Support legacy code.  */
   if (!type)
     type = integer_type_node;
 
-  return wide_int_to_tree (type, wi::shwi (low, TYPE_PRECISION (type)));
+  return wide_int_to_tree (type, wi::shwi (cst, TYPE_PRECISION (type)));
 }
 
+/* Create a constant tree that contains CST zero-extended to TYPE.  */
+
 tree
-build_int_cstu (tree type, unsigned HOST_WIDE_INT cst)
+build_int_cstu (tree type, poly_uint64 cst)
 {
   return wide_int_to_tree (type, wi::uhwi (cst, TYPE_PRECISION (type)));
 }
 
-/* Create an INT_CST node with a LOW value sign extended to TYPE.  */
+/* Create a constant tree that contains CST sign-extended to TYPE.  */
 
 tree
-build_int_cst_type (tree type, HOST_WIDE_INT low)
+build_int_cst_type (tree type, poly_int64 cst)
 {
   gcc_assert (type);
-  return wide_int_to_tree (type, wi::shwi (low, TYPE_PRECISION (type)));
+  return wide_int_to_tree (type, wi::shwi (cst, TYPE_PRECISION (type)));
 }
 
 /* Constructs tree in type TYPE from with value given by CST.  Signedness
@@ -1350,7 +1386,7 @@ double_int_to_tree (tree type, double_in
 
 
 tree
-force_fit_type (tree type, const wide_int_ref &cst,
+force_fit_type (tree type, const poly_wide_int_ref &cst,
 		int overflowable, bool overflowed)
 {
   signop sign = TYPE_SIGN (type);
@@ -1362,8 +1398,21 @@ force_fit_type (tree type, const wide_in
 	  || overflowable < 0
 	  || (overflowable > 0 && sign == SIGNED))
 	{
-	  wide_int tmp = wide_int::from (cst, TYPE_PRECISION (type), sign);
-	  tree t = build_new_int_cst (type, tmp);
+	  poly_wide_int tmp = poly_wide_int::from (cst, TYPE_PRECISION (type),
+						   sign);
+	  tree t;
+	  if (tmp.is_constant ())
+	    t = build_new_int_cst (type, tmp.coeffs[0]);
+	  else
+	    {
+	      tree coeffs[NUM_POLY_INT_COEFFS];
+	      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+		{
+		  coeffs[i] = build_new_int_cst (type, tmp.coeffs[i]);
+		  TREE_OVERFLOW (coeffs[i]) = 1;
+		}
+	      t = build_new_poly_int_cst (type, coeffs);
+	    }
 	  TREE_OVERFLOW (t) = 1;
 	  return t;
 	}
@@ -1420,8 +1469,8 @@ int_cst_hasher::equal (tree x, tree y)
    the upper bits and ensures that hashing and value equality based
    upon the underlying HOST_WIDE_INTs works without masking.  */
 
-tree
-wide_int_to_tree (tree type, const wide_int_ref &pcst)
+static tree
+wide_int_to_tree_1 (tree type, const wide_int_ref &pcst)
 {
   tree t;
   int ix = -1;
@@ -1566,6 +1615,66 @@ wide_int_to_tree (tree type, const wide_
   return t;
 }
 
+hashval_t
+poly_int_cst_hasher::hash (tree t)
+{
+  inchash::hash hstate;
+
+  hstate.add_int (TYPE_UID (TREE_TYPE (t)));
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    hstate.add_wide_int (wi::to_wide (POLY_INT_CST_COEFF (t, i)));
+
+  return hstate.end ();
+}
+
+bool
+poly_int_cst_hasher::equal (tree x, const compare_type &y)
+{
+  if (TREE_TYPE (x) != y.first)
+    return false;
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    if (wi::to_wide (POLY_INT_CST_COEFF (x, i)) != y.second->coeffs[i])
+      return false;
+  return true;
+}
+
+/* Build a POLY_INT_CST node with type TYPE and with the elements in VALUES.
+   The elements must also have type TYPE.  */
+
+tree
+build_poly_int_cst (tree type, const poly_wide_int_ref &values)
+{
+  unsigned int prec = TYPE_PRECISION (type);
+  gcc_assert (prec <= values.coeffs[0].get_precision ());
+  poly_wide_int c = poly_wide_int::from (values, prec, SIGNED);
+
+  inchash::hash h;
+  h.add_int (TYPE_UID (type));
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    h.add_wide_int (c.coeffs[i]);
+  poly_int_cst_hasher::compare_type comp (type, &c);
+  tree *slot = poly_int_cst_hash_table->find_slot_with_hash (comp, h.end (),
+							     INSERT);
+  if (*slot == NULL_TREE)
+    {
+      tree coeffs[NUM_POLY_INT_COEFFS];
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	coeffs[i] = wide_int_to_tree_1 (type, c.coeffs[i]);
+      *slot = build_new_poly_int_cst (type, coeffs);
+    }
+  return *slot;
+}
+
+/* Create a constant tree with value VALUE in type TYPE.  */
+
+tree
+wide_int_to_tree (tree type, const poly_wide_int_ref &value)
+{
+  if (value.is_constant ())
+    return wide_int_to_tree_1 (type, value.coeffs[0]);
+  return build_poly_int_cst (type, value);
+}
+
 void
 cache_integer_cst (tree t)
 {
@@ -2791,6 +2900,59 @@ really_constant_p (const_tree exp)
     exp = TREE_OPERAND (exp, 0);
   return TREE_CONSTANT (exp);
 }
+
+/* Return true if T holds a polynomial pointer difference, storing it in
+   *VALUE if so.  A true return means that T's precision is no greater
+   than 64 bits, which is the largest address space we support, so *VALUE
+   never loses precision.  However, the signedness of the result is
+   somewhat arbitrary, since if B lives near the end of a 64-bit address
+   range and A lives near the beginning, B - A is a large positive value
+   outside the range of int64_t.  A - B is likewise a large negative value
+   outside the range of int64_t.  All the pointer difference really
+   gives is a raw pointer-sized bitstring that can be added to the first
+   pointer value to get the second.  */
+
+bool
+ptrdiff_tree_p (const_tree t, poly_int64_pod *value)
+{
+  if (!t)
+    return false;
+  if (TREE_CODE (t) == INTEGER_CST)
+    {
+      if (!cst_and_fits_in_hwi (t))
+	return false;
+      *value = int_cst_value (t);
+      return true;
+    }
+  if (POLY_INT_CST_P (t))
+    {
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	if (!cst_and_fits_in_hwi (POLY_INT_CST_COEFF (t, i)))
+	  return false;
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	value->coeffs[i] = int_cst_value (POLY_INT_CST_COEFF (t, i));
+      return true;
+    }
+  return false;
+}
+
+poly_int64
+tree_to_poly_int64 (const_tree t)
+{
+  gcc_assert (tree_fits_poly_int64_p (t));
+  if (POLY_INT_CST_P (t))
+    return poly_int_cst_value (t).force_shwi ();
+  return TREE_INT_CST_LOW (t);
+}
+
+poly_uint64
+tree_to_poly_uint64 (const_tree t)
+{
+  gcc_assert (tree_fits_poly_uint64_p (t));
+  if (POLY_INT_CST_P (t))
+    return poly_int_cst_value (t).force_uhwi ();
+  return TREE_INT_CST_LOW (t);
+}
 \f
 /* Return first list element whose TREE_VALUE is ELEM.
    Return 0 if ELEM is not in LIST.  */
@@ -4773,7 +4935,7 @@ mem_ref_offset (const_tree t)
    offsetted by OFFSET units.  */
 
 tree
-build_invariant_address (tree type, tree base, HOST_WIDE_INT offset)
+build_invariant_address (tree type, tree base, poly_int64 offset)
 {
   tree ref = fold_build2 (MEM_REF, TREE_TYPE (type),
 			  build_fold_addr_expr (base),
@@ -6661,6 +6823,25 @@ tree_fits_shwi_p (const_tree t)
 	  && wi::fits_shwi_p (wi::to_widest (t)));
 }
 
+/* Return true if T is an INTEGER_CST or POLY_INT_CST whose numerical
+   value (extended according to TYPE_UNSIGNED) fits in a poly_int64.  */
+
+bool
+tree_fits_poly_int64_p (const_tree t)
+{
+  if (t == NULL_TREE)
+    return false;
+  if (POLY_INT_CST_P (t))
+    {
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	if (!wi::fits_shwi_p (wi::to_wide (POLY_INT_CST_COEFF (t, i))))
+	  return false;
+      return true;
+    }
+  return (TREE_CODE (t) == INTEGER_CST
+	  && wi::fits_shwi_p (wi::to_widest (t)));
+}
+
 /* Return true if T is an INTEGER_CST whose numerical value (extended
    according to TYPE_UNSIGNED) fits in an unsigned HOST_WIDE_INT.  */
 
@@ -6672,6 +6853,25 @@ tree_fits_uhwi_p (const_tree t)
 	  && wi::fits_uhwi_p (wi::to_widest (t)));
 }
 
+/* Return true if T is an INTEGER_CST or POLY_INT_CST whose numerical
+   value (extended according to TYPE_UNSIGNED) fits in a poly_uint64.  */
+
+bool
+tree_fits_poly_uint64_p (const_tree t)
+{
+  if (t == NULL_TREE)
+    return false;
+  if (POLY_INT_CST_P (t))
+    {
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; i++)
+	if (!wi::fits_uhwi_p (wi::to_widest (POLY_INT_CST_COEFF (t, i))))
+	  return false;
+      return true;
+    }
+  return (TREE_CODE (t) == INTEGER_CST
+	  && wi::fits_uhwi_p (wi::to_widest (t)));
+}
+
 /* T is an INTEGER_CST whose numerical value (extended according to
    TYPE_UNSIGNED) fits in a signed HOST_WIDE_INT.  Return that
    HOST_WIDE_INT.  */
@@ -6880,6 +7080,12 @@ simple_cst_equal (const_tree t1, const_t
       return 0;
 
     default:
+      if (POLY_INT_CST_P (t1))
+	/* A false return means may_ne rather than must_ne.  */
+	return must_eq (poly_widest_int::from (poly_int_cst_value (t1),
+					       TYPE_SIGN (TREE_TYPE (t1))),
+			poly_widest_int::from (poly_int_cst_value (t2),
+					       TYPE_SIGN (TREE_TYPE (t2))));
       break;
     }
 
@@ -6939,8 +7145,16 @@ compare_tree_int (const_tree t, unsigned
 bool
 valid_constant_size_p (const_tree size)
 {
+  if (TREE_OVERFLOW (size))
+    return false;
+  if (POLY_INT_CST_P (size))
+    {
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	if (!valid_constant_size_p (POLY_INT_CST_COEFF (size, i)))
+	  return false;
+      return true;
+    }
   if (! tree_fits_uhwi_p (size)
-      || TREE_OVERFLOW (size)
       || tree_int_cst_sign_bit (size) != 0)
     return false;
   return true;
@@ -7239,6 +7453,12 @@ add_expr (const_tree t, inchash::hash &h
 	}
       /* FALL THROUGH */
     default:
+      if (POLY_INT_CST_P (t))
+	{
+	  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	    hstate.add_wide_int (wi::to_wide (POLY_INT_CST_COEFF (t, i)));
+	  return;
+	}
       tclass = TREE_CODE_CLASS (code);
 
       if (tclass == tcc_declaration)
@@ -7776,7 +7996,7 @@ build_nonshared_array_type (tree elt_typ
    sizetype.  */
 
 tree
-build_array_type_nelts (tree elt_type, unsigned HOST_WIDE_INT nelts)
+build_array_type_nelts (tree elt_type, poly_uint64 nelts)
 {
   return build_array_type (elt_type, build_index_type (size_int (nelts - 1)));
 }
@@ -12459,8 +12679,8 @@ drop_tree_overflow (tree t)
   gcc_checking_assert (TREE_OVERFLOW (t));
 
   /* For tree codes with a sharing machinery re-build the result.  */
-  if (TREE_CODE (t) == INTEGER_CST)
-    return wide_int_to_tree (TREE_TYPE (t), wi::to_wide (t));
+  if (poly_int_tree_p (t))
+    return wide_int_to_tree (TREE_TYPE (t), wi::to_poly_wide (t));
 
   /* Otherwise, as all tcc_constants are possibly shared, copy the node
      and drop the flag.  */
Index: gcc/lto-streamer-out.c
===================================================================
--- gcc/lto-streamer-out.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/lto-streamer-out.c	2017-10-23 17:00:57.776969281 +0100
@@ -751,6 +751,10 @@ #define DFS_follow_tree_edge(DEST) \
 	DFS_follow_tree_edge (VECTOR_CST_ELT (expr, i));
     }
 
+  if (CODE_CONTAINS_STRUCT (code, TS_POLY_INT_CST))
+    for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+      DFS_follow_tree_edge (POLY_INT_CST_COEFF (expr, i));
+
   if (CODE_CONTAINS_STRUCT (code, TS_COMPLEX))
     {
       DFS_follow_tree_edge (TREE_REALPART (expr));
@@ -1202,6 +1206,10 @@ #define visit(SIBLING) \
     for (unsigned i = 0; i < VECTOR_CST_NELTS (t); ++i)
       visit (VECTOR_CST_ELT (t, i));
 
+  if (CODE_CONTAINS_STRUCT (code, TS_POLY_INT_CST))
+    for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+      visit (POLY_INT_CST_COEFF (t, i));
+
   if (CODE_CONTAINS_STRUCT (code, TS_COMPLEX))
     {
       visit (TREE_REALPART (t));
Index: gcc/tree-streamer-in.c
===================================================================
--- gcc/tree-streamer-in.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-streamer-in.c	2017-10-23 17:00:57.780965645 +0100
@@ -654,6 +654,19 @@ lto_input_ts_vector_tree_pointers (struc
 }
 
 
+/* Read all pointer fields in the TS_POLY_INT_CST structure of EXPR from
+   input block IB.  DATA_IN contains tables and descriptors for the
+   file being read.  */
+
+static void
+lto_input_ts_poly_tree_pointers (struct lto_input_block *ib,
+				 struct data_in *data_in, tree expr)
+{
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    POLY_INT_CST_COEFF (expr, i) = stream_read_tree (ib, data_in);
+}
+
+
 /* Read all pointer fields in the TS_COMPLEX structure of EXPR from input
    block IB.  DATA_IN contains tables and descriptors for the
    file being read.  */
@@ -1037,6 +1050,9 @@ streamer_read_tree_body (struct lto_inpu
   if (CODE_CONTAINS_STRUCT (code, TS_VECTOR))
     lto_input_ts_vector_tree_pointers (ib, data_in, expr);
 
+  if (CODE_CONTAINS_STRUCT (code, TS_POLY_INT_CST))
+    lto_input_ts_poly_tree_pointers (ib, data_in, expr);
+
   if (CODE_CONTAINS_STRUCT (code, TS_COMPLEX))
     lto_input_ts_complex_tree_pointers (ib, data_in, expr);
 
Index: gcc/tree-streamer-out.c
===================================================================
--- gcc/tree-streamer-out.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-streamer-out.c	2017-10-23 17:00:57.780965645 +0100
@@ -539,6 +539,18 @@ write_ts_vector_tree_pointers (struct ou
 }
 
 
+/* Write all pointer fields in the TS_POLY_INT_CST structure of EXPR to
+   output block OB.  If REF_P is true, write a reference to EXPR's pointer
+   fields.  */
+
+static void
+write_ts_poly_tree_pointers (struct output_block *ob, tree expr, bool ref_p)
+{
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    stream_write_tree (ob, POLY_INT_CST_COEFF (expr, i), ref_p);
+}
+
+
 /* Write all pointer fields in the TS_COMPLEX structure of EXPR to output
    block OB.  If REF_P is true, write a reference to EXPR's pointer
    fields.  */
@@ -880,6 +892,9 @@ streamer_write_tree_body (struct output_
   if (CODE_CONTAINS_STRUCT (code, TS_VECTOR))
     write_ts_vector_tree_pointers (ob, expr, ref_p);
 
+  if (CODE_CONTAINS_STRUCT (code, TS_POLY_INT_CST))
+    write_ts_poly_tree_pointers (ob, expr, ref_p);
+
   if (CODE_CONTAINS_STRUCT (code, TS_COMPLEX))
     write_ts_complex_tree_pointers (ob, expr, ref_p);
 
Index: gcc/tree-streamer.c
===================================================================
--- gcc/tree-streamer.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-streamer.c	2017-10-23 17:00:57.780965645 +0100
@@ -55,6 +55,7 @@ streamer_check_handled_ts_structures (vo
   handled_p[TS_TYPED] = true;
   handled_p[TS_COMMON] = true;
   handled_p[TS_INT_CST] = true;
+  handled_p[TS_POLY_INT_CST] = true;
   handled_p[TS_REAL_CST] = true;
   handled_p[TS_FIXED_CST] = true;
   handled_p[TS_VECTOR] = true;
Index: gcc/asan.c
===================================================================
--- gcc/asan.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/asan.c	2017-10-23 17:00:57.770974734 +0100
@@ -1647,6 +1647,7 @@ asan_protect_global (tree decl)
 	  && !section_sanitized_p (DECL_SECTION_NAME (decl)))
       || DECL_SIZE (decl) == 0
       || ASAN_RED_ZONE_SIZE * BITS_PER_UNIT > MAX_OFILE_ALIGNMENT
+      || TREE_CODE (DECL_SIZE_UNIT (decl)) != INTEGER_CST
       || !valid_constant_size_p (DECL_SIZE_UNIT (decl))
       || DECL_ALIGN_UNIT (decl) > 2 * ASAN_RED_ZONE_SIZE
       || TREE_TYPE (decl) == ubsan_get_source_location_type ()
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/cfgexpand.c	2017-10-23 17:00:57.770974734 +0100
@@ -4244,6 +4244,9 @@ expand_debug_expr (tree exp)
       op0 = expand_expr (exp, NULL_RTX, mode, EXPAND_INITIALIZER);
       return op0;
 
+    case POLY_INT_CST:
+      return immed_wide_int_const (poly_int_cst_value (exp), mode);
+
     case COMPLEX_CST:
       gcc_assert (COMPLEX_MODE_P (mode));
       op0 = expand_debug_expr (TREE_REALPART (exp));
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:00:54.442003055 +0100
+++ gcc/expr.c	2017-10-23 17:00:57.772972916 +0100
@@ -7717,6 +7717,8 @@ const_vector_element (scalar_mode mode,
     return const_double_from_real_value (TREE_REAL_CST (elt), mode);
   if (TREE_CODE (elt) == FIXED_CST)
     return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
+  if (POLY_INT_CST_P (elt))
+    return immed_wide_int_const (poly_int_cst_value (elt), mode);
   return immed_wide_int_const (wi::to_wide (elt), mode);
 }
 
@@ -10132,6 +10134,9 @@ expand_expr_real_1 (tree exp, rtx target
 				      copy_rtx (XEXP (temp, 0)));
       return temp;
 
+    case POLY_INT_CST:
+      return immed_wide_int_const (poly_int_cst_value (exp), mode);
+
     case SAVE_EXPR:
       {
 	tree val = treeop0;
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h	2017-10-23 16:52:20.504766418 +0100
+++ gcc/gimple-expr.h	2017-10-23 17:00:57.774971099 +0100
@@ -130,6 +130,7 @@ is_gimple_constant (const_tree t)
   switch (TREE_CODE (t))
     {
     case INTEGER_CST:
+    case POLY_INT_CST:
     case REAL_CST:
     case FIXED_CST:
     case COMPLEX_CST:
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/gimplify.c	2017-10-23 17:00:57.776969281 +0100
@@ -3028,7 +3028,7 @@ maybe_with_size_expr (tree *expr_p)
 
   /* If the size isn't known or is a constant, we have nothing to do.  */
   size = TYPE_SIZE_UNIT (type);
-  if (!size || TREE_CODE (size) == INTEGER_CST)
+  if (!size || poly_int_tree_p (size))
     return;
 
   /* Otherwise, make a WITH_SIZE_EXPR.  */
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/print-tree.c	2017-10-23 17:00:57.776969281 +0100
@@ -814,6 +814,18 @@ print_node (FILE *file, const char *pref
 	  }
 	  break;
 
+	case POLY_INT_CST:
+	  {
+	    char buf[10];
+	    for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	      {
+		snprintf (buf, sizeof (buf), "elt%u: ", i);
+		print_node (file, buf, POLY_INT_CST_COEFF (node, i),
+			    indent + 4);
+	      }
+	  }
+	  break;
+
 	case IDENTIFIER_NODE:
 	  lang_hooks.print_identifier (file, node, indent);
 	  break;
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-data-ref.c	2017-10-23 17:00:57.778967463 +0100
@@ -1235,6 +1235,10 @@ data_ref_compare_tree (tree t1, tree t2)
       break;
 
     default:
+      if (POLY_INT_CST_P (t1))
+	return compare_sizes_for_sort (wi::to_poly_widest (t1),
+				       wi::to_poly_widest (t2));
+
       tclass = TREE_CODE_CLASS (code);
 
       /* For decls, compare their UIDs.  */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-pretty-print.c	2017-10-23 17:00:57.779966554 +0100
@@ -1744,6 +1744,18 @@ dump_generic_node (pretty_printer *pp, t
 	pp_string (pp, "(OVF)");
       break;
 
+    case POLY_INT_CST:
+      pp_string (pp, "POLY_INT_CST [");
+      dump_generic_node (pp, POLY_INT_CST_COEFF (node, 0), spc, flags, false);
+      for (unsigned int i = 1; i < NUM_POLY_INT_COEFFS; ++i)
+	{
+	  pp_string (pp, ", ");
+	  dump_generic_node (pp, POLY_INT_CST_COEFF (node, i),
+			     spc, flags, false);
+	}
+      pp_string (pp, "]");
+      break;
+
     case REAL_CST:
       /* Code copied from print_node.  */
       {
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-ssa-address.c	2017-10-23 17:00:57.779966554 +0100
@@ -203,7 +203,8 @@ addr_for_mem_ref (struct mem_address *ad
 
   if (addr->offset && !integer_zerop (addr->offset))
     {
-      offset_int dc = offset_int::from (wi::to_wide (addr->offset), SIGNED);
+      poly_offset_int dc
+	= poly_offset_int::from (wi::to_poly_wide (addr->offset), SIGNED);
       off = immed_wide_int_const (dc, pointer_mode);
     }
   else
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:00:57.781964737 +0100
@@ -2753,7 +2753,7 @@ dr_group_sort_cmp (const void *dra_, con
     return cmp;
 
   /* Then sort after DR_INIT.  In case of identical DRs sort after stmt UID.  */
-  cmp = tree_int_cst_compare (DR_INIT (dra), DR_INIT (drb));
+  cmp = data_ref_compare_tree (DR_INIT (dra), DR_INIT (drb));
   if (cmp == 0)
     return gimple_uid (DR_STMT (dra)) < gimple_uid (DR_STMT (drb)) ? -1 : 1;
   return cmp;
Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-vrp.c	2017-10-23 17:00:57.782963828 +0100
@@ -1121,7 +1121,24 @@ compare_values_warnv (tree val1, tree va
       if (TREE_OVERFLOW (val1) || TREE_OVERFLOW (val2))
 	return -2;
 
-      return tree_int_cst_compare (val1, val2);
+      if (TREE_CODE (val1) == INTEGER_CST
+	  && TREE_CODE (val2) == INTEGER_CST)
+	return tree_int_cst_compare (val1, val2);
+
+      if (poly_int_tree_p (val1) && poly_int_tree_p (val2))
+	{
+	  if (must_eq (wi::to_poly_widest (val1),
+		       wi::to_poly_widest (val2)))
+	    return 0;
+	  if (must_lt (wi::to_poly_widest (val1),
+		       wi::to_poly_widest (val2)))
+	    return -1;
+	  if (must_gt (wi::to_poly_widest (val1),
+		       wi::to_poly_widest (val2)))
+	    return 1;
+	}
+
+      return -2;
     }
   else
     {
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-ssa-loop-ivopts.c	2017-10-23 17:00:57.780965645 +0100
@@ -1127,6 +1127,8 @@ determine_base_object (tree expr)
       gcc_unreachable ();
 
     default:
+      if (POLY_INT_CST_P (expr))
+	return NULL_TREE;
       return fold_convert (ptr_type_node, expr);
     }
 }
@@ -2168,6 +2170,12 @@ constant_multiple_of (tree top, tree bot
       return res == 0;
 
     default:
+      if (POLY_INT_CST_P (top)
+	  && POLY_INT_CST_P (bot)
+	  && constant_multiple_p (wi::to_poly_widest (top),
+				  wi::to_poly_widest (bot), mul))
+	return true;
+
       return false;
     }
 }
@@ -2967,7 +2975,8 @@ get_loop_invariant_expr (struct ivopts_d
 {
   STRIP_NOPS (inv_expr);
 
-  if (TREE_CODE (inv_expr) == INTEGER_CST || TREE_CODE (inv_expr) == SSA_NAME)
+  if (poly_int_tree_p (inv_expr)
+      || TREE_CODE (inv_expr) == SSA_NAME)
     return NULL;
 
   /* Don't strip constant part away as we used to.  */
@@ -3064,7 +3073,7 @@ add_candidate_1 (struct ivopts_data *dat
       cand->incremented_at = incremented_at;
       data->vcands.safe_push (cand);
 
-      if (TREE_CODE (step) != INTEGER_CST)
+      if (!poly_int_tree_p (step))
 	{
 	  find_inv_vars (data, &step, &cand->inv_vars);
 
@@ -3800,7 +3809,7 @@ get_computation_aff_1 (struct loop *loop
   if (TYPE_PRECISION (utype) < TYPE_PRECISION (ctype))
     {
       if (cand->orig_iv != NULL && CONVERT_EXPR_P (cbase)
-	  && (CONVERT_EXPR_P (cstep) || TREE_CODE (cstep) == INTEGER_CST))
+	  && (CONVERT_EXPR_P (cstep) || poly_int_tree_p (cstep)))
 	{
 	  tree inner_base, inner_step, inner_type;
 	  inner_base = TREE_OPERAND (cbase, 0);
@@ -4058,7 +4067,7 @@ force_expr_to_var_cost (tree expr, bool
 
   if (is_gimple_min_invariant (expr))
     {
-      if (TREE_CODE (expr) == INTEGER_CST)
+      if (poly_int_tree_p (expr))
 	return comp_cost (integer_cost [speed], 0);
 
       if (TREE_CODE (expr) == ADDR_EXPR)
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-ssa-loop.c	2017-10-23 17:00:57.780965645 +0100
@@ -620,6 +620,7 @@ for_each_index (tree *addr_p, bool (*cbc
 	case VEC_SERIES_CST:
 	case COMPLEX_CST:
 	case INTEGER_CST:
+	case POLY_INT_CST:
 	case REAL_CST:
 	case FIXED_CST:
 	case CONSTRUCTOR:
Index: gcc/fold-const.h
===================================================================
--- gcc/fold-const.h	2017-10-23 16:52:20.504766418 +0100
+++ gcc/fold-const.h	2017-10-23 17:00:57.774971099 +0100
@@ -115,7 +115,7 @@ extern tree build_simple_mem_ref_loc (lo
 #define build_simple_mem_ref(T)\
 	build_simple_mem_ref_loc (UNKNOWN_LOCATION, T)
 extern offset_int mem_ref_offset (const_tree);
-extern tree build_invariant_address (tree, tree, HOST_WIDE_INT);
+extern tree build_invariant_address (tree, tree, poly_int64);
 extern tree constant_boolean_node (bool, tree);
 extern tree div_if_zero_remainder (const_tree, const_tree);
 
@@ -152,7 +152,7 @@ #define round_up(T,N) round_up_loc (UNKN
 extern tree round_up_loc (location_t, tree, unsigned int);
 #define round_down(T,N) round_down_loc (UNKNOWN_LOCATION, T, N)
 extern tree round_down_loc (location_t, tree, int);
-extern tree size_int_kind (HOST_WIDE_INT, enum size_type_kind);
+extern tree size_int_kind (poly_int64, enum size_type_kind);
 #define size_binop(CODE,T1,T2)\
    size_binop_loc (UNKNOWN_LOCATION, CODE, T1, T2)
 extern tree size_binop_loc (location_t, enum tree_code, tree, tree);
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/fold-const.c	2017-10-23 17:00:57.774971099 +0100
@@ -553,10 +553,8 @@ fold_negate_expr_1 (location_t loc, tree
 	return tem;
       break;
 
+    case POLY_INT_CST:
     case REAL_CST:
-      tem = fold_negate_const (t, type);
-      return tem;
-
     case FIXED_CST:
       tem = fold_negate_const (t, type);
       return tem;
@@ -986,13 +984,10 @@ int_binop_types_match_p (enum tree_code
 	 && TYPE_MODE (type1) == TYPE_MODE (type2);
 }
 
-
-/* Combine two integer constants PARG1 and PARG2 under operation CODE
-   to produce a new constant.  Return NULL_TREE if we don't know how
-   to evaluate CODE at compile-time.  */
+/* Subroutine of int_const_binop_1 that handles two INTEGER_CSTs.  */
 
 static tree
-int_const_binop_1 (enum tree_code code, const_tree parg1, const_tree parg2,
+int_const_binop_2 (enum tree_code code, const_tree parg1, const_tree parg2,
 		   int overflowable)
 {
   wide_int res;
@@ -1140,6 +1135,74 @@ int_const_binop_1 (enum tree_code code,
   return t;
 }
 
+/* Combine two integer constants PARG1 and PARG2 under operation CODE
+   to produce a new constant.  Return NULL_TREE if we don't know how
+   to evaluate CODE at compile-time.  */
+
+static tree
+int_const_binop_1 (enum tree_code code, const_tree arg1, const_tree arg2,
+		   int overflowable)
+{
+  if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
+    return int_const_binop_2 (code, arg1, arg2, overflowable);
+
+  gcc_assert (NUM_POLY_INT_COEFFS != 1);
+
+  if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
+    {
+      poly_wide_int res;
+      bool overflow;
+      tree type = TREE_TYPE (arg1);
+      signop sign = TYPE_SIGN (type);
+      switch (code)
+	{
+	case PLUS_EXPR:
+	  res = wi::add (wi::to_poly_wide (arg1),
+			 wi::to_poly_wide (arg2), sign, &overflow);
+	  break;
+
+	case MINUS_EXPR:
+	  res = wi::sub (wi::to_poly_wide (arg1),
+			 wi::to_poly_wide (arg2), sign, &overflow);
+	  break;
+
+	case MULT_EXPR:
+	  if (TREE_CODE (arg2) == INTEGER_CST)
+	    res = wi::mul (wi::to_poly_wide (arg1),
+			   wi::to_wide (arg2), sign, &overflow);
+	  else if (TREE_CODE (arg1) == INTEGER_CST)
+	    res = wi::mul (wi::to_poly_wide (arg2),
+			   wi::to_wide (arg1), sign, &overflow);
+	  else
+	    return NULL_TREE;
+	  break;
+
+	case LSHIFT_EXPR:
+	  if (TREE_CODE (arg2) == INTEGER_CST)
+	    res = wi::to_poly_wide (arg1) << wi::to_wide (arg2);
+	  else
+	    return NULL_TREE;
+	  break;
+
+	case BIT_IOR_EXPR:
+	  if (TREE_CODE (arg2) != INTEGER_CST
+	      || !can_ior_p (wi::to_poly_wide (arg1), wi::to_wide (arg2),
+			     &res))
+	    return NULL_TREE;
+	  break;
+
+	default:
+	  return NULL_TREE;
+	}
+      return force_fit_type (type, res, overflowable,
+			     (((sign == SIGNED || overflowable == -1)
+			       && overflow)
+			      | TREE_OVERFLOW (arg1) | TREE_OVERFLOW (arg2)));
+    }
+
+  return NULL_TREE;
+}
+
 tree
 int_const_binop (enum tree_code code, const_tree arg1, const_tree arg2)
 {
@@ -1183,7 +1246,7 @@ const_binop (enum tree_code code, tree a
   STRIP_NOPS (arg1);
   STRIP_NOPS (arg2);
 
-  if (TREE_CODE (arg1) == INTEGER_CST && TREE_CODE (arg2) == INTEGER_CST)
+  if (poly_int_tree_p (arg1) && poly_int_tree_p (arg2))
     {
       if (code == POINTER_PLUS_EXPR)
 	return int_const_binop (PLUS_EXPR,
@@ -1721,6 +1784,8 @@ const_unop (enum tree_code code, tree ty
     case BIT_NOT_EXPR:
       if (TREE_CODE (arg0) == INTEGER_CST)
 	return fold_not_const (arg0, type);
+      else if (POLY_INT_CST_P (arg0))
+	return wide_int_to_tree (type, -poly_int_cst_value (arg0));
       /* Perform BIT_NOT_EXPR on each element individually.  */
       else if (TREE_CODE (arg0) == VECTOR_CST)
 	{
@@ -1847,7 +1912,7 @@ const_unop (enum tree_code code, tree ty
    indicates which particular sizetype to create.  */
 
 tree
-size_int_kind (HOST_WIDE_INT number, enum size_type_kind kind)
+size_int_kind (poly_int64 number, enum size_type_kind kind)
 {
   return build_int_cst (sizetype_tab[(int) kind], number);
 }
@@ -1868,8 +1933,8 @@ size_binop_loc (location_t loc, enum tre
   gcc_assert (int_binop_types_match_p (code, TREE_TYPE (arg0),
                                        TREE_TYPE (arg1)));
 
-  /* Handle the special case of two integer constants faster.  */
-  if (TREE_CODE (arg0) == INTEGER_CST && TREE_CODE (arg1) == INTEGER_CST)
+  /* Handle the special case of two poly_int constants faster.  */
+  if (poly_int_tree_p (arg0) && poly_int_tree_p (arg1))
     {
       /* And some specific cases even faster than that.  */
       if (code == PLUS_EXPR)
@@ -1893,7 +1958,9 @@ size_binop_loc (location_t loc, enum tre
       /* Handle general case of two integer constants.  For sizetype
          constant calculations we always want to know about overflow,
 	 even in the unsigned case.  */
-      return int_const_binop_1 (code, arg0, arg1, -1);
+      tree res = int_const_binop_1 (code, arg0, arg1, -1);
+      if (res != NULL_TREE)
+	return res;
     }
 
   return fold_build2_loc (loc, code, type, arg0, arg1);
@@ -2217,9 +2284,20 @@ fold_convert_const_fixed_from_real (tree
 static tree
 fold_convert_const (enum tree_code code, tree type, tree arg1)
 {
-  if (TREE_TYPE (arg1) == type)
+  tree arg_type = TREE_TYPE (arg1);
+  if (arg_type == type)
     return arg1;
 
+  /* We can't widen types, since the runtime value could overflow the
+     original type before being extended to the new type.  */
+  if (POLY_INT_CST_P (arg1)
+      && (POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type))
+      && TYPE_PRECISION (type) <= TYPE_PRECISION (arg_type))
+    return build_poly_int_cst (type,
+			       poly_wide_int::from (poly_int_cst_value (arg1),
+						    TYPE_PRECISION (type),
+						    TYPE_SIGN (arg_type)));
+
   if (POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type)
       || TREE_CODE (type) == OFFSET_TYPE)
     {
@@ -12666,6 +12744,10 @@ multiple_of_p (tree type, const_tree top
       /* fall through */
 
     default:
+      if (POLY_INT_CST_P (top) && poly_int_tree_p (bottom))
+	return multiple_p (wi::to_poly_widest (top),
+			   wi::to_poly_widest (bottom));
+
       return 0;
     }
 }
@@ -13722,16 +13804,6 @@ fold_negate_const (tree arg0, tree type)
 
   switch (TREE_CODE (arg0))
     {
-    case INTEGER_CST:
-      {
-	bool overflow;
-	wide_int val = wi::neg (wi::to_wide (arg0), &overflow);
-	t = force_fit_type (type, val, 1,
-			    (overflow && ! TYPE_UNSIGNED (type))
-			    || TREE_OVERFLOW (arg0));
-	break;
-      }
-
     case REAL_CST:
       t = build_real (type, real_value_negate (&TREE_REAL_CST (arg0)));
       break;
@@ -13750,6 +13822,16 @@ fold_negate_const (tree arg0, tree type)
       }
 
     default:
+      if (poly_int_tree_p (arg0))
+	{
+	  bool overflow;
+	  poly_wide_int res = wi::neg (wi::to_poly_wide (arg0), &overflow);
+	  t = force_fit_type (type, res, 1,
+			      (overflow && ! TYPE_UNSIGNED (type))
+			      || TREE_OVERFLOW (arg0));
+	  break;
+	}
+
       gcc_unreachable ();
     }
 
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 17:00:54.441003964 +0100
+++ gcc/expmed.c	2017-10-23 17:00:57.771973825 +0100
@@ -5276,6 +5276,9 @@ make_tree (tree type, rtx x)
       /* fall through.  */
 
     default:
+      if (CONST_POLY_INT_P (x))
+	return wide_int_to_tree (t, const_poly_int_value (x));
+
       t = build_decl (RTL_LOCATION (x), VAR_DECL, NULL_TREE, type);
 
       /* If TYPE is a POINTER_TYPE, we might need to convert X from
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/gimple-ssa-strength-reduction.c	2017-10-23 17:00:57.775970190 +0100
@@ -1258,7 +1258,7 @@ slsr_process_mul (gimple *gs, tree rhs1,
       c2 = create_mul_ssa_cand (gs, rhs2, rhs1, speed);
       c->next_interp = c2->cand_num;
     }
-  else
+  else if (TREE_CODE (rhs2) == INTEGER_CST)
     {
       /* Record an interpretation for the multiply-immediate.  */
       c = create_mul_imm_cand (gs, rhs1, rhs2, speed);
@@ -1499,7 +1499,7 @@ slsr_process_add (gimple *gs, tree rhs1,
 	    add_cand_for_stmt (gs, c2);
 	}
     }
-  else
+  else if (TREE_CODE (rhs2) == INTEGER_CST)
     {
       /* Record an interpretation for the add-immediate.  */
       widest_int index = wi::to_widest (rhs2);
Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c	2017-10-23 17:00:52.669615373 +0100
+++ gcc/stor-layout.c	2017-10-23 17:00:57.777968372 +0100
@@ -840,6 +840,28 @@ start_record_layout (tree t)
   return rli;
 }
 
+/* Fold sizetype value X to bitsizetype, given that X represents a type
+   size or offset.  */
+
+static tree
+bits_from_bytes (tree x)
+{
+  if (POLY_INT_CST_P (x))
+    /* The runtime calculation isn't allowed to overflow sizetype;
+       increasing the runtime values must always increase the size
+       or offset of the object.  This means that the object imposes
+       a maximum value on the runtime parameters, but we don't record
+       what that is.  */
+    return build_poly_int_cst
+      (bitsizetype,
+       poly_wide_int::from (poly_int_cst_value (x),
+			    TYPE_PRECISION (bitsizetype),
+			    TYPE_SIGN (TREE_TYPE (x))));
+  x = fold_convert (bitsizetype, x);
+  gcc_checking_assert (x);
+  return x;
+}
+
 /* Return the combined bit position for the byte offset OFFSET and the
    bit position BITPOS.
 
@@ -853,8 +875,7 @@ start_record_layout (tree t)
 bit_from_pos (tree offset, tree bitpos)
 {
   return size_binop (PLUS_EXPR, bitpos,
-		     size_binop (MULT_EXPR,
-				 fold_convert (bitsizetype, offset),
+		     size_binop (MULT_EXPR, bits_from_bytes (offset),
 				 bitsize_unit_node));
 }
 
@@ -2268,9 +2289,10 @@ layout_type (tree type)
 	  TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
 						   TYPE_SIZE_UNIT (innertype),
 						   size_int (nunits));
-	TYPE_SIZE (type) = int_const_binop (MULT_EXPR,
-					    TYPE_SIZE (innertype),
-					    bitsize_int (nunits));
+	TYPE_SIZE (type) = int_const_binop
+	  (MULT_EXPR,
+	   bits_from_bytes (TYPE_SIZE_UNIT (type)),
+	   bitsize_int (BITS_PER_UNIT));
 
 	/* For vector types, we do not default to the mode's alignment.
 	   Instead, query a target hook, defaulting to natural alignment.
@@ -2383,8 +2405,7 @@ layout_type (tree type)
 	      length = size_zero_node;
 
 	    TYPE_SIZE (type) = size_binop (MULT_EXPR, element_size,
-					   fold_convert (bitsizetype,
-							 length));
+					   bits_from_bytes (length));
 
 	    /* If we know the size of the element, calculate the total size
 	       directly, rather than do some division thing below.  This
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-10-23 16:52:20.504766418 +0100
+++ gcc/tree-cfg.c	2017-10-23 17:00:57.777968372 +0100
@@ -2952,7 +2952,7 @@ #define CHECK_OP(N, MSG) \
 	  error ("invalid first operand of MEM_REF");
 	  return x;
 	}
-      if (TREE_CODE (TREE_OPERAND (t, 1)) != INTEGER_CST
+      if (!poly_int_tree_p (TREE_OPERAND (t, 1))
 	  || !POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (t, 1))))
 	{
 	  error ("invalid offset operand of MEM_REF");
@@ -3358,7 +3358,7 @@ verify_types_in_gimple_reference (tree e
 	  debug_generic_stmt (expr);
 	  return true;
 	}
-      if (TREE_CODE (TREE_OPERAND (expr, 1)) != INTEGER_CST
+      if (!poly_int_tree_p (TREE_OPERAND (expr, 1))
 	  || !POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (expr, 1))))
 	{
 	  error ("invalid offset operand in MEM_REF");
@@ -3375,7 +3375,7 @@ verify_types_in_gimple_reference (tree e
 	  return true;
 	}
       if (!TMR_OFFSET (expr)
-	  || TREE_CODE (TMR_OFFSET (expr)) != INTEGER_CST
+	  || !poly_int_tree_p (TMR_OFFSET (expr))
 	  || !POINTER_TYPE_P (TREE_TYPE (TMR_OFFSET (expr))))
 	{
 	  error ("invalid offset operand in TARGET_MEM_REF");


* [007/nnn] poly_int: dump routines
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (5 preceding siblings ...)
  2017-10-23 17:02 ` [006/nnn] poly_int: tree constants Richard Sandiford
@ 2017-10-23 17:02 ` Richard Sandiford
  2017-11-17  3:38   ` Jeff Law
  2017-10-23 17:03 ` [008/nnn] poly_int: create_integer_operand Richard Sandiford
                   ` (100 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:02 UTC (permalink / raw)
  To: gcc-patches

Add poly_int routines for the dumpfile.h and pretty-print.h frameworks.
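
For illustration only (not part of the patch), a caller of the new
entry points might look like the sketch below; the helper name
dump_nunits and the choice of MSG_NOTE are assumptions made for the
example rather than anything the patch adds:

  /* Print a possibly variable element count NUNITS to the dump files
     and to the pretty printer PP.  Constant values print as plain
     decimal (e.g. "4"); values with runtime coefficients print in
     bracketed form (e.g. "[4, 4]").  */

  static void
  dump_nunits (pretty_printer *pp, poly_uint64 nunits)
  {
    dump_dec (MSG_NOTE, nunits);
    pp_wide_integer (pp, nunits);
  }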


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* dumpfile.h (dump_dec): Declare.
	* dumpfile.c (dump_dec): New function.
	* pretty-print.h (pp_wide_integer): Turn into a function and
	declare a poly_int version.
	* pretty-print.c (pp_wide_integer): New function for poly_ints.

Index: gcc/dumpfile.h
===================================================================
--- gcc/dumpfile.h	2017-10-23 16:52:20.417686430 +0100
+++ gcc/dumpfile.h	2017-10-23 17:01:00.431554440 +0100
@@ -174,6 +174,9 @@ extern void dump_gimple_stmt (dump_flags
 extern void print_combine_total_stats (void);
 extern bool enable_rtl_dump_file (void);
 
+template<unsigned int N, typename C>
+void dump_dec (int, const poly_int<N, C> &);
+
 /* In tree-dump.c  */
 extern void dump_node (const_tree, dump_flags_t, FILE *);
 
Index: gcc/dumpfile.c
===================================================================
--- gcc/dumpfile.c	2017-10-23 16:52:20.417686430 +0100
+++ gcc/dumpfile.c	2017-10-23 17:01:00.431554440 +0100
@@ -473,6 +473,27 @@ dump_printf_loc (dump_flags_t dump_kind,
     }
 }
 
+/* Output VALUE in decimal to appropriate dump streams.  */
+
+template<unsigned int N, typename C>
+void
+dump_dec (int dump_kind, const poly_int<N, C> &value)
+{
+  STATIC_ASSERT (poly_coeff_traits<C>::signedness >= 0);
+  signop sgn = poly_coeff_traits<C>::signedness ? SIGNED : UNSIGNED;
+  if (dump_file && (dump_kind & pflags))
+    print_dec (value, dump_file, sgn);
+
+  if (alt_dump_file && (dump_kind & alt_flags))
+    print_dec (value, alt_dump_file, sgn);
+}
+
+template void dump_dec (int, const poly_uint16 &);
+template void dump_dec (int, const poly_int64 &);
+template void dump_dec (int, const poly_uint64 &);
+template void dump_dec (int, const poly_offset_int &);
+template void dump_dec (int, const poly_widest_int &);
+
 /* Start a dump for PHASE. Store user-supplied dump flags in
    *FLAG_PTR.  Return the number of streams opened.  Set globals
    DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
Index: gcc/pretty-print.h
===================================================================
--- gcc/pretty-print.h	2017-10-23 16:52:20.417686430 +0100
+++ gcc/pretty-print.h	2017-10-23 17:01:00.431554440 +0100
@@ -328,8 +328,6 @@ #define pp_wide_int(PP, W, SGN)					\
       pp_string (PP, pp_buffer (PP)->digit_buffer);		\
     }								\
   while (0)
-#define pp_wide_integer(PP, I) \
-   pp_scalar (PP, HOST_WIDE_INT_PRINT_DEC, (HOST_WIDE_INT) I)
 #define pp_pointer(PP, P)      pp_scalar (PP, "%p", P)
 
 #define pp_identifier(PP, ID)  pp_string (PP, (pp_translate_identifiers (PP) \
@@ -401,4 +399,15 @@ extern const char *identifier_to_locale
 extern void *(*identifier_to_locale_alloc) (size_t);
 extern void (*identifier_to_locale_free) (void *);
 
+/* Print I to PP in decimal.  */
+
+inline void
+pp_wide_integer (pretty_printer *pp, HOST_WIDE_INT i)
+{
+  pp_scalar (pp, HOST_WIDE_INT_PRINT_DEC, i);
+}
+
+template<unsigned int N, typename T>
+void pp_wide_integer (pretty_printer *pp, const poly_int_pod<N, T> &);
+
 #endif /* GCC_PRETTY_PRINT_H */
Index: gcc/pretty-print.c
===================================================================
--- gcc/pretty-print.c	2017-10-23 16:52:20.417686430 +0100
+++ gcc/pretty-print.c	2017-10-23 17:01:00.431554440 +0100
@@ -795,6 +795,30 @@ pp_clear_state (pretty_printer *pp)
   pp_indentation (pp) = 0;
 }
 
+/* Print X to PP in decimal.  */
+template<unsigned int N, typename T>
+void
+pp_wide_integer (pretty_printer *pp, const poly_int_pod<N, T> &x)
+{
+  if (x.is_constant ())
+    pp_wide_integer (pp, x.coeffs[0]);
+  else
+    {
+      pp_left_bracket (pp);
+      for (unsigned int i = 0; i < N; ++i)
+	{
+	  if (i != 0)
+	    pp_comma (pp);
+	  pp_wide_integer (pp, x.coeffs[i]);
+	}
+      pp_right_bracket (pp);
+    }
+}
+
+template void pp_wide_integer (pretty_printer *, const poly_uint16_pod &);
+template void pp_wide_integer (pretty_printer *, const poly_int64_pod &);
+template void pp_wide_integer (pretty_printer *, const poly_uint64_pod &);
+
 /* Flush the formatted text of PRETTY-PRINTER onto the attached stream.  */
 void
 pp_write_text_to_stream (pretty_printer *pp)


* [008/nnn] poly_int: create_integer_operand
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (6 preceding siblings ...)
  2017-10-23 17:02 ` [007/nnn] poly_int: dump routines Richard Sandiford
@ 2017-10-23 17:03 ` Richard Sandiford
  2017-11-17  3:40   ` Jeff Law
  2017-10-23 17:04 ` [010/nnn] poly_int: REG_OFFSET Richard Sandiford
                   ` (99 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:03 UTC (permalink / raw)
  To: gcc-patches

This patch generalises create_integer_operand so that it accepts
poly_int64s rather than HOST_WIDE_INTs.
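
As a purely illustrative sketch (the function, its name and its
insn_code parameter are invented for the example, not taken from the
patch), an expander can now hand a possibly non-constant length
straight to the operand machinery:

  /* Try to emit ICODE for a memory operation on ADDR whose length in
     bytes, LEN, may involve runtime invariants.  */

  static bool
  expand_example_op (enum insn_code icode, rtx addr, poly_int64 len)
  {
    struct expand_operand ops[2];
    create_address_operand (&ops[0], addr);
    /* LEN no longer needs to be a compile-time HOST_WIDE_INT.  */
    create_integer_operand (&ops[1], len);
    return maybe_expand_insn (icode, 2, ops);
  }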


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* optabs.h (expand_operand): Add an int_value field.
	(create_expand_operand): Add an int_value parameter and use it
	to initialize the new expand_operand field.
	(create_integer_operand): Replace with a declaration of a function
	that accepts poly_int64s.  Move the implementation to...
	* optabs.c (create_integer_operand): ...here.
	(maybe_legitimize_operand): For EXPAND_INTEGER, check whether the
	mode preserves the value of int_value, instead of calling
	const_int_operand on the rtx.

Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2017-10-23 16:52:20.393664364 +0100
+++ gcc/optabs.h	2017-10-23 17:01:02.532643107 +0100
@@ -60,6 +60,9 @@ struct expand_operand {
 
   /* The value of the operand.  */
   rtx value;
+
+  /* The value of an EXPAND_INTEGER operand.  */
+  poly_int64 int_value;
 };
 
 /* Initialize OP with the given fields.  Initialise the other fields
@@ -69,13 +72,14 @@ struct expand_operand {
 create_expand_operand (struct expand_operand *op,
 		       enum expand_operand_type type,
 		       rtx value, machine_mode mode,
-		       bool unsigned_p)
+		       bool unsigned_p, poly_int64 int_value = 0)
 {
   op->type = type;
   op->unsigned_p = unsigned_p;
   op->unused = 0;
   op->mode = mode;
   op->value = value;
+  op->int_value = int_value;
 }
 
 /* Make OP describe an operand that must use rtx X, even if X is volatile.  */
@@ -142,18 +146,7 @@ create_address_operand (struct expand_op
   create_expand_operand (op, EXPAND_ADDRESS, value, Pmode, false);
 }
 
-/* Make OP describe an input operand that has value INTVAL and that has
-   no inherent mode.  This function should only be used for operands that
-   are always expand-time constants.  The backend may request that INTVAL
-   be copied into a different kind of rtx, but it must specify the mode
-   of that rtx if so.  */
-
-static inline void
-create_integer_operand (struct expand_operand *op, HOST_WIDE_INT intval)
-{
-  create_expand_operand (op, EXPAND_INTEGER, GEN_INT (intval), VOIDmode, false);
-}
-
+extern void create_integer_operand (struct expand_operand *, poly_int64);
 
 /* Passed to expand_simple_binop and expand_binop to say which options
    to try to use if the requested operation can't be open-coded on the
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 16:52:20.393664364 +0100
+++ gcc/optabs.c	2017-10-23 17:01:02.531644016 +0100
@@ -6959,6 +6959,20 @@ valid_multiword_target_p (rtx target)
   return true;
 }
 
+/* Make OP describe an input operand that has value INTVAL and that has
+   no inherent mode.  This function should only be used for operands that
+   are always expand-time constants.  The backend may request that INTVAL
+   be copied into a different kind of rtx, but it must specify the mode
+   of that rtx if so.  */
+
+void
+create_integer_operand (struct expand_operand *op, poly_int64 intval)
+{
+  create_expand_operand (op, EXPAND_INTEGER,
+			 gen_int_mode (intval, MAX_MODE_INT),
+			 VOIDmode, false, intval);
+}
+
 /* Like maybe_legitimize_operand, but do not change the code of the
    current rtx value.  */
 
@@ -7071,7 +7085,9 @@ maybe_legitimize_operand (enum insn_code
 
     case EXPAND_INTEGER:
       mode = insn_data[(int) icode].operand[opno].mode;
-      if (mode != VOIDmode && const_int_operand (op->value, mode))
+      if (mode != VOIDmode
+	  && must_eq (trunc_int_for_mode (op->int_value, mode),
+		      op->int_value))
 	goto input;
       break;
     }


* [009/nnn] poly_int: TRULY_NOOP_TRUNCATION
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (8 preceding siblings ...)
  2017-10-23 17:04 ` [010/nnn] poly_int: REG_OFFSET Richard Sandiford
@ 2017-10-23 17:04 ` Richard Sandiford
  2017-11-17  3:40   ` Jeff Law
  2017-10-23 17:05 ` [013/nnn] poly_int: same_addr_size_stores_p Richard Sandiford
                   ` (97 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:04 UTC (permalink / raw)
  To: gcc-patches

This patch makes TRULY_NOOP_TRUNCATION take the mode sizes as
poly_uint64s instead of unsigned ints.  The function bodies
don't need to change.
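
The existing hook bodies (see the MIPS, SPU and tilegx changes below)
compile as before because those ports have no runtime indeterminates.
As an illustration only (no such hook appears in this patch), a port
whose precisions did involve runtime indeterminates would instead use
the must_/may_ predicates for ordered comparisons:

  /* Hypothetical hook for a port with runtime invariants in its mode
     precisions.  Only claim a no-op truncation when it holds for every
     runtime value of the indeterminates.  */

  static bool
  example_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
  {
    return must_le (inprec, 32U) || must_gt (outprec, 32U);
  }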


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.def (truly_noop_truncation): Take poly_uint64s instead of
	unsigned ints.  Change default to hook_bool_puint64_puint64_true.
	* doc/tm.texi: Regenerate.
	* hooks.h (hook_bool_uint_uint_true): Delete.
	(hook_bool_puint64_puint64_true): Declare.
	* hooks.c (hook_bool_uint_uint_true): Delete.
	(hook_bool_puint64_puint64_true): New function.
	* config/mips/mips.c (mips_truly_noop_truncation): Take poly_uint64s
	instead of unsigned ints.
	* config/spu/spu.c (spu_truly_noop_truncation): Likewise.
	* config/tilegx/tilegx.c (tilegx_truly_noop_truncation): Likewise.

Index: gcc/target.def
===================================================================
--- gcc/target.def	2017-10-23 17:00:20.920834919 +0100
+++ gcc/target.def	2017-10-23 17:01:04.215112587 +0100
@@ -3155,8 +3155,8 @@ is correct for most machines.\n\
 If @code{TARGET_MODES_TIEABLE_P} returns false for a pair of modes,\n\
 suboptimal code can result if this hook returns true for the corresponding\n\
 mode sizes.  Making this hook return false in such cases may improve things.",
- bool, (unsigned int outprec, unsigned int inprec),
- hook_bool_uint_uint_true)
+ bool, (poly_uint64 outprec, poly_uint64 inprec),
+ hook_bool_puint64_puint64_true)
 
 /* If the representation of integral MODE is such that values are
    always sign-extended to a wider mode MODE_REP then return
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2017-10-23 17:00:20.917834257 +0100
+++ gcc/doc/tm.texi	2017-10-23 17:01:04.214113496 +0100
@@ -10823,7 +10823,7 @@ nevertheless truncate the shift count, y
 by overriding it.
 @end deftypefn
 
-@deftypefn {Target Hook} bool TARGET_TRULY_NOOP_TRUNCATION (unsigned int @var{outprec}, unsigned int @var{inprec})
+@deftypefn {Target Hook} bool TARGET_TRULY_NOOP_TRUNCATION (poly_uint64 @var{outprec}, poly_uint64 @var{inprec})
 This hook returns true if it is safe to ``convert'' a value of
 @var{inprec} bits to one of @var{outprec} bits (where @var{outprec} is
 smaller than @var{inprec}) by merely operating on it as if it had only
Index: gcc/hooks.h
===================================================================
--- gcc/hooks.h	2017-10-23 16:52:20.369642299 +0100
+++ gcc/hooks.h	2017-10-23 17:01:04.214113496 +0100
@@ -39,7 +39,7 @@ extern bool hook_bool_const_rtx_insn_con
 							  const rtx_insn *);
 extern bool hook_bool_mode_uhwi_false (machine_mode,
 				       unsigned HOST_WIDE_INT);
-extern bool hook_bool_uint_uint_true (unsigned int, unsigned int);
+extern bool hook_bool_puint64_puint64_true (poly_uint64, poly_uint64);
 extern bool hook_bool_uint_mode_false (unsigned int, machine_mode);
 extern bool hook_bool_uint_mode_true (unsigned int, machine_mode);
 extern bool hook_bool_tree_false (tree);
Index: gcc/hooks.c
===================================================================
--- gcc/hooks.c	2017-10-23 16:52:20.369642299 +0100
+++ gcc/hooks.c	2017-10-23 17:01:04.214113496 +0100
@@ -133,9 +133,9 @@ hook_bool_mode_uhwi_false (machine_mode,
   return false;
 }
 
-/* Generic hook that takes (unsigned int, unsigned int) and returns true.  */
+/* Generic hook that takes (poly_uint64, poly_uint64) and returns true.  */
 bool
-hook_bool_uint_uint_true (unsigned int, unsigned int)
+hook_bool_puint64_puint64_true (poly_uint64, poly_uint64)
 {
   return true;
 }
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2017-10-23 17:00:43.528930533 +0100
+++ gcc/config/mips/mips.c	2017-10-23 17:01:04.211116223 +0100
@@ -22322,7 +22322,7 @@ mips_promote_function_mode (const_tree t
 /* Implement TARGET_TRULY_NOOP_TRUNCATION.  */
 
 static bool
-mips_truly_noop_truncation (unsigned int outprec, unsigned int inprec)
+mips_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
 {
   return !TARGET_64BIT || inprec <= 32 || outprec > 32;
 }
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c	2017-10-23 17:00:43.548912356 +0100
+++ gcc/config/spu/spu.c	2017-10-23 17:01:04.212115314 +0100
@@ -7182,7 +7182,7 @@ spu_can_change_mode_class (machine_mode
 /* Implement TARGET_TRULY_NOOP_TRUNCATION.  */
 
 static bool
-spu_truly_noop_truncation (unsigned int outprec, unsigned int inprec)
+spu_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
 {
   return inprec <= 32 && outprec <= inprec;
 }
Index: gcc/config/tilegx/tilegx.c
===================================================================
--- gcc/config/tilegx/tilegx.c	2017-10-23 17:00:43.551909629 +0100
+++ gcc/config/tilegx/tilegx.c	2017-10-23 17:01:04.213114405 +0100
@@ -5566,7 +5566,7 @@ tilegx_file_end (void)
    as sign-extended DI values in registers.  */
 
 static bool
-tilegx_truly_noop_truncation (unsigned int outprec, unsigned int inprec)
+tilegx_truly_noop_truncation (poly_uint64 outprec, poly_uint64 inprec)
 {
   return inprec <= 32 || outprec > 32;
 }

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [010/nnn] poly_int: REG_OFFSET
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (7 preceding siblings ...)
  2017-10-23 17:03 ` [008/nnn] poly_int: create_integer_operand Richard Sandiford
@ 2017-10-23 17:04 ` Richard Sandiford
  2017-11-17  3:41   ` Jeff Law
  2017-10-23 17:04 ` [009/nnn] poly_int: TRULY_NOOP_TRUNCATION Richard Sandiford
                   ` (98 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:04 UTC (permalink / raw)
  To: gcc-patches

This patch changes the type of the reg_attrs offset field
from HOST_WIDE_INT to poly_int64 and updates uses accordingly.
This includes changing reg_attr_hasher::hash to use inchash.
(Doing this has no effect on code generation since the only
use of the hasher is to avoid creating duplicate objects.)
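
For reference, a minimal sketch of the kind of call-site adjustment
this implies (the helper itself is invented for illustration and is
not part of the patch): poly_int64 offsets are compared with must_eq
rather than ==, as in the same_variable_part_p change below:

  /* Return true if X and Y are known to refer to the same part of
     the same decl.  */

  static bool
  example_same_reg_part_p (rtx x, rtx y)
  {
    return (REG_EXPR (x) == REG_EXPR (y)
            && must_eq (REG_OFFSET (x), REG_OFFSET (y)));
  }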


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (reg_attrs::offset): Change from HOST_WIDE_INT to poly_int64.
	(gen_rtx_REG_offset): Take the offset as a poly_int64.
	* inchash.h (inchash::hash::add_poly_hwi): New function.
	* gengtype.c (main): Register poly_int64.
	* emit-rtl.c (reg_attr_hasher::hash): Use inchash.  Treat the
	offset as a poly_int.
	(reg_attr_hasher::equal): Use must_eq to compare offsets.
	(get_reg_attrs, update_reg_offset, gen_rtx_REG_offset): Take the
	offset as a poly_int64.
	(set_reg_attrs_from_value): Treat the offset as a poly_int64.
	* print-rtl.c (print_poly_int): New function.
	(rtx_writer::print_rtx_operand_code_r): Treat REG_OFFSET as
	a poly_int.
	* var-tracking.c (track_offset_p, get_tracked_reg_offset): New
	functions.
	(var_reg_set, var_reg_delete_and_set, var_reg_delete): Use them.
	(same_variable_part_p, track_loc_p): Take the offset as a poly_int64.
	(vt_get_decl_and_offset): Return the offset as a poly_int64.
	Enforce track_offset_p for parts of a PARALLEL.
	(vt_add_function_parameter): Use const_offset for the final
	offset to track.  Use get_tracked_reg_offset for the parts
	of a PARALLEL.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 17:01:15.119130016 +0100
+++ gcc/rtl.h	2017-10-23 17:01:43.314993320 +0100
@@ -187,7 +187,7 @@ struct GTY(()) mem_attrs
 
 struct GTY((for_user)) reg_attrs {
   tree decl;			/* decl corresponding to REG.  */
-  HOST_WIDE_INT offset;		/* Offset from start of DECL.  */
+  poly_int64 offset;		/* Offset from start of DECL.  */
 };
 
 /* Common union for an element of an rtx.  */
@@ -2997,7 +2997,7 @@ subreg_promoted_mode (rtx x)
 extern rtvec gen_rtvec_v (int, rtx *);
 extern rtvec gen_rtvec_v (int, rtx_insn **);
 extern rtx gen_reg_rtx (machine_mode);
-extern rtx gen_rtx_REG_offset (rtx, machine_mode, unsigned int, int);
+extern rtx gen_rtx_REG_offset (rtx, machine_mode, unsigned int, poly_int64);
 extern rtx gen_reg_rtx_offset (rtx, machine_mode, int);
 extern rtx gen_reg_rtx_and_attrs (rtx);
 extern rtx_code_label *gen_label_rtx (void);
Index: gcc/inchash.h
===================================================================
--- gcc/inchash.h	2017-10-23 17:01:29.530765486 +0100
+++ gcc/inchash.h	2017-10-23 17:01:43.314993320 +0100
@@ -63,6 +63,14 @@ hashval_t iterative_hash_hashval_t (hash
     val = iterative_hash_host_wide_int (v, val);
   }
 
+  /* Add polynomial value V, treating each element as a HOST_WIDE_INT.  */
+  template<unsigned int N, typename T>
+  void add_poly_hwi (const poly_int_pod<N, T> &v)
+  {
+    for (unsigned int i = 0; i < N; ++i)
+      add_hwi (v.coeffs[i]);
+  }
+
   /* Add wide_int-based value V.  */
   template<typename T>
   void add_wide_int (const generic_wide_int<T> &x)
Index: gcc/gengtype.c
===================================================================
--- gcc/gengtype.c	2017-10-23 17:01:15.119130016 +0100
+++ gcc/gengtype.c	2017-10-23 17:01:43.313994743 +0100
@@ -5190,6 +5190,7 @@ #define POS_HERE(Call) do { pos.file = t
       POS_HERE (do_scalar_typedef ("offset_int", &pos));
       POS_HERE (do_scalar_typedef ("widest_int", &pos));
       POS_HERE (do_scalar_typedef ("int64_t", &pos));
+      POS_HERE (do_scalar_typedef ("poly_int64", &pos));
       POS_HERE (do_scalar_typedef ("uint64_t", &pos));
       POS_HERE (do_scalar_typedef ("uint8", &pos));
       POS_HERE (do_scalar_typedef ("uintptr_t", &pos));
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 17:01:15.119130016 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:01:43.313994743 +0100
@@ -205,7 +205,6 @@ static rtx lookup_const_wide_int (rtx);
 #endif
 static rtx lookup_const_double (rtx);
 static rtx lookup_const_fixed (rtx);
-static reg_attrs *get_reg_attrs (tree, int);
 static rtx gen_const_vector (machine_mode, int);
 static void copy_rtx_if_shared_1 (rtx *orig);
 
@@ -424,7 +423,10 @@ reg_attr_hasher::hash (reg_attrs *x)
 {
   const reg_attrs *const p = x;
 
-  return ((p->offset * 1000) ^ (intptr_t) p->decl);
+  inchash::hash h;
+  h.add_ptr (p->decl);
+  h.add_poly_hwi (p->offset);
+  return h.end ();
 }
 
 /* Returns nonzero if the value represented by X  is the same as that given by
@@ -436,19 +438,19 @@ reg_attr_hasher::equal (reg_attrs *x, re
   const reg_attrs *const p = x;
   const reg_attrs *const q = y;
 
-  return (p->decl == q->decl && p->offset == q->offset);
+  return (p->decl == q->decl && must_eq (p->offset, q->offset));
 }
 /* Allocate a new reg_attrs structure and insert it into the hash table if
    one identical to it is not already in the table.  We are doing this for
    MEM of mode MODE.  */
 
 static reg_attrs *
-get_reg_attrs (tree decl, int offset)
+get_reg_attrs (tree decl, poly_int64 offset)
 {
   reg_attrs attrs;
 
   /* If everything is the default, we can just return zero.  */
-  if (decl == 0 && offset == 0)
+  if (decl == 0 && known_zero (offset))
     return 0;
 
   attrs.decl = decl;
@@ -1241,10 +1243,10 @@ reg_is_parm_p (rtx reg)
    to the REG_OFFSET.  */
 
 static void
-update_reg_offset (rtx new_rtx, rtx reg, int offset)
+update_reg_offset (rtx new_rtx, rtx reg, poly_int64 offset)
 {
   REG_ATTRS (new_rtx) = get_reg_attrs (REG_EXPR (reg),
-				   REG_OFFSET (reg) + offset);
+				       REG_OFFSET (reg) + offset);
 }
 
 /* Generate a register with same attributes as REG, but with OFFSET
@@ -1252,7 +1254,7 @@ update_reg_offset (rtx new_rtx, rtx reg,
 
 rtx
 gen_rtx_REG_offset (rtx reg, machine_mode mode, unsigned int regno,
-		    int offset)
+		    poly_int64 offset)
 {
   rtx new_rtx = gen_rtx_REG (mode, regno);
 
@@ -1288,7 +1290,7 @@ adjust_reg_mode (rtx reg, machine_mode m
 void
 set_reg_attrs_from_value (rtx reg, rtx x)
 {
-  int offset;
+  poly_int64 offset;
   bool can_be_reg_pointer = true;
 
   /* Don't call mark_reg_pointer for incompatible pointer sign
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	2017-10-23 17:01:15.119130016 +0100
+++ gcc/print-rtl.c	2017-10-23 17:01:43.314993320 +0100
@@ -178,6 +178,23 @@ print_mem_expr (FILE *outfile, const_tre
   fputc (' ', outfile);
   print_generic_expr (outfile, CONST_CAST_TREE (expr), dump_flags);
 }
+
+/* Print X to FILE.  */
+
+static void
+print_poly_int (FILE *file, poly_int64 x)
+{
+  HOST_WIDE_INT const_x;
+  if (x.is_constant (&const_x))
+    fprintf (file, HOST_WIDE_INT_PRINT_DEC, const_x);
+  else
+    {
+      fprintf (file, "[" HOST_WIDE_INT_PRINT_DEC, x.coeffs[0]);
+      for (int i = 1; i < NUM_POLY_INT_COEFFS; ++i)
+	fprintf (file, ", " HOST_WIDE_INT_PRINT_DEC, x.coeffs[i]);
+      fprintf (file, "]");
+    }
+}
 #endif
 
 /* Subroutine of print_rtx_operand for handling code '0'.
@@ -499,9 +516,11 @@ rtx_writer::print_rtx_operand_code_r (co
       if (REG_EXPR (in_rtx))
 	print_mem_expr (m_outfile, REG_EXPR (in_rtx));
 
-      if (REG_OFFSET (in_rtx))
-	fprintf (m_outfile, "+" HOST_WIDE_INT_PRINT_DEC,
-		 REG_OFFSET (in_rtx));
+      if (maybe_nonzero (REG_OFFSET (in_rtx)))
+	{
+	  fprintf (m_outfile, "+");
+	  print_poly_int (m_outfile, REG_OFFSET (in_rtx));
+	}
       fputs (" ]", m_outfile);
     }
   if (regno != ORIGINAL_REGNO (in_rtx))
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c	2017-10-23 17:01:15.119130016 +0100
+++ gcc/var-tracking.c	2017-10-23 17:01:43.315991896 +0100
@@ -673,7 +673,6 @@ static bool dataflow_set_different (data
 static void dataflow_set_destroy (dataflow_set *);
 
 static bool track_expr_p (tree, bool);
-static bool same_variable_part_p (rtx, tree, HOST_WIDE_INT);
 static void add_uses_1 (rtx *, void *);
 static void add_stores (rtx, const_rtx, void *);
 static bool compute_bb_dataflow (basic_block);
@@ -704,7 +703,6 @@ static void delete_variable_part (datafl
 static void emit_notes_in_bb (basic_block, dataflow_set *);
 static void vt_emit_notes (void);
 
-static bool vt_get_decl_and_offset (rtx, tree *, HOST_WIDE_INT *);
 static void vt_add_function_parameters (void);
 static bool vt_initialize (void);
 static void vt_finalize (void);
@@ -1850,6 +1848,32 @@ var_reg_decl_set (dataflow_set *set, rtx
   set_variable_part (set, loc, dv, offset, initialized, set_src, iopt);
 }
 
+/* Return true if we should track a location that is OFFSET bytes from
+   a variable.  Store the constant offset in *OFFSET_OUT if so.  */
+
+static bool
+track_offset_p (poly_int64 offset, HOST_WIDE_INT *offset_out)
+{
+  HOST_WIDE_INT const_offset;
+  if (!offset.is_constant (&const_offset)
+      || !IN_RANGE (const_offset, 0, MAX_VAR_PARTS - 1))
+    return false;
+  *offset_out = const_offset;
+  return true;
+}
+
+/* Return the offset of a register that track_offset_p says we
+   should track.  */
+
+static HOST_WIDE_INT
+get_tracked_reg_offset (rtx loc)
+{
+  HOST_WIDE_INT offset;
+  if (!track_offset_p (REG_OFFSET (loc), &offset))
+    gcc_unreachable ();
+  return offset;
+}
+
 /* Set the register to contain REG_EXPR (LOC), REG_OFFSET (LOC).  */
 
 static void
@@ -1857,7 +1881,7 @@ var_reg_set (dataflow_set *set, rtx loc,
 	     rtx set_src)
 {
   tree decl = REG_EXPR (loc);
-  HOST_WIDE_INT offset = REG_OFFSET (loc);
+  HOST_WIDE_INT offset = get_tracked_reg_offset (loc);
 
   var_reg_decl_set (set, loc, initialized,
 		    dv_from_decl (decl), offset, set_src, INSERT);
@@ -1903,7 +1927,7 @@ var_reg_delete_and_set (dataflow_set *se
 			enum var_init_status initialized, rtx set_src)
 {
   tree decl = REG_EXPR (loc);
-  HOST_WIDE_INT offset = REG_OFFSET (loc);
+  HOST_WIDE_INT offset = get_tracked_reg_offset (loc);
   attrs *node, *next;
   attrs **nextp;
 
@@ -1944,10 +1968,10 @@ var_reg_delete (dataflow_set *set, rtx l
   attrs **nextp = &set->regs[REGNO (loc)];
   attrs *node, *next;
 
-  if (clobber)
+  HOST_WIDE_INT offset;
+  if (clobber && track_offset_p (REG_OFFSET (loc), &offset))
     {
       tree decl = REG_EXPR (loc);
-      HOST_WIDE_INT offset = REG_OFFSET (loc);
 
       decl = var_debug_decl (decl);
 
@@ -5245,10 +5269,10 @@ track_expr_p (tree expr, bool need_rtl)
    EXPR+OFFSET.  */
 
 static bool
-same_variable_part_p (rtx loc, tree expr, HOST_WIDE_INT offset)
+same_variable_part_p (rtx loc, tree expr, poly_int64 offset)
 {
   tree expr2;
-  HOST_WIDE_INT offset2;
+  poly_int64 offset2;
 
   if (! DECL_P (expr))
     return false;
@@ -5272,7 +5296,7 @@ same_variable_part_p (rtx loc, tree expr
   expr = var_debug_decl (expr);
   expr2 = var_debug_decl (expr2);
 
-  return (expr == expr2 && offset == offset2);
+  return (expr == expr2 && must_eq (offset, offset2));
 }
 
 /* LOC is a REG or MEM that we would like to track if possible.
@@ -5286,7 +5310,7 @@ same_variable_part_p (rtx loc, tree expr
    from EXPR in *OFFSET_OUT (if nonnull).  */
 
 static bool
-track_loc_p (rtx loc, tree expr, HOST_WIDE_INT offset, bool store_reg_p,
+track_loc_p (rtx loc, tree expr, poly_int64 offset, bool store_reg_p,
 	     machine_mode *mode_out, HOST_WIDE_INT *offset_out)
 {
   machine_mode mode;
@@ -5320,19 +5344,20 @@ track_loc_p (rtx loc, tree expr, HOST_WI
        || (store_reg_p
 	   && !COMPLEX_MODE_P (DECL_MODE (expr))
 	   && hard_regno_nregs (REGNO (loc), DECL_MODE (expr)) == 1))
-      && offset + byte_lowpart_offset (DECL_MODE (expr), mode) == 0)
+      && known_zero (offset + byte_lowpart_offset (DECL_MODE (expr), mode)))
     {
       mode = DECL_MODE (expr);
       offset = 0;
     }
 
-  if (offset < 0 || offset >= MAX_VAR_PARTS)
+  HOST_WIDE_INT const_offset;
+  if (!track_offset_p (offset, &const_offset))
     return false;
 
   if (mode_out)
     *mode_out = mode;
   if (offset_out)
-    *offset_out = offset;
+    *offset_out = const_offset;
   return true;
 }
 
@@ -9544,7 +9569,7 @@ vt_emit_notes (void)
    assign declaration to *DECLP and offset to *OFFSETP, and return true.  */
 
 static bool
-vt_get_decl_and_offset (rtx rtl, tree *declp, HOST_WIDE_INT *offsetp)
+vt_get_decl_and_offset (rtx rtl, tree *declp, poly_int64 *offsetp)
 {
   if (REG_P (rtl))
     {
@@ -9570,8 +9595,10 @@ vt_get_decl_and_offset (rtx rtl, tree *d
 	    decl = REG_EXPR (reg);
 	  if (REG_EXPR (reg) != decl)
 	    break;
-	  if (REG_OFFSET (reg) < offset)
-	    offset = REG_OFFSET (reg);
+	  HOST_WIDE_INT this_offset;
+	  if (!track_offset_p (REG_OFFSET (reg), &this_offset))
+	    break;
+	  offset = MIN (offset, this_offset);
 	}
 
       if (i == len)
@@ -9615,7 +9642,7 @@ vt_add_function_parameter (tree parm)
   rtx incoming = DECL_INCOMING_RTL (parm);
   tree decl;
   machine_mode mode;
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
   dataflow_set *out;
   decl_or_value dv;
 
@@ -9738,7 +9765,8 @@ vt_add_function_parameter (tree parm)
       offset = 0;
     }
 
-  if (!track_loc_p (incoming, parm, offset, false, &mode, &offset))
+  HOST_WIDE_INT const_offset;
+  if (!track_loc_p (incoming, parm, offset, false, &mode, &const_offset))
     return;
 
   out = &VTI (ENTRY_BLOCK_PTR_FOR_FN (cfun))->out;
@@ -9759,7 +9787,7 @@ vt_add_function_parameter (tree parm)
 	 arguments passed by invisible reference aren't dealt with
 	 above: incoming-rtl will have Pmode rather than the
 	 expected mode for the type.  */
-      if (offset)
+      if (const_offset)
 	return;
 
       lowpart = var_lowpart (mode, incoming);
@@ -9774,7 +9802,7 @@ vt_add_function_parameter (tree parm)
       if (val)
 	{
 	  preserve_value (val);
-	  set_variable_part (out, val->val_rtx, dv, offset,
+	  set_variable_part (out, val->val_rtx, dv, const_offset,
 			     VAR_INIT_STATUS_INITIALIZED, NULL, INSERT);
 	  dv = dv_from_value (val->val_rtx);
 	}
@@ -9795,9 +9823,9 @@ vt_add_function_parameter (tree parm)
     {
       incoming = var_lowpart (mode, incoming);
       gcc_assert (REGNO (incoming) < FIRST_PSEUDO_REGISTER);
-      attrs_list_insert (&out->regs[REGNO (incoming)], dv, offset,
+      attrs_list_insert (&out->regs[REGNO (incoming)], dv, const_offset,
 			 incoming);
-      set_variable_part (out, incoming, dv, offset,
+      set_variable_part (out, incoming, dv, const_offset,
 			 VAR_INIT_STATUS_INITIALIZED, NULL, INSERT);
       if (dv_is_value_p (dv))
 	{
@@ -9828,17 +9856,19 @@ vt_add_function_parameter (tree parm)
       for (i = 0; i < XVECLEN (incoming, 0); i++)
 	{
 	  rtx reg = XEXP (XVECEXP (incoming, 0, i), 0);
-	  offset = REG_OFFSET (reg);
+	  /* vt_get_decl_and_offset has already checked that the offset
+	     is a valid variable part.  */
+	  const_offset = get_tracked_reg_offset (reg);
 	  gcc_assert (REGNO (reg) < FIRST_PSEUDO_REGISTER);
-	  attrs_list_insert (&out->regs[REGNO (reg)], dv, offset, reg);
-	  set_variable_part (out, reg, dv, offset,
+	  attrs_list_insert (&out->regs[REGNO (reg)], dv, const_offset, reg);
+	  set_variable_part (out, reg, dv, const_offset,
 			     VAR_INIT_STATUS_INITIALIZED, NULL, INSERT);
 	}
     }
   else if (MEM_P (incoming))
     {
       incoming = var_lowpart (mode, incoming);
-      set_variable_part (out, incoming, dv, offset,
+      set_variable_part (out, incoming, dv, const_offset,
 			 VAR_INIT_STATUS_INITIALIZED, NULL, INSERT);
     }
 }


* [011/nnn] poly_int: DWARF locations
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (10 preceding siblings ...)
  2017-10-23 17:05 ` [013/nnn] poly_int: same_addr_size_stores_p Richard Sandiford
@ 2017-10-23 17:05 ` Richard Sandiford
  2017-11-17 17:40   ` Jeff Law
  2017-10-23 17:05 ` [012/nnn] poly_int: fold_ctor_reference Richard Sandiford
                   ` (95 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:05 UTC (permalink / raw)
  To: gcc-patches

This patch adds support for DWARF location expressions
that involve polynomial offsets.  It adds a target hook that
says how the runtime invariants used in the offsets should be
represented in DWARF.  SVE vectors have to be a multiple of
128 bits in size, so the GCC port uses the number of 128-bit
blocks minus one as the runtime invariant.  However, in DWARF,
the vector length is exposed via a pseudo "VG" register that
holds the number of 64-bit elements in a vector.  Thus:

  indeterminate 1 == (VG / 2) - 1

The hook needs to be general enough to express this.
Note that in most cases the division and subtraction fold
away into surrounding expressions.
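
As an illustration of the hook for the SVE case above, here is a minimal
sketch of a target implementation (the function name and the VG register
macro are placeholders rather than code from the actual port):

  static unsigned int
  sve_dwarf_poly_indeterminate_value (unsigned int i, unsigned int *factor,
                                      int *offset)
  {
    /* There is only one indeterminate, and it is (VG / 2) - 1, so report
       VG's DWARF register number with FACTOR 2 and OFFSET 1.  */
    gcc_assert (i == 1);
    *factor = 2;
    *offset = 1;
    return DWARF_VG_REGNUM;  /* hypothetical macro for VG's DWARF number */
  }

To see the folding in action: a CONST_POLY_INT of 16 + 16X fed to the
int_loc_descriptor code below becomes "DW_OP_bregx VG 0; DW_OP_lit8;
DW_OP_mul" (assuming VG's register number is above 31), i.e. simply
VG * 8.  The "- 1" is absorbed into the constant term (16 - 16 * 1 == 0)
and the "/ 2" folds into the multiplier (16 / 2 == 8).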


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.def (dwarf_poly_indeterminate_value): New hook.
	* targhooks.h (default_dwarf_poly_indeterminate_value): Declare.
	* targhooks.c (default_dwarf_poly_indeterminate_value): New function.
	* doc/tm.texi.in (TARGET_DWARF_POLY_INDETERMINATE_VALUE): Document.
	* doc/tm.texi: Regenerate.
	* dwarf2out.h (build_cfa_loc, build_cfa_aligned_loc): Take the
	offset as a poly_int64.
	* dwarf2out.c (new_reg_loc_descr): Move later in file.  Take the
	offset as a poly_int64.
	(loc_descr_plus_const, loc_list_plus_const, build_cfa_aligned_loc):
	Take the offset as a poly_int64.
	(build_cfa_loc): Likewise.  Use loc_descr_plus_const.
	(frame_pointer_fb_offset): Change to a poly_int64.
	(int_loc_descriptor): Take the offset as a poly_int64.  Use
	targetm.dwarf_poly_indeterminate_value for polynomial offsets.
	(based_loc_descr): Take the offset as a poly_int64.
	Use strip_offset_and_add to handle (plus X (const)).
	Use new_reg_loc_descr instead of an open-coded version of the
	previous implementation.
	(mem_loc_descriptor): Handle CONST_POLY_INT.
	(compute_frame_pointer_to_fb_displacement): Take the offset as a
	poly_int64.  Use strip_offset_and_add to handle (plus X (const)).

Index: gcc/target.def
===================================================================
--- gcc/target.def	2017-10-23 17:01:04.215112587 +0100
+++ gcc/target.def	2017-10-23 17:01:45.057509456 +0100
@@ -4124,6 +4124,21 @@ the CFI label attached to the insn, @var
 the insn and @var{index} is @code{UNSPEC_INDEX} or @code{UNSPECV_INDEX}.",
  void, (const char *label, rtx pattern, int index), NULL)
 
+DEFHOOK
+(dwarf_poly_indeterminate_value,
+ "Express the value of @code{poly_int} indeterminate @var{i} as a DWARF\n\
+expression, with @var{i} counting from 1.  Return the number of a DWARF\n\
+register @var{R} and set @samp{*@var{factor}} and @samp{*@var{offset}} such\n\
+that the value of the indeterminate is:\n\
+@smallexample\n\
+value_of(@var{R}) / @var{factor} - @var{offset}\n\
+@end smallexample\n\
+\n\
+A target only needs to define this hook if it sets\n\
+@samp{NUM_POLY_INT_COEFFS} to a value greater than 1.",
+ unsigned int, (unsigned int i, unsigned int *factor, int *offset),
+ default_dwarf_poly_indeterminate_value)
+
 /* ??? Documenting this hook requires a GFDL license grant.  */
 DEFHOOK_UNDOC
 (stdarg_optimize_hook,
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	2017-10-23 17:00:20.920834919 +0100
+++ gcc/targhooks.h	2017-10-23 17:01:45.057509456 +0100
@@ -234,6 +234,9 @@ extern int default_label_align_max_skip
 extern int default_jump_align_max_skip (rtx_insn *);
 extern section * default_function_section(tree decl, enum node_frequency freq,
 					  bool startup, bool exit);
+extern unsigned int default_dwarf_poly_indeterminate_value (unsigned int,
+							    unsigned int *,
+							    int *);
 extern machine_mode default_dwarf_frame_reg_mode (int);
 extern fixed_size_mode default_get_reg_raw_mode (int);
 extern bool default_keep_leaf_when_profiled ();
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 17:00:49.664349224 +0100
+++ gcc/targhooks.c	2017-10-23 17:01:45.057509456 +0100
@@ -1838,6 +1838,15 @@ default_debug_unwind_info (void)
   return UI_NONE;
 }
 
+/* Targets that set NUM_POLY_INT_COEFFS to something greater than 1
+   must define this hook.  */
+
+unsigned int
+default_dwarf_poly_indeterminate_value (unsigned int, unsigned int *, int *)
+{
+  gcc_unreachable ();
+}
+
 /* Determine the correct mode for a Dwarf frame register that represents
    register REGNO.  */
 
Index: gcc/doc/tm.texi.in
===================================================================
--- gcc/doc/tm.texi.in	2017-10-23 17:00:20.918834478 +0100
+++ gcc/doc/tm.texi.in	2017-10-23 17:01:45.053515150 +0100
@@ -2553,6 +2553,8 @@ terminate the stack backtrace.  New port
 
 @hook TARGET_DWARF_HANDLE_FRAME_UNSPEC
 
+@hook TARGET_DWARF_POLY_INDETERMINATE_VALUE
+
 @defmac INCOMING_FRAME_SP_OFFSET
 A C expression whose value is an integer giving the offset, in bytes,
 from the value of the stack pointer register to the top of the stack
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2017-10-23 17:01:04.214113496 +0100
+++ gcc/doc/tm.texi	2017-10-23 17:01:45.052516573 +0100
@@ -3133,6 +3133,19 @@ the CFI label attached to the insn, @var
 the insn and @var{index} is @code{UNSPEC_INDEX} or @code{UNSPECV_INDEX}.
 @end deftypefn
 
+@deftypefn {Target Hook} {unsigned int} TARGET_DWARF_POLY_INDETERMINATE_VALUE (unsigned int @var{i}, unsigned int *@var{factor}, int *@var{offset})
+Express the value of @code{poly_int} indeterminate @var{i} as a DWARF
+expression, with @var{i} counting from 1.  Return the number of a DWARF
+register @var{R} and set @samp{*@var{factor}} and @samp{*@var{offset}} such
+that the value of the indeterminate is:
+@smallexample
+value_of(@var{R}) / @var{factor} - @var{offset}
+@end smallexample
+
+A target only needs to define this hook if it sets
+@samp{NUM_POLY_INT_COEFFS} to a value greater than 1.
+@end deftypefn
+
 @defmac INCOMING_FRAME_SP_OFFSET
 A C expression whose value is an integer giving the offset, in bytes,
 from the value of the stack pointer register to the top of the stack
Index: gcc/dwarf2out.h
===================================================================
--- gcc/dwarf2out.h	2017-10-23 16:52:20.259541165 +0100
+++ gcc/dwarf2out.h	2017-10-23 17:01:45.056510879 +0100
@@ -267,9 +267,9 @@ struct GTY(()) dw_discr_list_node {
 
 /* Interface from dwarf2out.c to dwarf2cfi.c.  */
 extern struct dw_loc_descr_node *build_cfa_loc
-  (dw_cfa_location *, HOST_WIDE_INT);
+  (dw_cfa_location *, poly_int64);
 extern struct dw_loc_descr_node *build_cfa_aligned_loc
-  (dw_cfa_location *, HOST_WIDE_INT offset, HOST_WIDE_INT alignment);
+  (dw_cfa_location *, poly_int64, HOST_WIDE_INT);
 extern struct dw_loc_descr_node *mem_loc_descriptor
   (rtx, machine_mode mode, machine_mode mem_mode,
    enum var_init_status);
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 17:00:54.439005782 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:01:45.056510879 +0100
@@ -1307,7 +1307,7 @@ typedef struct GTY(()) dw_loc_list_struc
   bool force;
 } dw_loc_list_node;
 
-static dw_loc_descr_ref int_loc_descriptor (HOST_WIDE_INT);
+static dw_loc_descr_ref int_loc_descriptor (poly_int64);
 static dw_loc_descr_ref uint_loc_descriptor (unsigned HOST_WIDE_INT);
 
 /* Convert a DWARF stack opcode into its string name.  */
@@ -1344,19 +1344,6 @@ new_loc_descr (enum dwarf_location_atom
   return descr;
 }
 
-/* Return a pointer to a newly allocated location description for
-   REG and OFFSET.  */
-
-static inline dw_loc_descr_ref
-new_reg_loc_descr (unsigned int reg,  unsigned HOST_WIDE_INT offset)
-{
-  if (reg <= 31)
-    return new_loc_descr ((enum dwarf_location_atom) (DW_OP_breg0 + reg),
-			  offset, 0);
-  else
-    return new_loc_descr (DW_OP_bregx, reg, offset);
-}
-
 /* Add a location description term to a location description expression.  */
 
 static inline void
@@ -1489,23 +1476,31 @@ loc_descr_equal_p (dw_loc_descr_ref a, d
 }
 
 
-/* Add a constant OFFSET to a location expression.  */
+/* Add a constant POLY_OFFSET to a location expression.  */
 
 static void
-loc_descr_plus_const (dw_loc_descr_ref *list_head, HOST_WIDE_INT offset)
+loc_descr_plus_const (dw_loc_descr_ref *list_head, poly_int64 poly_offset)
 {
   dw_loc_descr_ref loc;
   HOST_WIDE_INT *p;
 
   gcc_assert (*list_head != NULL);
 
-  if (!offset)
+  if (known_zero (poly_offset))
     return;
 
   /* Find the end of the chain.  */
   for (loc = *list_head; loc->dw_loc_next != NULL; loc = loc->dw_loc_next)
     ;
 
+  HOST_WIDE_INT offset;
+  if (!poly_offset.is_constant (&offset))
+    {
+      loc->dw_loc_next = int_loc_descriptor (poly_offset);
+      add_loc_descr (&loc->dw_loc_next, new_loc_descr (DW_OP_plus, 0, 0));
+      return;
+    }
+
   p = NULL;
   if (loc->dw_loc_opc == DW_OP_fbreg
       || (loc->dw_loc_opc >= DW_OP_breg0 && loc->dw_loc_opc <= DW_OP_breg31))
@@ -1531,10 +1526,33 @@ loc_descr_plus_const (dw_loc_descr_ref *
     }
 }
 
+/* Return a pointer to a newly allocated location description for
+   REG and OFFSET.  */
+
+static inline dw_loc_descr_ref
+new_reg_loc_descr (unsigned int reg, poly_int64 offset)
+{
+  HOST_WIDE_INT const_offset;
+  if (offset.is_constant (&const_offset))
+    {
+      if (reg <= 31)
+	return new_loc_descr ((enum dwarf_location_atom) (DW_OP_breg0 + reg),
+			      const_offset, 0);
+      else
+	return new_loc_descr (DW_OP_bregx, reg, const_offset);
+    }
+  else
+    {
+      dw_loc_descr_ref ret = new_reg_loc_descr (reg, 0);
+      loc_descr_plus_const (&ret, offset);
+      return ret;
+    }
+}
+
 /* Add a constant OFFSET to a location list.  */
 
 static void
-loc_list_plus_const (dw_loc_list_ref list_head, HOST_WIDE_INT offset)
+loc_list_plus_const (dw_loc_list_ref list_head, poly_int64 offset)
 {
   dw_loc_list_ref d;
   for (d = list_head; d != NULL; d = d->dw_loc_next)
@@ -2614,7 +2632,7 @@ output_loc_sequence_raw (dw_loc_descr_re
    expression.  */
 
 struct dw_loc_descr_node *
-build_cfa_loc (dw_cfa_location *cfa, HOST_WIDE_INT offset)
+build_cfa_loc (dw_cfa_location *cfa, poly_int64 offset)
 {
   struct dw_loc_descr_node *head, *tmp;
 
@@ -2627,11 +2645,7 @@ build_cfa_loc (dw_cfa_location *cfa, HOS
       head->dw_loc_oprnd1.val_entry = NULL;
       tmp = new_loc_descr (DW_OP_deref, 0, 0);
       add_loc_descr (&head, tmp);
-      if (offset != 0)
-	{
-	  tmp = new_loc_descr (DW_OP_plus_uconst, offset, 0);
-	  add_loc_descr (&head, tmp);
-	}
+      loc_descr_plus_const (&head, offset);
     }
   else
     head = new_reg_loc_descr (cfa->reg, offset);
@@ -2645,7 +2659,7 @@ build_cfa_loc (dw_cfa_location *cfa, HOS
 
 struct dw_loc_descr_node *
 build_cfa_aligned_loc (dw_cfa_location *cfa,
-		       HOST_WIDE_INT offset, HOST_WIDE_INT alignment)
+		       poly_int64 offset, HOST_WIDE_INT alignment)
 {
   struct dw_loc_descr_node *head;
   unsigned int dwarf_fp
@@ -3331,7 +3345,7 @@ static GTY(()) vec<tree, va_gc> *generic
 
 /* Offset from the "steady-state frame pointer" to the frame base,
    within the current function.  */
-static HOST_WIDE_INT frame_pointer_fb_offset;
+static poly_int64 frame_pointer_fb_offset;
 static bool frame_pointer_fb_offset_valid;
 
 static vec<dw_die_ref> base_types;
@@ -3505,7 +3519,7 @@ static dw_loc_descr_ref one_reg_loc_desc
 						enum var_init_status);
 static dw_loc_descr_ref multiple_reg_loc_descriptor (rtx, rtx,
 						     enum var_init_status);
-static dw_loc_descr_ref based_loc_descr (rtx, HOST_WIDE_INT,
+static dw_loc_descr_ref based_loc_descr (rtx, poly_int64,
 					 enum var_init_status);
 static int is_based_loc (const_rtx);
 static bool resolve_one_addr (rtx *);
@@ -13202,13 +13216,58 @@ int_shift_loc_descriptor (HOST_WIDE_INT
   return ret;
 }
 
-/* Return a location descriptor that designates a constant.  */
+/* Return a location descriptor that designates constant POLY_I.  */
 
 static dw_loc_descr_ref
-int_loc_descriptor (HOST_WIDE_INT i)
+int_loc_descriptor (poly_int64 poly_i)
 {
   enum dwarf_location_atom op;
 
+  HOST_WIDE_INT i;
+  if (!poly_i.is_constant (&i))
+    {
+      /* Create location descriptions for the non-constant part and
+	 add any constant offset at the end.  */
+      dw_loc_descr_ref ret = NULL;
+      HOST_WIDE_INT constant = poly_i.coeffs[0];
+      for (unsigned int j = 1; j < NUM_POLY_INT_COEFFS; ++j)
+	{
+	  HOST_WIDE_INT coeff = poly_i.coeffs[j];
+	  if (coeff != 0)
+	    {
+	      dw_loc_descr_ref start = ret;
+	      unsigned int factor;
+	      int bias;
+	      unsigned int regno = targetm.dwarf_poly_indeterminate_value
+		(j, &factor, &bias);
+
+	      /* Add COEFF * ((REGNO / FACTOR) - BIAS) to the value:
+		 add COEFF * (REGNO / FACTOR) now and subtract
+		 COEFF * BIAS from the final constant part.  */
+	      constant -= coeff * bias;
+	      add_loc_descr (&ret, new_reg_loc_descr (regno, 0));
+	      if (coeff % factor == 0)
+		coeff /= factor;
+	      else
+		{
+		  int amount = exact_log2 (factor);
+		  gcc_assert (amount >= 0);
+		  add_loc_descr (&ret, int_loc_descriptor (amount));
+		  add_loc_descr (&ret, new_loc_descr (DW_OP_shr, 0, 0));
+		}
+	      if (coeff != 1)
+		{
+		  add_loc_descr (&ret, int_loc_descriptor (coeff));
+		  add_loc_descr (&ret, new_loc_descr (DW_OP_mul, 0, 0));
+		}
+	      if (start)
+		add_loc_descr (&ret, new_loc_descr (DW_OP_plus, 0, 0));
+	    }
+	}
+      loc_descr_plus_const (&ret, constant);
+      return ret;
+    }
+
   /* Pick the smallest representation of a constant, rather than just
      defaulting to the LEB encoding.  */
   if (i >= 0)
@@ -13574,7 +13633,7 @@ address_of_int_loc_descriptor (int size,
 /* Return a location descriptor that designates a base+offset location.  */
 
 static dw_loc_descr_ref
-based_loc_descr (rtx reg, HOST_WIDE_INT offset,
+based_loc_descr (rtx reg, poly_int64 offset,
 		 enum var_init_status initialized)
 {
   unsigned int regno;
@@ -13593,11 +13652,7 @@ based_loc_descr (rtx reg, HOST_WIDE_INT
 
       if (elim != reg)
 	{
-	  if (GET_CODE (elim) == PLUS)
-	    {
-	      offset += INTVAL (XEXP (elim, 1));
-	      elim = XEXP (elim, 0);
-	    }
+	  elim = strip_offset_and_add (elim, &offset);
 	  gcc_assert ((SUPPORTS_STACK_ALIGNMENT
 		       && (elim == hard_frame_pointer_rtx
 			   || elim == stack_pointer_rtx))
@@ -13621,7 +13676,15 @@ based_loc_descr (rtx reg, HOST_WIDE_INT
 
 	  gcc_assert (frame_pointer_fb_offset_valid);
 	  offset += frame_pointer_fb_offset;
-	  return new_loc_descr (DW_OP_fbreg, offset, 0);
+	  HOST_WIDE_INT const_offset;
+	  if (offset.is_constant (&const_offset))
+	    return new_loc_descr (DW_OP_fbreg, const_offset, 0);
+	  else
+	    {
+	      dw_loc_descr_ref ret = new_loc_descr (DW_OP_fbreg, 0, 0);
+	      loc_descr_plus_const (&ret, offset);
+	      return ret;
+	    }
 	}
     }
 
@@ -13636,8 +13699,10 @@ based_loc_descr (rtx reg, HOST_WIDE_INT
 #endif
   regno = DWARF_FRAME_REGNUM (regno);
 
+  HOST_WIDE_INT const_offset;
   if (!optimize && fde
-      && (fde->drap_reg == regno || fde->vdrap_reg == regno))
+      && (fde->drap_reg == regno || fde->vdrap_reg == regno)
+      && offset.is_constant (&const_offset))
     {
       /* Use cfa+offset to represent the location of arguments passed
 	 on the stack when drap is used to align stack.
@@ -13645,14 +13710,10 @@ based_loc_descr (rtx reg, HOST_WIDE_INT
 	 is supposed to track where the arguments live and the register
 	 used as vdrap or drap in some spot might be used for something
 	 else in other part of the routine.  */
-      return new_loc_descr (DW_OP_fbreg, offset, 0);
+      return new_loc_descr (DW_OP_fbreg, const_offset, 0);
     }
 
-  if (regno <= 31)
-    result = new_loc_descr ((enum dwarf_location_atom) (DW_OP_breg0 + regno),
-			    offset, 0);
-  else
-    result = new_loc_descr (DW_OP_bregx, regno, offset);
+  result = new_reg_loc_descr (regno, offset);
 
   if (initialized == VAR_INIT_STATUS_UNINITIALIZED)
     add_loc_descr (&result, new_loc_descr (DW_OP_GNU_uninit, 0, 0));
@@ -14648,6 +14709,7 @@ mem_loc_descriptor (rtx rtl, machine_mod
   enum dwarf_location_atom op;
   dw_loc_descr_ref op0, op1;
   rtx inner = NULL_RTX;
+  poly_int64 offset;
 
   if (mode == VOIDmode)
     mode = GET_MODE (rtl);
@@ -15328,6 +15390,10 @@ mem_loc_descriptor (rtx rtl, machine_mod
 	}
       break;
 
+    case CONST_POLY_INT:
+      mem_loc_result = int_loc_descriptor (rtx_to_poly_int64 (rtl));
+      break;
+
     case EQ:
       mem_loc_result = scompare_loc_descriptor (DW_OP_eq, rtl, mem_mode);
       break;
@@ -19637,7 +19703,7 @@ convert_cfa_to_fb_loc_list (HOST_WIDE_IN
    before the latter is negated.  */
 
 static void
-compute_frame_pointer_to_fb_displacement (HOST_WIDE_INT offset)
+compute_frame_pointer_to_fb_displacement (poly_int64 offset)
 {
   rtx reg, elim;
 
@@ -19652,11 +19718,7 @@ compute_frame_pointer_to_fb_displacement
   elim = (ira_use_lra_p
 	  ? lra_eliminate_regs (reg, VOIDmode, NULL_RTX)
 	  : eliminate_regs (reg, VOIDmode, NULL_RTX));
-  if (GET_CODE (elim) == PLUS)
-    {
-      offset += INTVAL (XEXP (elim, 1));
-      elim = XEXP (elim, 0);
-    }
+  elim = strip_offset_and_add (elim, &offset);
 
   frame_pointer_fb_offset = -offset;
 


* [012/nnn] poly_int: fold_ctor_reference
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (11 preceding siblings ...)
  2017-10-23 17:05 ` [011/nnn] poly_int: DWARF locations Richard Sandiford
@ 2017-10-23 17:05 ` Richard Sandiford
  2017-11-17  3:59   ` Jeff Law
  2017-10-23 17:06 ` [015/nnn] poly_int: ao_ref and vn_reference_op_t Richard Sandiford
                   ` (94 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:05 UTC (permalink / raw)
  To: gcc-patches

This patch changes the offset and size arguments to
fold_ctor_reference from unsigned HOST_WIDE_INT to poly_uint64.
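
fold_ctor_reference itself now handles the whole-object case (zero offset,
matching type) without needing a constant size, and punts to NULL_TREE
before the remaining optimisations if the size or offset is not a
compile-time constant (see the diff below).  For callers that still need a
compile-time constant, the usual idiom is is_constant with an
out-parameter; a minimal, hypothetical sketch (this wrapper is not part of
the patch):

  static tree
  fold_ctor_reference_if_constant (tree type, tree ctor, poly_uint64 offset,
                                   poly_uint64 size, tree from_decl)
  {
    unsigned HOST_WIDE_INT const_offset, const_size;
    if (!offset.is_constant (&const_offset)
        || !size.is_constant (&const_size))
      return NULL_TREE;
    return fold_ctor_reference (type, ctor, const_offset, const_size,
                                from_decl);
  }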


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* gimple-fold.h (fold_ctor_reference): Take the offset and size
	as poly_uint64 rather than unsigned HOST_WIDE_INT.
	* gimple-fold.c (fold_ctor_reference): Likewise.

Index: gcc/gimple-fold.h
===================================================================
--- gcc/gimple-fold.h	2017-10-23 16:52:20.201487839 +0100
+++ gcc/gimple-fold.h	2017-10-23 17:01:48.165079780 +0100
@@ -44,8 +44,7 @@ extern tree follow_single_use_edges (tre
 extern tree gimple_fold_stmt_to_constant_1 (gimple *, tree (*) (tree),
 					    tree (*) (tree) = no_follow_ssa_edges);
 extern tree gimple_fold_stmt_to_constant (gimple *, tree (*) (tree));
-extern tree fold_ctor_reference (tree, tree, unsigned HOST_WIDE_INT,
-				 unsigned HOST_WIDE_INT, tree);
+extern tree fold_ctor_reference (tree, tree, poly_uint64, poly_uint64, tree);
 extern tree fold_const_aggregate_ref_1 (tree, tree (*) (tree));
 extern tree fold_const_aggregate_ref (tree);
 extern tree gimple_get_virt_method_for_binfo (HOST_WIDE_INT, tree,
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	2017-10-23 16:52:20.201487839 +0100
+++ gcc/gimple-fold.c	2017-10-23 17:01:48.164081204 +0100
@@ -6365,20 +6365,25 @@ fold_nonarray_ctor_reference (tree type,
   return build_zero_cst (type);
 }
 
-/* CTOR is value initializing memory, fold reference of type TYPE and size SIZE
-   to the memory at bit OFFSET.  */
+/* CTOR is value initializing memory, fold reference of type TYPE and
+   size POLY_SIZE to the memory at bit POLY_OFFSET.  */
 
 tree
-fold_ctor_reference (tree type, tree ctor, unsigned HOST_WIDE_INT offset,
-		     unsigned HOST_WIDE_INT size, tree from_decl)
+fold_ctor_reference (tree type, tree ctor, poly_uint64 poly_offset,
+		     poly_uint64 poly_size, tree from_decl)
 {
   tree ret;
 
   /* We found the field with exact match.  */
   if (useless_type_conversion_p (type, TREE_TYPE (ctor))
-      && !offset)
+      && known_zero (poly_offset))
     return canonicalize_constructor_val (unshare_expr (ctor), from_decl);
 
+  /* The remaining optimizations need a constant size and offset.  */
+  unsigned HOST_WIDE_INT size, offset;
+  if (!poly_size.is_constant (&size) || !poly_offset.is_constant (&offset))
+    return NULL_TREE;
+
   /* We are at the end of walk, see if we can view convert the
      result.  */
   if (!AGGREGATE_TYPE_P (TREE_TYPE (ctor)) && !offset


* [013/nnn] poly_int: same_addr_size_stores_p
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (9 preceding siblings ...)
  2017-10-23 17:04 ` [009/nnn] poly_int: TRULY_NOOP_TRUNCATION Richard Sandiford
@ 2017-10-23 17:05 ` Richard Sandiford
  2017-11-17  4:11   ` Jeff Law
  2017-10-23 17:05 ` [011/nnn] poly_int: DWARF locations Richard Sandiford
                   ` (96 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:05 UTC (permalink / raw)
  To: gcc-patches

This patch makes tree-ssa-alias.c:same_addr_size_stores_p handle
poly_int sizes and offsets.
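
To illustrate what the may_/must_ forms buy here, a small sketch (assuming
a target with one runtime indeterminate X, so that poly_int64 has two
coefficients; the function is purely illustrative and not part of the
patch):

  static void
  poly_size_comparison_example (void)
  {
    poly_int64 store_size = poly_int64 (16, 16);  /* 16 + 16X bytes */
    poly_int64 decl_size = 32;                    /* compile-time constant */
    /* The sizes can differ (any X other than 1), so may_ne is true...  */
    gcc_checking_assert (may_ne (store_size, decl_size));
    /* ...and they are not provably equal, so must_eq is false and
       same_addr_size_stores_p conservatively treats the stores as
       having different sizes.  */
    gcc_checking_assert (!must_eq (store_size, decl_size));
  }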


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-ssa-alias.c (same_addr_size_stores_p): Take the offsets and
	sizes as poly_int64s rather than HOST_WIDE_INTs.

Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	2017-10-23 16:52:20.150440950 +0100
+++ gcc/tree-ssa-alias.c	2017-10-23 17:01:49.579064221 +0100
@@ -2322,14 +2322,14 @@ stmt_may_clobber_ref_p (gimple *stmt, tr
    address.  */
 
 static bool
-same_addr_size_stores_p (tree base1, HOST_WIDE_INT offset1, HOST_WIDE_INT size1,
-			 HOST_WIDE_INT max_size1,
-			 tree base2, HOST_WIDE_INT offset2, HOST_WIDE_INT size2,
-			 HOST_WIDE_INT max_size2)
+same_addr_size_stores_p (tree base1, poly_int64 offset1, poly_int64 size1,
+			 poly_int64 max_size1,
+			 tree base2, poly_int64 offset2, poly_int64 size2,
+			 poly_int64 max_size2)
 {
   /* Offsets need to be 0.  */
-  if (offset1 != 0
-      || offset2 != 0)
+  if (maybe_nonzero (offset1)
+      || maybe_nonzero (offset2))
     return false;
 
   bool base1_obj_p = SSA_VAR_P (base1);
@@ -2348,17 +2348,19 @@ same_addr_size_stores_p (tree base1, HOS
   tree memref = base1_memref_p ? base1 : base2;
 
   /* Sizes need to be valid.  */
-  if (max_size1 == -1 || max_size2 == -1
-      || size1 == -1 || size2 == -1)
+  if (!known_size_p (max_size1)
+      || !known_size_p (max_size2)
+      || !known_size_p (size1)
+      || !known_size_p (size2))
     return false;
 
   /* Max_size needs to match size.  */
-  if (max_size1 != size1
-      || max_size2 != size2)
+  if (may_ne (max_size1, size1)
+      || may_ne (max_size2, size2))
     return false;
 
   /* Sizes need to match.  */
-  if (size1 != size2)
+  if (may_ne (size1, size2))
     return false;
 
 
@@ -2386,10 +2388,9 @@ same_addr_size_stores_p (tree base1, HOS
 
   /* Check that the object size is the same as the store size.  That ensures us
      that ptr points to the start of obj.  */
-  if (!tree_fits_shwi_p (DECL_SIZE (obj)))
-    return false;
-  HOST_WIDE_INT obj_size = tree_to_shwi (DECL_SIZE (obj));
-  return obj_size == size1;
+  return (DECL_SIZE (obj)
+	  && poly_int_tree_p (DECL_SIZE (obj))
+	  && must_eq (wi::to_poly_offset (DECL_SIZE (obj)), size1));
 }
 
 /* If STMT kills the memory reference REF return true, otherwise


* [015/nnn] poly_int: ao_ref and vn_reference_op_t
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (12 preceding siblings ...)
  2017-10-23 17:05 ` [012/nnn] poly_int: fold_ctor_reference Richard Sandiford
@ 2017-10-23 17:06 ` Richard Sandiford
  2017-11-18  4:25   ` Jeff Law
  2017-10-23 17:06 ` [014/nnn] poly_int: indirect_refs_may_alias_p Richard Sandiford
                   ` (93 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:06 UTC (permalink / raw)
  To: gcc-patches

This patch changes the offset, size and max_size fields
of ao_ref from HOST_WIDE_INT to poly_int64 and propagates
the change through the code that references it.  This includes
changing the off field of vn_reference_op_struct in the same way.
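
With the fields as poly_int64, the open-coded range checks in the affected
passes turn into calls to helpers such as known_subrange_p, plus the new
max_size_known_p accessor for the -1 "unconstrained" marker.  A minimal
sketch of the resulting idiom (the function below is hypothetical, not
part of the patch, and assumes both ao_refs already have their base and
extent computed):

  static bool
  store_covers_ref_p (const ao_ref *store, const ao_ref *ref)
  {
    return (ref->max_size_known_p ()
            && must_eq (store->size, store->max_size)
            && known_subrange_p (ref->offset, ref->max_size,
                                 store->offset, store->size));
  }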


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* inchash.h (inchash::hash::add_poly_int): New function.
	* tree-ssa-alias.h (ao_ref::offset, ao_ref::size, ao_ref::max_size):
	Use poly_int64 rather than HOST_WIDE_INT.
	(ao_ref::max_size_known_p): New function.
	* tree-ssa-sccvn.h (vn_reference_op_struct::off): Use poly_int64_pod
	rather than HOST_WIDE_INT.
	* tree-ssa-alias.c (ao_ref_base): Apply get_ref_base_and_extent
	to temporaries until its interface is adjusted to match.
	(ao_ref_init_from_ptr_and_size): Handle polynomial offsets and sizes.
	(aliasing_component_refs_p, decl_refs_may_alias_p)
	(indirect_ref_may_alias_decl_p, indirect_refs_may_alias_p): Take
	the offsets and max_sizes as poly_int64s instead of HOST_WIDE_INTs.
	(refs_may_alias_p_1, stmt_kills_ref_p): Adjust for changes to
	ao_ref fields.
	* alias.c (ao_ref_from_mem): Likewise.
	* tree-ssa-dce.c (mark_aliased_reaching_defs_necessary_1): Likewise.
	* tree-ssa-dse.c (valid_ao_ref_for_dse, normalize_ref)
	(clear_bytes_written_by, setup_live_bytes_from_ref, compute_trims)
	(maybe_trim_complex_store, maybe_trim_constructor_store)
	(live_bytes_read, dse_classify_store): Likewise.
	* tree-ssa-sccvn.c (vn_reference_compute_hash, vn_reference_eq)
	(copy_reference_ops_from_ref, ao_ref_init_from_vn_reference)
	(fully_constant_vn_reference_p, valueize_refs_1): Likewise.
	(vn_reference_lookup_3): Likewise.
	* tree-ssa-uninit.c (warn_uninitialized_vars): Likewise.

Index: gcc/inchash.h
===================================================================
--- gcc/inchash.h	2017-10-23 17:01:43.314993320 +0100
+++ gcc/inchash.h	2017-10-23 17:01:52.303181137 +0100
@@ -57,6 +57,14 @@ hashval_t iterative_hash_hashval_t (hash
     val = iterative_hash_hashval_t (v, val);
   }
 
+  /* Add polynomial value V, treating each element as an unsigned int.  */
+  template<unsigned int N, typename T>
+  void add_poly_int (const poly_int_pod<N, T> &v)
+  {
+    for (unsigned int i = 0; i < N; ++i)
+      add_int (v.coeffs[i]);
+  }
+
   /* Add HOST_WIDE_INT value V.  */
   void add_hwi (HOST_WIDE_INT v)
   {
Index: gcc/tree-ssa-alias.h
===================================================================
--- gcc/tree-ssa-alias.h	2017-10-23 16:52:20.058356365 +0100
+++ gcc/tree-ssa-alias.h	2017-10-23 17:01:52.304179714 +0100
@@ -80,11 +80,11 @@ struct ao_ref
      the following fields are not yet computed.  */
   tree base;
   /* The offset relative to the base.  */
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
   /* The size of the access.  */
-  HOST_WIDE_INT size;
+  poly_int64 size;
   /* The maximum possible extent of the access or -1 if unconstrained.  */
-  HOST_WIDE_INT max_size;
+  poly_int64 max_size;
 
   /* The alias set of the access or -1 if not yet computed.  */
   alias_set_type ref_alias_set;
@@ -94,8 +94,18 @@ struct ao_ref
 
   /* Whether the memory is considered a volatile access.  */
   bool volatile_p;
+
+  bool max_size_known_p () const;
 };
 
+/* Return true if the maximum size is known, rather than the special -1
+   marker.  */
+
+inline bool
+ao_ref::max_size_known_p () const
+{
+  return known_size_p (max_size);
+}
 
 /* In tree-ssa-alias.c  */
 extern void ao_ref_init (ao_ref *, tree);
Index: gcc/tree-ssa-sccvn.h
===================================================================
--- gcc/tree-ssa-sccvn.h	2017-10-23 16:52:20.058356365 +0100
+++ gcc/tree-ssa-sccvn.h	2017-10-23 17:01:52.305178291 +0100
@@ -93,7 +93,7 @@ typedef struct vn_reference_op_struct
   /* For storing TYPE_ALIGN for array ref element size computation.  */
   unsigned align : 6;
   /* Constant offset this op adds or -1 if it is variable.  */
-  HOST_WIDE_INT off;
+  poly_int64_pod off;
   tree type;
   tree op0;
   tree op1;
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	2017-10-23 17:01:51.044974644 +0100
+++ gcc/tree-ssa-alias.c	2017-10-23 17:01:52.304179714 +0100
@@ -635,11 +635,15 @@ ao_ref_init (ao_ref *r, tree ref)
 ao_ref_base (ao_ref *ref)
 {
   bool reverse;
+  HOST_WIDE_INT offset, size, max_size;
 
   if (ref->base)
     return ref->base;
-  ref->base = get_ref_base_and_extent (ref->ref, &ref->offset, &ref->size,
-				       &ref->max_size, &reverse);
+  ref->base = get_ref_base_and_extent (ref->ref, &offset, &size,
+				       &max_size, &reverse);
+  ref->offset = offset;
+  ref->size = size;
+  ref->max_size = max_size;
   return ref->base;
 }
 
@@ -679,7 +683,8 @@ ao_ref_alias_set (ao_ref *ref)
 void
 ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size)
 {
-  HOST_WIDE_INT t, size_hwi, extra_offset = 0;
+  HOST_WIDE_INT t;
+  poly_int64 size_hwi, extra_offset = 0;
   ref->ref = NULL_TREE;
   if (TREE_CODE (ptr) == SSA_NAME)
     {
@@ -689,11 +694,10 @@ ao_ref_init_from_ptr_and_size (ao_ref *r
 	ptr = gimple_assign_rhs1 (stmt);
       else if (is_gimple_assign (stmt)
 	       && gimple_assign_rhs_code (stmt) == POINTER_PLUS_EXPR
-	       && TREE_CODE (gimple_assign_rhs2 (stmt)) == INTEGER_CST)
+	       && ptrdiff_tree_p (gimple_assign_rhs2 (stmt), &extra_offset))
 	{
 	  ptr = gimple_assign_rhs1 (stmt);
-	  extra_offset = BITS_PER_UNIT
-			 * int_cst_value (gimple_assign_rhs2 (stmt));
+	  extra_offset *= BITS_PER_UNIT;
 	}
     }
 
@@ -717,8 +721,8 @@ ao_ref_init_from_ptr_and_size (ao_ref *r
     }
   ref->offset += extra_offset;
   if (size
-      && tree_fits_shwi_p (size)
-      && (size_hwi = tree_to_shwi (size)) <= HOST_WIDE_INT_MAX / BITS_PER_UNIT)
+      && poly_int_tree_p (size, &size_hwi)
+      && coeffs_in_range_p (size_hwi, 0, HOST_WIDE_INT_MAX / BITS_PER_UNIT))
     ref->max_size = ref->size = size_hwi * BITS_PER_UNIT;
   else
     ref->max_size = ref->size = -1;
@@ -779,11 +783,11 @@ same_type_for_tbaa (tree type1, tree typ
 aliasing_component_refs_p (tree ref1,
 			   alias_set_type ref1_alias_set,
 			   alias_set_type base1_alias_set,
-			   HOST_WIDE_INT offset1, HOST_WIDE_INT max_size1,
+			   poly_int64 offset1, poly_int64 max_size1,
 			   tree ref2,
 			   alias_set_type ref2_alias_set,
 			   alias_set_type base2_alias_set,
-			   HOST_WIDE_INT offset2, HOST_WIDE_INT max_size2,
+			   poly_int64 offset2, poly_int64 max_size2,
 			   bool ref2_is_decl)
 {
   /* If one reference is a component references through pointers try to find a
@@ -825,7 +829,7 @@ aliasing_component_refs_p (tree ref1,
       offset2 -= offadj;
       get_ref_base_and_extent (base1, &offadj, &sztmp, &msztmp, &reverse);
       offset1 -= offadj;
-      return ranges_overlap_p (offset1, max_size1, offset2, max_size2);
+      return ranges_may_overlap_p (offset1, max_size1, offset2, max_size2);
     }
   /* If we didn't find a common base, try the other way around.  */
   refp = &ref1;
@@ -844,7 +848,7 @@ aliasing_component_refs_p (tree ref1,
       offset1 -= offadj;
       get_ref_base_and_extent (base2, &offadj, &sztmp, &msztmp, &reverse);
       offset2 -= offadj;
-      return ranges_overlap_p (offset1, max_size1, offset2, max_size2);
+      return ranges_may_overlap_p (offset1, max_size1, offset2, max_size2);
     }
 
   /* If we have two type access paths B1.path1 and B2.path2 they may
@@ -1090,9 +1094,9 @@ nonoverlapping_component_refs_p (const_t
 
 static bool
 decl_refs_may_alias_p (tree ref1, tree base1,
-		       HOST_WIDE_INT offset1, HOST_WIDE_INT max_size1,
+		       poly_int64 offset1, poly_int64 max_size1,
 		       tree ref2, tree base2,
-		       HOST_WIDE_INT offset2, HOST_WIDE_INT max_size2)
+		       poly_int64 offset2, poly_int64 max_size2)
 {
   gcc_checking_assert (DECL_P (base1) && DECL_P (base2));
 
@@ -1102,7 +1106,7 @@ decl_refs_may_alias_p (tree ref1, tree b
 
   /* If both references are based on the same variable, they cannot alias if
      the accesses do not overlap.  */
-  if (!ranges_overlap_p (offset1, max_size1, offset2, max_size2))
+  if (!ranges_may_overlap_p (offset1, max_size1, offset2, max_size2))
     return false;
 
   /* For components with variable position, the above test isn't sufficient,
@@ -1124,12 +1128,11 @@ decl_refs_may_alias_p (tree ref1, tree b
 
 static bool
 indirect_ref_may_alias_decl_p (tree ref1 ATTRIBUTE_UNUSED, tree base1,
-			       HOST_WIDE_INT offset1,
-			       HOST_WIDE_INT max_size1 ATTRIBUTE_UNUSED,
+			       poly_int64 offset1, poly_int64 max_size1,
 			       alias_set_type ref1_alias_set,
 			       alias_set_type base1_alias_set,
 			       tree ref2 ATTRIBUTE_UNUSED, tree base2,
-			       HOST_WIDE_INT offset2, HOST_WIDE_INT max_size2,
+			       poly_int64 offset2, poly_int64 max_size2,
 			       alias_set_type ref2_alias_set,
 			       alias_set_type base2_alias_set, bool tbaa_p)
 {
@@ -1185,14 +1188,15 @@ indirect_ref_may_alias_decl_p (tree ref1
      is bigger than the size of the decl we can't possibly access the
      decl via that pointer.  */
   if (DECL_SIZE (base2) && COMPLETE_TYPE_P (TREE_TYPE (ptrtype1))
-      && TREE_CODE (DECL_SIZE (base2)) == INTEGER_CST
-      && TREE_CODE (TYPE_SIZE (TREE_TYPE (ptrtype1))) == INTEGER_CST
+      && poly_int_tree_p (DECL_SIZE (base2))
+      && poly_int_tree_p (TYPE_SIZE (TREE_TYPE (ptrtype1)))
       /* ???  This in turn may run afoul when a decl of type T which is
 	 a member of union type U is accessed through a pointer to
 	 type U and sizeof T is smaller than sizeof U.  */
       && TREE_CODE (TREE_TYPE (ptrtype1)) != UNION_TYPE
       && TREE_CODE (TREE_TYPE (ptrtype1)) != QUAL_UNION_TYPE
-      && tree_int_cst_lt (DECL_SIZE (base2), TYPE_SIZE (TREE_TYPE (ptrtype1))))
+      && must_lt (wi::to_poly_widest (DECL_SIZE (base2)),
+		  wi::to_poly_widest (TYPE_SIZE (TREE_TYPE (ptrtype1)))))
     return false;
 
   if (!ref2)
@@ -1203,8 +1207,8 @@ indirect_ref_may_alias_decl_p (tree ref1
   dbase2 = ref2;
   while (handled_component_p (dbase2))
     dbase2 = TREE_OPERAND (dbase2, 0);
-  HOST_WIDE_INT doffset1 = offset1;
-  offset_int doffset2 = offset2;
+  poly_int64 doffset1 = offset1;
+  poly_offset_int doffset2 = offset2;
   if (TREE_CODE (dbase2) == MEM_REF
       || TREE_CODE (dbase2) == TARGET_MEM_REF)
     doffset2 -= mem_ref_offset (dbase2) << LOG2_BITS_PER_UNIT;
@@ -1252,11 +1256,11 @@ indirect_ref_may_alias_decl_p (tree ref1
 
 static bool
 indirect_refs_may_alias_p (tree ref1 ATTRIBUTE_UNUSED, tree base1,
-			   HOST_WIDE_INT offset1, HOST_WIDE_INT max_size1,
+			   poly_int64 offset1, poly_int64 max_size1,
 			   alias_set_type ref1_alias_set,
 			   alias_set_type base1_alias_set,
 			   tree ref2 ATTRIBUTE_UNUSED, tree base2,
-			   HOST_WIDE_INT offset2, HOST_WIDE_INT max_size2,
+			   poly_int64 offset2, poly_int64 max_size2,
 			   alias_set_type ref2_alias_set,
 			   alias_set_type base2_alias_set, bool tbaa_p)
 {
@@ -1330,7 +1334,7 @@ indirect_refs_may_alias_p (tree ref1 ATT
       /* But avoid treating arrays as "objects", instead assume they
          can overlap by an exact multiple of their element size.  */
       && TREE_CODE (TREE_TYPE (ptrtype1)) != ARRAY_TYPE)
-    return ranges_overlap_p (offset1, max_size1, offset2, max_size2);
+    return ranges_may_overlap_p (offset1, max_size1, offset2, max_size2);
 
   /* Do type-based disambiguation.  */
   if (base1_alias_set != base2_alias_set
@@ -1365,8 +1369,8 @@ indirect_refs_may_alias_p (tree ref1 ATT
 refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
 {
   tree base1, base2;
-  HOST_WIDE_INT offset1 = 0, offset2 = 0;
-  HOST_WIDE_INT max_size1 = -1, max_size2 = -1;
+  poly_int64 offset1 = 0, offset2 = 0;
+  poly_int64 max_size1 = -1, max_size2 = -1;
   bool var1_p, var2_p, ind1_p, ind2_p;
 
   gcc_checking_assert ((!ref1->ref
@@ -2444,14 +2448,17 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *
          handling constant offset and size.  */
       /* For a must-alias check we need to be able to constrain
 	 the access properly.  */
-      if (ref->max_size == -1)
+      if (!ref->max_size_known_p ())
 	return false;
-      HOST_WIDE_INT size, offset, max_size, ref_offset = ref->offset;
+      HOST_WIDE_INT size, max_size, const_offset;
+      poly_int64 ref_offset = ref->offset;
       bool reverse;
       tree base
-	= get_ref_base_and_extent (lhs, &offset, &size, &max_size, &reverse);
+	= get_ref_base_and_extent (lhs, &const_offset, &size, &max_size,
+				   &reverse);
       /* We can get MEM[symbol: sZ, index: D.8862_1] here,
 	 so base == ref->base does not always hold.  */
+      poly_int64 offset = const_offset;
       if (base != ref->base)
 	{
 	  /* Try using points-to info.  */
@@ -2468,18 +2475,13 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *
 	      if (!tree_int_cst_equal (TREE_OPERAND (base, 1),
 				       TREE_OPERAND (ref->base, 1)))
 		{
-		  offset_int off1 = mem_ref_offset (base);
+		  poly_offset_int off1 = mem_ref_offset (base);
 		  off1 <<= LOG2_BITS_PER_UNIT;
 		  off1 += offset;
-		  offset_int off2 = mem_ref_offset (ref->base);
+		  poly_offset_int off2 = mem_ref_offset (ref->base);
 		  off2 <<= LOG2_BITS_PER_UNIT;
 		  off2 += ref_offset;
-		  if (wi::fits_shwi_p (off1) && wi::fits_shwi_p (off2))
-		    {
-		      offset = off1.to_shwi ();
-		      ref_offset = off2.to_shwi ();
-		    }
-		  else
+		  if (!off1.to_shwi (&offset) || !off2.to_shwi (&ref_offset))
 		    size = -1;
 		}
 	    }
@@ -2488,12 +2490,9 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *
 	}
       /* For a must-alias check we need to be able to constrain
 	 the access properly.  */
-      if (size != -1 && size == max_size)
-	{
-	  if (offset <= ref_offset
-	      && offset + size >= ref_offset + ref->max_size)
-	    return true;
-	}
+      if (size == max_size
+	  && known_subrange_p (ref_offset, ref->max_size, offset, size))
+	return true;
     }
 
   if (is_gimple_call (stmt))
@@ -2526,19 +2525,19 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *
 	    {
 	      /* For a must-alias check we need to be able to constrain
 		 the access properly.  */
-	      if (ref->max_size == -1)
+	      if (!ref->max_size_known_p ())
 		return false;
 	      tree dest = gimple_call_arg (stmt, 0);
 	      tree len = gimple_call_arg (stmt, 2);
-	      if (!tree_fits_shwi_p (len))
+	      if (!poly_int_tree_p (len))
 		return false;
 	      tree rbase = ref->base;
-	      offset_int roffset = ref->offset;
+	      poly_offset_int roffset = ref->offset;
 	      ao_ref dref;
 	      ao_ref_init_from_ptr_and_size (&dref, dest, len);
 	      tree base = ao_ref_base (&dref);
-	      offset_int offset = dref.offset;
-	      if (!base || dref.size == -1)
+	      poly_offset_int offset = dref.offset;
+	      if (!base || !known_size_p (dref.size))
 		return false;
 	      if (TREE_CODE (base) == MEM_REF)
 		{
@@ -2551,9 +2550,9 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *
 		  rbase = TREE_OPERAND (rbase, 0);
 		}
 	      if (base == rbase
-		  && offset <= roffset
-		  && (roffset + ref->max_size
-		      <= offset + (wi::to_offset (len) << LOG2_BITS_PER_UNIT)))
+		  && known_subrange_p (roffset, ref->max_size, offset,
+				       wi::to_poly_offset (len)
+				       << LOG2_BITS_PER_UNIT))
 		return true;
 	      break;
 	    }
Index: gcc/alias.c
===================================================================
--- gcc/alias.c	2017-10-23 16:52:20.058356365 +0100
+++ gcc/alias.c	2017-10-23 17:01:52.303181137 +0100
@@ -331,9 +331,9 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
   /* If MEM_OFFSET/MEM_SIZE get us outside of ref->offset/ref->max_size
      drop ref->ref.  */
   if (MEM_OFFSET (mem) < 0
-      || (ref->max_size != -1
-	  && ((MEM_OFFSET (mem) + MEM_SIZE (mem)) * BITS_PER_UNIT
-	      > ref->max_size)))
+      || (ref->max_size_known_p ()
+	  && may_gt ((MEM_OFFSET (mem) + MEM_SIZE (mem)) * BITS_PER_UNIT,
+		     ref->max_size)))
     ref->ref = NULL_TREE;
 
   /* Refine size and offset we got from analyzing MEM_EXPR by using
@@ -344,19 +344,18 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
 
   /* The MEM may extend into adjacent fields, so adjust max_size if
      necessary.  */
-  if (ref->max_size != -1
-      && ref->size > ref->max_size)
-    ref->max_size = ref->size;
+  if (ref->max_size_known_p ())
+    ref->max_size = upper_bound (ref->max_size, ref->size);
 
-  /* If MEM_OFFSET and MEM_SIZE get us outside of the base object of
+  /* If MEM_OFFSET and MEM_SIZE might get us outside of the base object of
      the MEM_EXPR punt.  This happens for STRICT_ALIGNMENT targets a lot.  */
   if (MEM_EXPR (mem) != get_spill_slot_decl (false)
-      && (ref->offset < 0
+      && (may_lt (ref->offset, 0)
 	  || (DECL_P (ref->base)
 	      && (DECL_SIZE (ref->base) == NULL_TREE
-		  || TREE_CODE (DECL_SIZE (ref->base)) != INTEGER_CST
-		  || wi::ltu_p (wi::to_offset (DECL_SIZE (ref->base)),
-				ref->offset + ref->size)))))
+		  || !poly_int_tree_p (DECL_SIZE (ref->base))
+		  || may_lt (wi::to_poly_offset (DECL_SIZE (ref->base)),
+			     ref->offset + ref->size)))))
     return false;
 
   return true;
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c	2017-10-23 16:52:20.058356365 +0100
+++ gcc/tree-ssa-dce.c	2017-10-23 17:01:52.304179714 +0100
@@ -488,13 +488,9 @@ mark_aliased_reaching_defs_necessary_1 (
 	{
 	  /* For a must-alias check we need to be able to constrain
 	     the accesses properly.  */
-	  if (size != -1 && size == max_size
-	      && ref->max_size != -1)
-	    {
-	      if (offset <= ref->offset
-		  && offset + size >= ref->offset + ref->max_size)
-		return true;
-	    }
+	  if (size == max_size
+	      && known_subrange_p (ref->offset, ref->max_size, offset, size))
+	    return true;
 	  /* Or they need to be exactly the same.  */
 	  else if (ref->ref
 		   /* Make sure there is no induction variable involved
Index: gcc/tree-ssa-dse.c
===================================================================
--- gcc/tree-ssa-dse.c	2017-10-23 16:52:20.058356365 +0100
+++ gcc/tree-ssa-dse.c	2017-10-23 17:01:52.304179714 +0100
@@ -128,13 +128,12 @@ initialize_ao_ref_for_dse (gimple *stmt,
 valid_ao_ref_for_dse (ao_ref *ref)
 {
   return (ao_ref_base (ref)
-	  && ref->max_size != -1
-	  && ref->size != 0
-	  && ref->max_size == ref->size
-	  && ref->offset >= 0
-	  && (ref->offset % BITS_PER_UNIT) == 0
-	  && (ref->size % BITS_PER_UNIT) == 0
-	  && (ref->size != -1));
+	  && known_size_p (ref->max_size)
+	  && maybe_nonzero (ref->size)
+	  && must_eq (ref->max_size, ref->size)
+	  && must_ge (ref->offset, 0)
+	  && multiple_p (ref->offset, BITS_PER_UNIT)
+	  && multiple_p (ref->size, BITS_PER_UNIT));
 }
 
 /* Try to normalize COPY (an ao_ref) relative to REF.  Essentially when we are
@@ -144,25 +143,31 @@ valid_ao_ref_for_dse (ao_ref *ref)
 static bool
 normalize_ref (ao_ref *copy, ao_ref *ref)
 {
+  if (!ordered_p (copy->offset, ref->offset))
+    return false;
+
   /* If COPY starts before REF, then reset the beginning of
      COPY to match REF and decrease the size of COPY by the
      number of bytes removed from COPY.  */
-  if (copy->offset < ref->offset)
+  if (may_lt (copy->offset, ref->offset))
     {
-      HOST_WIDE_INT diff = ref->offset - copy->offset;
-      if (copy->size <= diff)
+      poly_int64 diff = ref->offset - copy->offset;
+      if (may_le (copy->size, diff))
 	return false;
       copy->size -= diff;
       copy->offset = ref->offset;
     }
 
-  HOST_WIDE_INT diff = copy->offset - ref->offset;
-  if (ref->size <= diff)
+  poly_int64 diff = copy->offset - ref->offset;
+  if (may_le (ref->size, diff))
     return false;
 
   /* If COPY extends beyond REF, chop off its size appropriately.  */
-  HOST_WIDE_INT limit = ref->size - diff;
-  if (copy->size > limit)
+  poly_int64 limit = ref->size - diff;
+  if (!ordered_p (limit, copy->size))
+    return false;
+
+  if (may_gt (copy->size, limit))
     copy->size = limit;
   return true;
 }
@@ -183,15 +188,15 @@ clear_bytes_written_by (sbitmap live_byt
 
   /* Verify we have the same base memory address, the write
      has a known size and overlaps with REF.  */
+  HOST_WIDE_INT start, size;
   if (valid_ao_ref_for_dse (&write)
       && operand_equal_p (write.base, ref->base, OEP_ADDRESS_OF)
-      && write.size == write.max_size
-      && normalize_ref (&write, ref))
-    {
-      HOST_WIDE_INT start = write.offset - ref->offset;
-      bitmap_clear_range (live_bytes, start / BITS_PER_UNIT,
-			  write.size / BITS_PER_UNIT);
-    }
+      && must_eq (write.size, write.max_size)
+      && normalize_ref (&write, ref)
+      && (write.offset - ref->offset).is_constant (&start)
+      && write.size.is_constant (&size))
+    bitmap_clear_range (live_bytes, start / BITS_PER_UNIT,
+			size / BITS_PER_UNIT);
 }
 
 /* REF is a memory write.  Extract relevant information from it and
@@ -201,12 +206,14 @@ clear_bytes_written_by (sbitmap live_byt
 static bool
 setup_live_bytes_from_ref (ao_ref *ref, sbitmap live_bytes)
 {
+  HOST_WIDE_INT const_size;
   if (valid_ao_ref_for_dse (ref)
-      && (ref->size / BITS_PER_UNIT
+      && ref->size.is_constant (&const_size)
+      && (const_size / BITS_PER_UNIT
 	  <= PARAM_VALUE (PARAM_DSE_MAX_OBJECT_SIZE)))
     {
       bitmap_clear (live_bytes);
-      bitmap_set_range (live_bytes, 0, ref->size / BITS_PER_UNIT);
+      bitmap_set_range (live_bytes, 0, const_size / BITS_PER_UNIT);
       return true;
     }
   return false;
@@ -231,9 +238,15 @@ compute_trims (ao_ref *ref, sbitmap live
      the REF to compute the trims.  */
 
   /* Now identify how much, if any of the tail we can chop off.  */
-  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
-  int last_live = bitmap_last_set_bit (live);
-  *trim_tail = (last_orig - last_live) & ~0x1;
+  HOST_WIDE_INT const_size;
+  if (ref->size.is_constant (&const_size))
+    {
+      int last_orig = (const_size / BITS_PER_UNIT) - 1;
+      int last_live = bitmap_last_set_bit (live);
+      *trim_tail = (last_orig - last_live) & ~0x1;
+    }
+  else
+    *trim_tail = 0;
 
   /* Identify how much, if any of the head we can chop off.  */
   int first_orig = 0;
@@ -267,7 +280,7 @@ maybe_trim_complex_store (ao_ref *ref, s
      least half the size of the object to ensure we're trimming
      the entire real or imaginary half.  By writing things this
      way we avoid more O(n) bitmap operations.  */
-  if (trim_tail * 2 >= ref->size / BITS_PER_UNIT)
+  if (must_ge (trim_tail * 2 * BITS_PER_UNIT, ref->size))
     {
       /* TREE_REALPART is live */
       tree x = TREE_REALPART (gimple_assign_rhs1 (stmt));
@@ -276,7 +289,7 @@ maybe_trim_complex_store (ao_ref *ref, s
       gimple_assign_set_lhs (stmt, y);
       gimple_assign_set_rhs1 (stmt, x);
     }
-  else if (trim_head * 2 >= ref->size / BITS_PER_UNIT)
+  else if (must_ge (trim_head * 2 * BITS_PER_UNIT, ref->size))
     {
       /* TREE_IMAGPART is live */
       tree x = TREE_IMAGPART (gimple_assign_rhs1 (stmt));
@@ -326,7 +339,8 @@ maybe_trim_constructor_store (ao_ref *re
 	return;
 
       /* The number of bytes for the new constructor.  */
-      int count = (ref->size / BITS_PER_UNIT) - head_trim - tail_trim;
+      poly_int64 ref_bytes = exact_div (ref->size, BITS_PER_UNIT);
+      poly_int64 count = ref_bytes - head_trim - tail_trim;
 
       /* And the new type for the CONSTRUCTOR.  Essentially it's just
 	 a char array large enough to cover the non-trimmed parts of
@@ -483,15 +497,15 @@ live_bytes_read (ao_ref use_ref, ao_ref
 {
   /* We have already verified that USE_REF and REF hit the same object.
      Now verify that there's actually an overlap between USE_REF and REF.  */
-  if (normalize_ref (&use_ref, ref))
+  HOST_WIDE_INT start, size;
+  if (normalize_ref (&use_ref, ref)
+      && (use_ref.offset - ref->offset).is_constant (&start)
+      && use_ref.size.is_constant (&size))
     {
-      HOST_WIDE_INT start = use_ref.offset - ref->offset;
-      HOST_WIDE_INT size = use_ref.size;
-
       /* If USE_REF covers all of REF, then it will hit one or more
 	 live bytes.   This avoids useless iteration over the bitmap
 	 below.  */
-      if (start == 0 && size == ref->size)
+      if (start == 0 && must_eq (size, ref->size))
 	return true;
 
       /* Now check if any of the remaining bits in use_ref are set in LIVE.  */
@@ -592,8 +606,8 @@ dse_classify_store (ao_ref *ref, gimple
 		      ao_ref use_ref;
 		      ao_ref_init (&use_ref, gimple_assign_rhs1 (use_stmt));
 		      if (valid_ao_ref_for_dse (&use_ref)
-			  && use_ref.base == ref->base
-			  && use_ref.size == use_ref.max_size
+			  && must_eq (use_ref.base, ref->base)
+			  && must_eq (use_ref.size, use_ref.max_size)
 			  && !live_bytes_read (use_ref, ref, live_bytes))
 			{
 			  /* If this statement has a VDEF, then it is the
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-10-23 16:52:20.058356365 +0100
+++ gcc/tree-ssa-sccvn.c	2017-10-23 17:01:52.305178291 +0100
@@ -547,7 +547,7 @@ vn_reference_compute_hash (const vn_refe
   hashval_t result;
   int i;
   vn_reference_op_t vro;
-  HOST_WIDE_INT off = -1;
+  poly_int64 off = -1;
   bool deref = false;
 
   FOR_EACH_VEC_ELT (vr1->operands, i, vro)
@@ -556,17 +556,17 @@ vn_reference_compute_hash (const vn_refe
 	deref = true;
       else if (vro->opcode != ADDR_EXPR)
 	deref = false;
-      if (vro->off != -1)
+      if (may_ne (vro->off, -1))
 	{
-	  if (off == -1)
+	  if (must_eq (off, -1))
 	    off = 0;
 	  off += vro->off;
 	}
       else
 	{
-	  if (off != -1
-	      && off != 0)
-	    hstate.add_int (off);
+	  if (may_ne (off, -1)
+	      && may_ne (off, 0))
+	    hstate.add_poly_int (off);
 	  off = -1;
 	  if (deref
 	      && vro->opcode == ADDR_EXPR)
@@ -632,7 +632,7 @@ vn_reference_eq (const_vn_reference_t co
   j = 0;
   do
     {
-      HOST_WIDE_INT off1 = 0, off2 = 0;
+      poly_int64 off1 = 0, off2 = 0;
       vn_reference_op_t vro1, vro2;
       vn_reference_op_s tem1, tem2;
       bool deref1 = false, deref2 = false;
@@ -643,7 +643,7 @@ vn_reference_eq (const_vn_reference_t co
 	  /* Do not look through a storage order barrier.  */
 	  else if (vro1->opcode == VIEW_CONVERT_EXPR && vro1->reverse)
 	    return false;
-	  if (vro1->off == -1)
+	  if (must_eq (vro1->off, -1))
 	    break;
 	  off1 += vro1->off;
 	}
@@ -654,11 +654,11 @@ vn_reference_eq (const_vn_reference_t co
 	  /* Do not look through a storage order barrier.  */
 	  else if (vro2->opcode == VIEW_CONVERT_EXPR && vro2->reverse)
 	    return false;
-	  if (vro2->off == -1)
+	  if (must_eq (vro2->off, -1))
 	    break;
 	  off2 += vro2->off;
 	}
-      if (off1 != off2)
+      if (may_ne (off1, off2))
 	return false;
       if (deref1 && vro1->opcode == ADDR_EXPR)
 	{
@@ -784,24 +784,23 @@ copy_reference_ops_from_ref (tree ref, v
 	  {
 	    tree this_offset = component_ref_field_offset (ref);
 	    if (this_offset
-		&& TREE_CODE (this_offset) == INTEGER_CST)
+		&& poly_int_tree_p (this_offset))
 	      {
 		tree bit_offset = DECL_FIELD_BIT_OFFSET (TREE_OPERAND (ref, 1));
 		if (TREE_INT_CST_LOW (bit_offset) % BITS_PER_UNIT == 0)
 		  {
-		    offset_int off
-		      = (wi::to_offset (this_offset)
+		    poly_offset_int off
+		      = (wi::to_poly_offset (this_offset)
 			 + (wi::to_offset (bit_offset) >> LOG2_BITS_PER_UNIT));
-		    if (wi::fits_shwi_p (off)
-			/* Probibit value-numbering zero offset components
-			   of addresses the same before the pass folding
-			   __builtin_object_size had a chance to run
-			   (checking cfun->after_inlining does the
-			   trick here).  */
-			&& (TREE_CODE (orig) != ADDR_EXPR
-			    || off != 0
-			    || cfun->after_inlining))
-		      temp.off = off.to_shwi ();
+		    /* Probibit value-numbering zero offset components
+		       of addresses the same before the pass folding
+		       __builtin_object_size had a chance to run
+		       (checking cfun->after_inlining does the
+		       trick here).  */
+		    if (TREE_CODE (orig) != ADDR_EXPR
+			|| maybe_nonzero (off)
+			|| cfun->after_inlining)
+		      off.to_shwi (&temp.off);
 		  }
 	      }
 	  }
@@ -820,16 +819,15 @@ copy_reference_ops_from_ref (tree ref, v
 	    if (! temp.op2)
 	      temp.op2 = size_binop (EXACT_DIV_EXPR, TYPE_SIZE_UNIT (eltype),
 				     size_int (TYPE_ALIGN_UNIT (eltype)));
-	    if (TREE_CODE (temp.op0) == INTEGER_CST
-		&& TREE_CODE (temp.op1) == INTEGER_CST
+	    if (poly_int_tree_p (temp.op0)
+		&& poly_int_tree_p (temp.op1)
 		&& TREE_CODE (temp.op2) == INTEGER_CST)
 	      {
-		offset_int off = ((wi::to_offset (temp.op0)
-				   - wi::to_offset (temp.op1))
-				  * wi::to_offset (temp.op2)
-				  * vn_ref_op_align_unit (&temp));
-		if (wi::fits_shwi_p (off))
-		  temp.off = off.to_shwi();
+		poly_offset_int off = ((wi::to_poly_offset (temp.op0)
+					- wi::to_poly_offset (temp.op1))
+				       * wi::to_offset (temp.op2)
+				       * vn_ref_op_align_unit (&temp));
+		off.to_shwi (&temp.off);
 	      }
 	  }
 	  break;
@@ -918,9 +916,9 @@ ao_ref_init_from_vn_reference (ao_ref *r
   unsigned i;
   tree base = NULL_TREE;
   tree *op0_p = &base;
-  offset_int offset = 0;
-  offset_int max_size;
-  offset_int size = -1;
+  poly_offset_int offset = 0;
+  poly_offset_int max_size;
+  poly_offset_int size = -1;
   tree size_tree = NULL_TREE;
   alias_set_type base_alias_set = -1;
 
@@ -936,11 +934,11 @@ ao_ref_init_from_vn_reference (ao_ref *r
       if (mode == BLKmode)
 	size_tree = TYPE_SIZE (type);
       else
-	size = int (GET_MODE_BITSIZE (mode));
+	size = GET_MODE_BITSIZE (mode);
     }
   if (size_tree != NULL_TREE
-      && TREE_CODE (size_tree) == INTEGER_CST)
-    size = wi::to_offset (size_tree);
+      && poly_int_tree_p (size_tree))
+    size = wi::to_poly_offset (size_tree);
 
   /* Initially, maxsize is the same as the accessed element size.
      In the following it will only grow (or become -1).  */
@@ -963,7 +961,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
 	    {
 	      vn_reference_op_t pop = &ops[i-1];
 	      base = TREE_OPERAND (op->op0, 0);
-	      if (pop->off == -1)
+	      if (must_eq (pop->off, -1))
 		{
 		  max_size = -1;
 		  offset = 0;
@@ -1008,12 +1006,12 @@ ao_ref_init_from_vn_reference (ao_ref *r
 	       parts manually.  */
 	    tree this_offset = DECL_FIELD_OFFSET (field);
 
-	    if (op->op1 || TREE_CODE (this_offset) != INTEGER_CST)
+	    if (op->op1 || !poly_int_tree_p (this_offset))
 	      max_size = -1;
 	    else
 	      {
-		offset_int woffset = (wi::to_offset (this_offset)
-				      << LOG2_BITS_PER_UNIT);
+		poly_offset_int woffset = (wi::to_poly_offset (this_offset)
+					   << LOG2_BITS_PER_UNIT);
 		woffset += wi::to_offset (DECL_FIELD_BIT_OFFSET (field));
 		offset += woffset;
 	      }
@@ -1023,14 +1021,15 @@ ao_ref_init_from_vn_reference (ao_ref *r
 	case ARRAY_RANGE_REF:
 	case ARRAY_REF:
 	  /* We recorded the lower bound and the element size.  */
-	  if (TREE_CODE (op->op0) != INTEGER_CST
-	      || TREE_CODE (op->op1) != INTEGER_CST
+	  if (!poly_int_tree_p (op->op0)
+	      || !poly_int_tree_p (op->op1)
 	      || TREE_CODE (op->op2) != INTEGER_CST)
 	    max_size = -1;
 	  else
 	    {
-	      offset_int woffset
-		= wi::sext (wi::to_offset (op->op0) - wi::to_offset (op->op1),
+	      poly_offset_int woffset
+		= wi::sext (wi::to_poly_offset (op->op0)
+			    - wi::to_poly_offset (op->op1),
 			    TYPE_PRECISION (TREE_TYPE (op->op0)));
 	      woffset *= wi::to_offset (op->op2) * vn_ref_op_align_unit (op);
 	      woffset <<= LOG2_BITS_PER_UNIT;
@@ -1077,7 +1076,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
   /* We discount volatiles from value-numbering elsewhere.  */
   ref->volatile_p = false;
 
-  if (!wi::fits_shwi_p (size) || wi::neg_p (size))
+  if (!size.to_shwi (&ref->size) || may_lt (ref->size, 0))
     {
       ref->offset = 0;
       ref->size = -1;
@@ -1085,21 +1084,15 @@ ao_ref_init_from_vn_reference (ao_ref *r
       return true;
     }
 
-  ref->size = size.to_shwi ();
-
-  if (!wi::fits_shwi_p (offset))
+  if (!offset.to_shwi (&ref->offset))
     {
       ref->offset = 0;
       ref->max_size = -1;
       return true;
     }
 
-  ref->offset = offset.to_shwi ();
-
-  if (!wi::fits_shwi_p (max_size) || wi::neg_p (max_size))
+  if (!max_size.to_shwi (&ref->max_size) || may_lt (ref->max_size, 0))
     ref->max_size = -1;
-  else
-    ref->max_size = max_size.to_shwi ();
 
   return true;
 }
@@ -1344,7 +1337,7 @@ fully_constant_vn_reference_p (vn_refere
 	   && (!INTEGRAL_TYPE_P (ref->type)
 	       || TYPE_PRECISION (ref->type) % BITS_PER_UNIT == 0))
     {
-      HOST_WIDE_INT off = 0;
+      poly_int64 off = 0;
       HOST_WIDE_INT size;
       if (INTEGRAL_TYPE_P (ref->type))
 	size = TYPE_PRECISION (ref->type);
@@ -1362,7 +1355,7 @@ fully_constant_vn_reference_p (vn_refere
 	      ++i;
 	      break;
 	    }
-	  if (operands[i].off == -1)
+	  if (must_eq (operands[i].off, -1))
 	    return NULL_TREE;
 	  off += operands[i].off;
 	  if (operands[i].opcode == MEM_REF)
@@ -1388,6 +1381,7 @@ fully_constant_vn_reference_p (vn_refere
 	return build_zero_cst (ref->type);
       else if (ctor != error_mark_node)
 	{
+	  HOST_WIDE_INT const_off;
 	  if (decl)
 	    {
 	      tree res = fold_ctor_reference (ref->type, ctor,
@@ -1400,10 +1394,10 @@ fully_constant_vn_reference_p (vn_refere
 		    return res;
 		}
 	    }
-	  else
+	  else if (off.is_constant (&const_off))
 	    {
 	      unsigned char buf[MAX_BITSIZE_MODE_ANY_MODE / BITS_PER_UNIT];
-	      int len = native_encode_expr (ctor, buf, size, off);
+	      int len = native_encode_expr (ctor, buf, size, const_off);
 	      if (len > 0)
 		return native_interpret_expr (ref->type, buf, len);
 	    }
@@ -1495,17 +1489,16 @@ valueize_refs_1 (vec<vn_reference_op_s>
       /* If it transforms a non-constant ARRAY_REF into a constant
 	 one, adjust the constant offset.  */
       else if (vro->opcode == ARRAY_REF
-	       && vro->off == -1
-	       && TREE_CODE (vro->op0) == INTEGER_CST
-	       && TREE_CODE (vro->op1) == INTEGER_CST
+	       && must_eq (vro->off, -1)
+	       && poly_int_tree_p (vro->op0)
+	       && poly_int_tree_p (vro->op1)
 	       && TREE_CODE (vro->op2) == INTEGER_CST)
 	{
-	  offset_int off = ((wi::to_offset (vro->op0)
-			     - wi::to_offset (vro->op1))
-			    * wi::to_offset (vro->op2)
-			    * vn_ref_op_align_unit (vro));
-	  if (wi::fits_shwi_p (off))
-	    vro->off = off.to_shwi ();
+	  poly_offset_int off = ((wi::to_poly_offset (vro->op0)
+				  - wi::to_poly_offset (vro->op1))
+				 * wi::to_offset (vro->op2)
+				 * vn_ref_op_align_unit (vro));
+	  off.to_shwi (&vro->off);
 	}
     }
 
@@ -1821,10 +1814,11 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   vn_reference_t vr = (vn_reference_t)vr_;
   gimple *def_stmt = SSA_NAME_DEF_STMT (vuse);
   tree base = ao_ref_base (ref);
-  HOST_WIDE_INT offset, maxsize;
+  HOST_WIDE_INT offseti, maxsizei;
   static vec<vn_reference_op_s> lhs_ops;
   ao_ref lhs_ref;
   bool lhs_ref_ok = false;
+  poly_int64 copy_size;
 
   /* If the reference is based on a parameter that was determined as
      pointing to readonly memory it doesn't change.  */
@@ -1903,14 +1897,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   if (*disambiguate_only)
     return (void *)-1;
 
-  offset = ref->offset;
-  maxsize = ref->max_size;
-
   /* If we cannot constrain the size of the reference we cannot
      test if anything kills it.  */
-  if (maxsize == -1)
+  if (!ref->max_size_known_p ())
     return (void *)-1;
 
+  poly_int64 offset = ref->offset;
+  poly_int64 maxsize = ref->max_size;
+
   /* We can't deduce anything useful from clobbers.  */
   if (gimple_clobber_p (def_stmt))
     return (void *)-1;
@@ -1921,7 +1915,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
   if (is_gimple_reg_type (vr->type)
       && gimple_call_builtin_p (def_stmt, BUILT_IN_MEMSET)
       && integer_zerop (gimple_call_arg (def_stmt, 1))
-      && tree_fits_uhwi_p (gimple_call_arg (def_stmt, 2))
+      && poly_int_tree_p (gimple_call_arg (def_stmt, 2))
       && TREE_CODE (gimple_call_arg (def_stmt, 0)) == ADDR_EXPR)
     {
       tree ref2 = TREE_OPERAND (gimple_call_arg (def_stmt, 0), 0);
@@ -1930,13 +1924,11 @@ vn_reference_lookup_3 (ao_ref *ref, tree
       bool reverse;
       base2 = get_ref_base_and_extent (ref2, &offset2, &size2, &maxsize2,
 				       &reverse);
-      size2 = tree_to_uhwi (gimple_call_arg (def_stmt, 2)) * 8;
-      if ((unsigned HOST_WIDE_INT)size2 / 8
-	  == tree_to_uhwi (gimple_call_arg (def_stmt, 2))
-	  && maxsize2 != -1
+      tree len = gimple_call_arg (def_stmt, 2);
+      if (known_size_p (maxsize2)
 	  && operand_equal_p (base, base2, 0)
-	  && offset2 <= offset
-	  && offset2 + size2 >= offset + maxsize)
+	  && known_subrange_p (offset, maxsize, offset2,
+			       wi::to_poly_offset (len) << LOG2_BITS_PER_UNIT))
 	{
 	  tree val = build_zero_cst (vr->type);
 	  return vn_reference_lookup_or_insert_for_pieces
@@ -1955,10 +1947,9 @@ vn_reference_lookup_3 (ao_ref *ref, tree
       bool reverse;
       base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt),
 				       &offset2, &size2, &maxsize2, &reverse);
-      if (maxsize2 != -1
+      if (known_size_p (maxsize2)
 	  && operand_equal_p (base, base2, 0)
-	  && offset2 <= offset
-	  && offset2 + size2 >= offset + maxsize)
+	  && known_subrange_p (offset, maxsize, offset2, size2))
 	{
 	  tree val = build_zero_cst (vr->type);
 	  return vn_reference_lookup_or_insert_for_pieces
@@ -1968,13 +1959,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 
   /* 3) Assignment from a constant.  We can use folds native encode/interpret
      routines to extract the assigned bits.  */
-  else if (ref->size == maxsize
+  else if (must_eq (ref->size, maxsize)
 	   && is_gimple_reg_type (vr->type)
 	   && !contains_storage_order_barrier_p (vr->operands)
 	   && gimple_assign_single_p (def_stmt)
 	   && CHAR_BIT == 8 && BITS_PER_UNIT == 8
-	   && maxsize % BITS_PER_UNIT == 0
-	   && offset % BITS_PER_UNIT == 0
+	   /* native_encode and native_decode operate on arrays of bytes
+	      and so fundamentally need a compile-time size and offset.  */
+	   && maxsize.is_constant (&maxsizei)
+	   && maxsizei % BITS_PER_UNIT == 0
+	   && offset.is_constant (&offseti)
+	   && offseti % BITS_PER_UNIT == 0
 	   && (is_gimple_min_invariant (gimple_assign_rhs1 (def_stmt))
 	       || (TREE_CODE (gimple_assign_rhs1 (def_stmt)) == SSA_NAME
 		   && is_gimple_min_invariant (SSA_VAL (gimple_assign_rhs1 (def_stmt))))))
@@ -1990,8 +1985,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	  && size2 % BITS_PER_UNIT == 0
 	  && offset2 % BITS_PER_UNIT == 0
 	  && operand_equal_p (base, base2, 0)
-	  && offset2 <= offset
-	  && offset2 + size2 >= offset + maxsize)
+	  && known_subrange_p (offseti, maxsizei, offset2, size2))
 	{
 	  /* We support up to 512-bit values (for V8DFmode).  */
 	  unsigned char buffer[64];
@@ -2008,14 +2002,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	      /* Make sure to interpret in a type that has a range
 	         covering the whole access size.  */
 	      if (INTEGRAL_TYPE_P (vr->type)
-		  && ref->size != TYPE_PRECISION (vr->type))
-		type = build_nonstandard_integer_type (ref->size,
+		  && maxsizei != TYPE_PRECISION (vr->type))
+		type = build_nonstandard_integer_type (maxsizei,
 						       TYPE_UNSIGNED (type));
 	      tree val = native_interpret_expr (type,
 						buffer
-						+ ((offset - offset2)
+						+ ((offseti - offset2)
 						   / BITS_PER_UNIT),
-						ref->size / BITS_PER_UNIT);
+						maxsizei / BITS_PER_UNIT);
 	      /* If we chop off bits because the types precision doesn't
 		 match the memory access size this is ok when optimizing
 		 reads but not when called from the DSE code during
@@ -2038,7 +2032,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 
   /* 4) Assignment from an SSA name which definition we may be able
      to access pieces from.  */
-  else if (ref->size == maxsize
+  else if (must_eq (ref->size, maxsize)
 	   && is_gimple_reg_type (vr->type)
 	   && !contains_storage_order_barrier_p (vr->operands)
 	   && gimple_assign_single_p (def_stmt)
@@ -2054,15 +2048,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	  && maxsize2 != -1
 	  && maxsize2 == size2
 	  && operand_equal_p (base, base2, 0)
-	  && offset2 <= offset
-	  && offset2 + size2 >= offset + maxsize
+	  && known_subrange_p (offset, maxsize, offset2, size2)
 	  /* ???  We can't handle bitfield precision extracts without
 	     either using an alternate type for the BIT_FIELD_REF and
 	     then doing a conversion or possibly adjusting the offset
 	     according to endianness.  */
 	  && (! INTEGRAL_TYPE_P (vr->type)
-	      || ref->size == TYPE_PRECISION (vr->type))
-	  && ref->size % BITS_PER_UNIT == 0)
+	      || must_eq (ref->size, TYPE_PRECISION (vr->type)))
+	  && multiple_p (ref->size, BITS_PER_UNIT))
 	{
 	  code_helper rcode = BIT_FIELD_REF;
 	  tree ops[3];
@@ -2090,7 +2083,6 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	       || handled_component_p (gimple_assign_rhs1 (def_stmt))))
     {
       tree base2;
-      HOST_WIDE_INT maxsize2;
       int i, j, k;
       auto_vec<vn_reference_op_s> rhs;
       vn_reference_op_t vro;
@@ -2101,8 +2093,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 
       /* See if the assignment kills REF.  */
       base2 = ao_ref_base (&lhs_ref);
-      maxsize2 = lhs_ref.max_size;
-      if (maxsize2 == -1
+      if (!lhs_ref.max_size_known_p ()
 	  || (base != base2
 	      && (TREE_CODE (base) != MEM_REF
 		  || TREE_CODE (base2) != MEM_REF
@@ -2129,15 +2120,15 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	 may fail when comparing types for compatibility.  But we really
 	 don't care here - further lookups with the rewritten operands
 	 will simply fail if we messed up types too badly.  */
-      HOST_WIDE_INT extra_off = 0;
+      poly_int64 extra_off = 0;
       if (j == 0 && i >= 0
 	  && lhs_ops[0].opcode == MEM_REF
-	  && lhs_ops[0].off != -1)
+	  && may_ne (lhs_ops[0].off, -1))
 	{
-	  if (lhs_ops[0].off == vr->operands[i].off)
+	  if (must_eq (lhs_ops[0].off, vr->operands[i].off))
 	    i--, j--;
 	  else if (vr->operands[i].opcode == MEM_REF
-		   && vr->operands[i].off != -1)
+		   && may_ne (vr->operands[i].off, -1))
 	    {
 	      extra_off = vr->operands[i].off - lhs_ops[0].off;
 	      i--, j--;
@@ -2163,11 +2154,11 @@ vn_reference_lookup_3 (ao_ref *ref, tree
       copy_reference_ops_from_ref (gimple_assign_rhs1 (def_stmt), &rhs);
 
       /* Apply an extra offset to the inner MEM_REF of the RHS.  */
-      if (extra_off != 0)
+      if (maybe_nonzero (extra_off))
 	{
 	  if (rhs.length () < 2
 	      || rhs[0].opcode != MEM_REF
-	      || rhs[0].off == -1)
+	      || must_eq (rhs[0].off, -1))
 	    return (void *)-1;
 	  rhs[0].off += extra_off;
 	  rhs[0].op0 = int_const_binop (PLUS_EXPR, rhs[0].op0,
@@ -2198,7 +2189,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
       if (!ao_ref_init_from_vn_reference (&r, vr->set, vr->type, vr->operands))
 	return (void *)-1;
       /* This can happen with bitfields.  */
-      if (ref->size != r.size)
+      if (may_ne (ref->size, r.size))
 	return (void *)-1;
       *ref = r;
 
@@ -2221,20 +2212,20 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	       || TREE_CODE (gimple_call_arg (def_stmt, 0)) == SSA_NAME)
 	   && (TREE_CODE (gimple_call_arg (def_stmt, 1)) == ADDR_EXPR
 	       || TREE_CODE (gimple_call_arg (def_stmt, 1)) == SSA_NAME)
-	   && tree_fits_uhwi_p (gimple_call_arg (def_stmt, 2)))
+	   && poly_int_tree_p (gimple_call_arg (def_stmt, 2), &copy_size))
     {
       tree lhs, rhs;
       ao_ref r;
-      HOST_WIDE_INT rhs_offset, copy_size, lhs_offset;
+      poly_int64 rhs_offset, lhs_offset;
       vn_reference_op_s op;
-      HOST_WIDE_INT at;
+      poly_uint64 mem_offset;
+      poly_int64 at, byte_maxsize;
 
       /* Only handle non-variable, addressable refs.  */
-      if (ref->size != maxsize
-	  || offset % BITS_PER_UNIT != 0
-	  || ref->size % BITS_PER_UNIT != 0)
+      if (may_ne (ref->size, maxsize)
+	  || !multiple_p (offset, BITS_PER_UNIT, &at)
+	  || !multiple_p (maxsize, BITS_PER_UNIT, &byte_maxsize))
 	return (void *)-1;
-      at = offset / BITS_PER_UNIT;
 
       /* Extract a pointer base and an offset for the destination.  */
       lhs = gimple_call_arg (def_stmt, 0);
@@ -2252,17 +2243,19 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	}
       if (TREE_CODE (lhs) == ADDR_EXPR)
 	{
+	  HOST_WIDE_INT tmp_lhs_offset;
 	  tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (lhs, 0),
-						    &lhs_offset);
+						    &tmp_lhs_offset);
+	  lhs_offset = tmp_lhs_offset;
 	  if (!tem)
 	    return (void *)-1;
 	  if (TREE_CODE (tem) == MEM_REF
-	      && tree_fits_uhwi_p (TREE_OPERAND (tem, 1)))
+	      && poly_int_tree_p (TREE_OPERAND (tem, 1), &mem_offset))
 	    {
 	      lhs = TREE_OPERAND (tem, 0);
 	      if (TREE_CODE (lhs) == SSA_NAME)
 		lhs = SSA_VAL (lhs);
-	      lhs_offset += tree_to_uhwi (TREE_OPERAND (tem, 1));
+	      lhs_offset += mem_offset;
 	    }
 	  else if (DECL_P (tem))
 	    lhs = build_fold_addr_expr (tem);
@@ -2280,15 +2273,17 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	rhs = SSA_VAL (rhs);
       if (TREE_CODE (rhs) == ADDR_EXPR)
 	{
+	  HOST_WIDE_INT tmp_rhs_offset;
 	  tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (rhs, 0),
-						    &rhs_offset);
+						    &tmp_rhs_offset);
+	  rhs_offset = tmp_rhs_offset;
 	  if (!tem)
 	    return (void *)-1;
 	  if (TREE_CODE (tem) == MEM_REF
-	      && tree_fits_uhwi_p (TREE_OPERAND (tem, 1)))
+	      && poly_int_tree_p (TREE_OPERAND (tem, 1), &mem_offset))
 	    {
 	      rhs = TREE_OPERAND (tem, 0);
-	      rhs_offset += tree_to_uhwi (TREE_OPERAND (tem, 1));
+	      rhs_offset += mem_offset;
 	    }
 	  else if (DECL_P (tem))
 	    rhs = build_fold_addr_expr (tem);
@@ -2299,15 +2294,13 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	  && TREE_CODE (rhs) != ADDR_EXPR)
 	return (void *)-1;
 
-      copy_size = tree_to_uhwi (gimple_call_arg (def_stmt, 2));
-
       /* The bases of the destination and the references have to agree.  */
       if (TREE_CODE (base) == MEM_REF)
 	{
 	  if (TREE_OPERAND (base, 0) != lhs
-	      || !tree_fits_uhwi_p (TREE_OPERAND (base, 1)))
+	      || !poly_int_tree_p (TREE_OPERAND (base, 1), &mem_offset))
 	    return (void *) -1;
-	  at += tree_to_uhwi (TREE_OPERAND (base, 1));
+	  at += mem_offset;
 	}
       else if (!DECL_P (base)
 	       || TREE_CODE (lhs) != ADDR_EXPR
@@ -2316,12 +2309,10 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 
       /* If the access is completely outside of the memcpy destination
 	 area there is no aliasing.  */
-      if (lhs_offset >= at + maxsize / BITS_PER_UNIT
-	  || lhs_offset + copy_size <= at)
+      if (!ranges_may_overlap_p (lhs_offset, copy_size, at, byte_maxsize))
 	return NULL;
       /* And the access has to be contained within the memcpy destination.  */
-      if (lhs_offset > at
-	  || lhs_offset + copy_size < at + maxsize / BITS_PER_UNIT)
+      if (!known_subrange_p (at, byte_maxsize, lhs_offset, copy_size))
 	return (void *)-1;
 
       /* Make room for 2 operands in the new reference.  */
@@ -2359,7 +2350,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
       if (!ao_ref_init_from_vn_reference (&r, vr->set, vr->type, vr->operands))
 	return (void *)-1;
       /* This can happen with bitfields.  */
-      if (ref->size != r.size)
+      if (may_ne (ref->size, r.size))
 	return (void *)-1;
       *ref = r;
 
Index: gcc/tree-ssa-uninit.c
===================================================================
--- gcc/tree-ssa-uninit.c	2017-10-23 16:52:20.058356365 +0100
+++ gcc/tree-ssa-uninit.c	2017-10-23 17:01:52.305178291 +0100
@@ -294,15 +294,15 @@ warn_uninitialized_vars (bool warn_possi
 
 	      /* Do not warn if the access is fully outside of the
 	         variable.  */
+	      poly_int64 decl_size;
 	      if (DECL_P (base)
-		  && ref.size != -1
-		  && ref.max_size == ref.size
-		  && (ref.offset + ref.size <= 0
-		      || (ref.offset >= 0
+		  && known_size_p (ref.size)
+		  && must_eq (ref.max_size, ref.size)
+		  && (must_le (ref.offset + ref.size, 0)
+		      || (must_ge (ref.offset, 0)
 			  && DECL_SIZE (base)
-			  && TREE_CODE (DECL_SIZE (base)) == INTEGER_CST
-			  && compare_tree_int (DECL_SIZE (base),
-					       ref.offset) <= 0)))
+			  && poly_int_tree_p (DECL_SIZE (base), &decl_size)
+			  && must_le (decl_size, ref.offset))))
 		continue;
 
 	      /* Do not warn if the access is then used for a BIT_INSERT_EXPR. */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [014/nnn] poly_int: indirect_refs_may_alias_p
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (13 preceding siblings ...)
  2017-10-23 17:06 ` [015/nnn] poly_int: ao_ref and vn_reference_op_t Richard Sandiford
@ 2017-10-23 17:06 ` Richard Sandiford
  2017-11-17 18:11   ` Jeff Law
  2017-10-23 17:07 ` [016/nnn] poly_int: dse.c Richard Sandiford
                   ` (92 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:06 UTC (permalink / raw)
  To: gcc-patches

This patch makes indirect_refs_may_alias_p use ranges_may_overlap_p
rather than ranges_overlap_p.  Unlike ranges_overlap_p,
ranges_may_overlap_p can handle negative offsets, so the fix for PR44852
should no longer be necessary.  It can also handle offset_int, so it
avoids unchecked truncations to HOST_WIDE_INT.

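For readers less familiar with the may/must predicates, the sketch below
shows why no biasing is needed once the overlap test understands signed,
possibly runtime, offsets.  It is a self-contained toy model, not GCC's
implementation: toy_poly, may_lt, must_le and toy_ranges_may_overlap_p
are made-up names, and a single indeterminate X >= 0 stands in for the
general poly_int coefficients.

  /* Toy model: a value a + b*X for one runtime indeterminate X >= 0.  */
  struct toy_poly { long a, b; };

  /* Can v1 be less than v2 for some X >= 0?  */
  static bool may_lt (toy_poly v1, toy_poly v2)
  {
    return v1.a < v2.a || v1.b < v2.b;
  }

  /* Is v1 <= v2 for every X >= 0?  */
  static bool must_le (toy_poly v1, toy_poly v2)
  {
    return !may_lt (v2, v1);
  }

  static toy_poly add (toy_poly v1, toy_poly v2)
  {
    return { v1.a + v2.a, v1.b + v2.b };
  }

  /* Conservative overlap test for [pos, pos + size): return false only
     if the two ranges provably cannot overlap.  A size of {-1, 0} means
     "unknown" and is treated as unbounded.  Negative positions need no
     special handling.  */
  static bool toy_ranges_may_overlap_p (toy_poly pos1, toy_poly size1,
                                        toy_poly pos2, toy_poly size2)
  {
    bool size1_known = !(size1.a == -1 && size1.b == 0);
    bool size2_known = !(size2.a == -1 && size2.b == 0);
    if (size1_known && must_le (add (pos1, size1), pos2))
      return false;
    if (size2_known && must_le (add (pos2, size2), pos1))
      return false;
    return true;
  }

In this model a negative MEM_REF offset simply makes pos1 or pos2
negative; there is no need to shift both ranges so that the adjustment
is positive, which is what the deleted code in the hunks below does.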

2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-ssa-alias.c (indirect_ref_may_alias_decl_p)
	(indirect_refs_may_alias_p): Use ranges_may_overlap_p
	instead of ranges_overlap_p.

Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	2017-10-23 17:01:49.579064221 +0100
+++ gcc/tree-ssa-alias.c	2017-10-23 17:01:51.044974644 +0100
@@ -1135,23 +1135,13 @@ indirect_ref_may_alias_decl_p (tree ref1
 {
   tree ptr1;
   tree ptrtype1, dbase2;
-  HOST_WIDE_INT offset1p = offset1, offset2p = offset2;
-  HOST_WIDE_INT doffset1, doffset2;
 
   gcc_checking_assert ((TREE_CODE (base1) == MEM_REF
 			|| TREE_CODE (base1) == TARGET_MEM_REF)
 		       && DECL_P (base2));
 
   ptr1 = TREE_OPERAND (base1, 0);
-
-  /* The offset embedded in MEM_REFs can be negative.  Bias them
-     so that the resulting offset adjustment is positive.  */
-  offset_int moff = mem_ref_offset (base1);
-  moff <<= LOG2_BITS_PER_UNIT;
-  if (wi::neg_p (moff))
-    offset2p += (-moff).to_short_addr ();
-  else
-    offset1p += moff.to_short_addr ();
+  offset_int moff = mem_ref_offset (base1) << LOG2_BITS_PER_UNIT;
 
   /* If only one reference is based on a variable, they cannot alias if
      the pointer access is beyond the extent of the variable access.
@@ -1160,7 +1150,7 @@ indirect_ref_may_alias_decl_p (tree ref1
      ???  IVOPTs creates bases that do not honor this restriction,
      so do not apply this optimization for TARGET_MEM_REFs.  */
   if (TREE_CODE (base1) != TARGET_MEM_REF
-      && !ranges_overlap_p (MAX (0, offset1p), -1, offset2p, max_size2))
+      && !ranges_may_overlap_p (offset1 + moff, -1, offset2, max_size2))
     return false;
   /* They also cannot alias if the pointer may not point to the decl.  */
   if (!ptr_deref_may_alias_decl_p (ptr1, base2))
@@ -1213,18 +1203,11 @@ indirect_ref_may_alias_decl_p (tree ref1
   dbase2 = ref2;
   while (handled_component_p (dbase2))
     dbase2 = TREE_OPERAND (dbase2, 0);
-  doffset1 = offset1;
-  doffset2 = offset2;
+  HOST_WIDE_INT doffset1 = offset1;
+  offset_int doffset2 = offset2;
   if (TREE_CODE (dbase2) == MEM_REF
       || TREE_CODE (dbase2) == TARGET_MEM_REF)
-    {
-      offset_int moff = mem_ref_offset (dbase2);
-      moff <<= LOG2_BITS_PER_UNIT;
-      if (wi::neg_p (moff))
-	doffset1 -= (-moff).to_short_addr ();
-      else
-	doffset2 -= moff.to_short_addr ();
-    }
+    doffset2 -= mem_ref_offset (dbase2) << LOG2_BITS_PER_UNIT;
 
   /* If either reference is view-converted, give up now.  */
   if (same_type_for_tbaa (TREE_TYPE (base1), TREE_TYPE (ptrtype1)) != 1
@@ -1241,7 +1224,7 @@ indirect_ref_may_alias_decl_p (tree ref1
   if ((TREE_CODE (base1) != TARGET_MEM_REF
        || (!TMR_INDEX (base1) && !TMR_INDEX2 (base1)))
       && same_type_for_tbaa (TREE_TYPE (base1), TREE_TYPE (dbase2)) == 1)
-    return ranges_overlap_p (doffset1, max_size1, doffset2, max_size2);
+    return ranges_may_overlap_p (doffset1, max_size1, doffset2, max_size2);
 
   if (ref1 && ref2
       && nonoverlapping_component_refs_p (ref1, ref2))
@@ -1313,22 +1296,10 @@ indirect_refs_may_alias_p (tree ref1 ATT
 		      && operand_equal_p (TMR_INDEX2 (base1),
 					  TMR_INDEX2 (base2), 0))))))
     {
-      offset_int moff;
-      /* The offset embedded in MEM_REFs can be negative.  Bias them
-	 so that the resulting offset adjustment is positive.  */
-      moff = mem_ref_offset (base1);
-      moff <<= LOG2_BITS_PER_UNIT;
-      if (wi::neg_p (moff))
-	offset2 += (-moff).to_short_addr ();
-      else
-	offset1 += moff.to_shwi ();
-      moff = mem_ref_offset (base2);
-      moff <<= LOG2_BITS_PER_UNIT;
-      if (wi::neg_p (moff))
-	offset1 += (-moff).to_short_addr ();
-      else
-	offset2 += moff.to_short_addr ();
-      return ranges_overlap_p (offset1, max_size1, offset2, max_size2);
+      offset_int moff1 = mem_ref_offset (base1) << LOG2_BITS_PER_UNIT;
+      offset_int moff2 = mem_ref_offset (base2) << LOG2_BITS_PER_UNIT;
+      return ranges_may_overlap_p (offset1 + moff1, max_size1,
+				   offset2 + moff2, max_size2);
     }
   if (!ptr_derefs_may_alias_p (ptr1, ptr2))
     return false;

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [016/nnn] poly_int: dse.c
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (14 preceding siblings ...)
  2017-10-23 17:06 ` [014/nnn] poly_int: indirect_refs_may_alias_p Richard Sandiford
@ 2017-10-23 17:07 ` Richard Sandiford
  2017-11-18  4:30   ` Jeff Law
  2017-10-23 17:07 ` [017/nnn] poly_int: rtx_addr_can_trap_p_1 Richard Sandiford
                   ` (91 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:07 UTC (permalink / raw)
  To: gcc-patches

This patch makes RTL DSE use poly_int64 for offsets and sizes.
The local phase can optimise stores and reads with non-constant offsets
and sizes normally, but the global phase treats them as wild accesses.

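As a rough illustration of the local/global split, here is a minimal
sketch of the pattern the global phase follows.  The names (toy_store,
constant_p, record_bytes_for_global_dse) are made up for the example and
int64_t pairs stand in for poly_int64: per-byte bits are only recorded
when both offset and width are compile-time constants, so runtime-sized
stores simply never become global DSE candidates.

  #include <cstdint>

  /* Toy stand-in for a store whose offset and width have the form
     c + x*X for a runtime indeterminate X >= 0.  */
  struct toy_store
  {
    int64_t offset_c, offset_x;
    int64_t width_c, width_x;
  };

  /* True (and *out set) only if the value is the same for every X.  */
  static bool constant_p (int64_t c, int64_t x, int64_t *out)
  {
    if (x != 0)
      return false;
    *out = c;
    return true;
  }

  /* Global phase: record the bytes a store writes, or nothing at all
     if its bounds are not compile-time constants.  */
  static void record_bytes_for_global_dse (const toy_store &s,
                                           void (*mark_byte) (int64_t))
  {
    int64_t offset, width;
    if (!constant_p (s.offset_c, s.offset_x, &offset)
        || !constant_p (s.width_c, s.width_x, &width))
      return;   /* treated as a wild access */
    for (int64_t i = offset; i < offset + width; ++i)
      mark_byte (i);
  }

The local phase, by contrast, can still kill a runtime-sized store
outright when a later store is known to cover it, via the
known_subrange_p test in record_store below.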

2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* dse.c (store_info): Change offset and width from HOST_WIDE_INT
	to poly_int64.  Update commentary for positions_needed.large.
	(read_info_type): Change offset and width from HOST_WIDE_INT
	to poly_int64.
	(set_usage_bits): Likewise.
	(canon_address): Return the offset as a poly_int64 rather than
	a HOST_WIDE_INT.  Use strip_offset_and_add.
	(set_all_positions_unneeded, any_positions_needed_p): Use
	positions_needed.large to track stores with non-constant widths.
	(all_positions_needed_p): Likewise.  Take the offset and width
	as poly_int64s rather than ints.  Assert that rhs is nonnull.
	(record_store): Cope with non-constant offsets and widths.
	Nullify the rhs of an earlier store if we can't tell which bytes
	of it are needed.
	(find_shift_sequence): Take the access_size and shift as poly_int64s
	rather than ints.
	(get_stored_val): Take the read_offset and read_width as poly_int64s
	rather than HOST_WIDE_INTs.
	(check_mem_read_rtx, scan_stores, scan_reads, dse_step5): Handle
	non-constant offsets and widths.

Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-10-23 16:52:20.003305798 +0100
+++ gcc/dse.c	2017-10-23 17:01:54.249406896 +0100
@@ -244,11 +244,11 @@ struct store_info
   rtx mem_addr;
 
   /* The offset of the first byte associated with the operation.  */
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
 
   /* The number of bytes covered by the operation.  This is always exact
      and known (rather than -1).  */
-  HOST_WIDE_INT width;
+  poly_int64 width;
 
   union
     {
@@ -259,12 +259,19 @@ struct store_info
 
       struct
 	{
-	  /* A bitmap with one bit per byte.  Cleared bit means the position
-	     is needed.  Used if IS_LARGE is false.  */
+	  /* A bitmap with one bit per byte, or null if the number of
+	     bytes isn't known at compile time.  A cleared bit means
+	     the position is needed.  Used if IS_LARGE is true.  */
 	  bitmap bmap;
 
-	  /* Number of set bits (i.e. unneeded bytes) in BITMAP.  If it is
-	     equal to WIDTH, the whole store is unused.  */
+	  /* When BITMAP is nonnull, this counts the number of set bits
+	     (i.e. unneeded bytes) in the bitmap.  If it is equal to
+	     WIDTH, the whole store is unused.
+
+	     When BITMAP is null:
+	     - the store is definitely not needed when COUNT == 1
+	     - all the store is needed when COUNT == 0 and RHS is nonnull
+	     - otherwise we don't know which parts of the store are needed.  */
 	  int count;
 	} large;
     } positions_needed;
@@ -308,10 +315,10 @@ struct read_info_type
   int group_id;
 
   /* The offset of the first byte associated with the operation.  */
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
 
   /* The number of bytes covered by the operation, or -1 if not known.  */
-  HOST_WIDE_INT width;
+  poly_int64 width;
 
   /* The mem being read.  */
   rtx mem;
@@ -940,13 +947,18 @@ can_escape (tree expr)
    OFFSET and WIDTH.  */
 
 static void
-set_usage_bits (group_info *group, HOST_WIDE_INT offset, HOST_WIDE_INT width,
+set_usage_bits (group_info *group, poly_int64 offset, poly_int64 width,
                 tree expr)
 {
-  HOST_WIDE_INT i;
+  /* Non-constant offsets and widths act as global kills, so there's no point
+     trying to use them to derive global DSE candidates.  */
+  HOST_WIDE_INT i, const_offset, const_width;
   bool expr_escapes = can_escape (expr);
-  if (offset > -MAX_OFFSET && offset + width < MAX_OFFSET)
-    for (i=offset; i<offset+width; i++)
+  if (offset.is_constant (&const_offset)
+      && width.is_constant (&const_width)
+      && const_offset > -MAX_OFFSET
+      && const_offset + const_width < MAX_OFFSET)
+    for (i = const_offset; i < const_offset + const_width; ++i)
       {
 	bitmap store1;
 	bitmap store2;
@@ -1080,7 +1092,7 @@ const_or_frame_p (rtx x)
 static bool
 canon_address (rtx mem,
 	       int *group_id,
-	       HOST_WIDE_INT *offset,
+	       poly_int64 *offset,
 	       cselib_val **base)
 {
   machine_mode address_mode = get_address_mode (mem);
@@ -1147,12 +1159,7 @@ canon_address (rtx mem,
       if (GET_CODE (address) == CONST)
 	address = XEXP (address, 0);
 
-      if (GET_CODE (address) == PLUS
-	  && CONST_INT_P (XEXP (address, 1)))
-	{
-	  *offset = INTVAL (XEXP (address, 1));
-	  address = XEXP (address, 0);
-	}
+      address = strip_offset_and_add (address, offset);
 
       if (ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (mem))
 	  && const_or_frame_p (address))
@@ -1160,8 +1167,11 @@ canon_address (rtx mem,
 	  group_info *group = get_group_info (address);
 
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "  gid=%d offset=%d \n",
-		     group->id, (int)*offset);
+	    {
+	      fprintf (dump_file, "  gid=%d offset=", group->id);
+	      print_dec (*offset, dump_file);
+	      fprintf (dump_file, "\n");
+	    }
 	  *base = NULL;
 	  *group_id = group->id;
 	  return true;
@@ -1178,8 +1188,12 @@ canon_address (rtx mem,
       return false;
     }
   if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "  varying cselib base=%u:%u offset = %d\n",
-	     (*base)->uid, (*base)->hash, (int)*offset);
+    {
+      fprintf (dump_file, "  varying cselib base=%u:%u offset = ",
+	       (*base)->uid, (*base)->hash);
+      print_dec (*offset, dump_file);
+      fprintf (dump_file, "\n");
+    }
   return true;
 }
 
@@ -1228,9 +1242,17 @@ set_all_positions_unneeded (store_info *
 {
   if (__builtin_expect (s_info->is_large, false))
     {
-      bitmap_set_range (s_info->positions_needed.large.bmap,
-			0, s_info->width);
-      s_info->positions_needed.large.count = s_info->width;
+      HOST_WIDE_INT width;
+      if (s_info->width.is_constant (&width))
+	{
+	  bitmap_set_range (s_info->positions_needed.large.bmap, 0, width);
+	  s_info->positions_needed.large.count = width;
+	}
+      else
+	{
+	  gcc_checking_assert (!s_info->positions_needed.large.bmap);
+	  s_info->positions_needed.large.count = 1;
+	}
     }
   else
     s_info->positions_needed.small_bitmask = HOST_WIDE_INT_0U;
@@ -1242,35 +1264,64 @@ set_all_positions_unneeded (store_info *
 any_positions_needed_p (store_info *s_info)
 {
   if (__builtin_expect (s_info->is_large, false))
-    return s_info->positions_needed.large.count < s_info->width;
+    {
+      HOST_WIDE_INT width;
+      if (s_info->width.is_constant (&width))
+	{
+	  gcc_checking_assert (s_info->positions_needed.large.bmap);
+	  return s_info->positions_needed.large.count < width;
+	}
+      else
+	{
+	  gcc_checking_assert (!s_info->positions_needed.large.bmap);
+	  return s_info->positions_needed.large.count == 0;
+	}
+    }
   else
     return (s_info->positions_needed.small_bitmask != HOST_WIDE_INT_0U);
 }
 
 /* Return TRUE if all bytes START through START+WIDTH-1 from S_INFO
-   store are needed.  */
+   store are known to be needed.  */
 
 static inline bool
-all_positions_needed_p (store_info *s_info, int start, int width)
+all_positions_needed_p (store_info *s_info, poly_int64 start,
+			poly_int64 width)
 {
+  gcc_assert (s_info->rhs);
+  if (!s_info->width.is_constant ())
+    {
+      gcc_assert (s_info->is_large
+		  && !s_info->positions_needed.large.bmap);
+      return s_info->positions_needed.large.count == 0;
+    }
+
+  /* Otherwise, if START and WIDTH are non-constant, we're asking about
+     a non-constant region of a constant-sized store.  We can't say for
+     sure that all positions are needed.  */
+  HOST_WIDE_INT const_start, const_width;
+  if (!start.is_constant (&const_start)
+      || !width.is_constant (&const_width))
+    return false;
+
   if (__builtin_expect (s_info->is_large, false))
     {
-      int end = start + width;
-      while (start < end)
-	if (bitmap_bit_p (s_info->positions_needed.large.bmap, start++))
+      for (HOST_WIDE_INT i = const_start; i < const_start + const_width; ++i)
+	if (bitmap_bit_p (s_info->positions_needed.large.bmap, i))
 	  return false;
       return true;
     }
   else
     {
-      unsigned HOST_WIDE_INT mask = lowpart_bitmask (width) << start;
+      unsigned HOST_WIDE_INT mask
+	= lowpart_bitmask (const_width) << const_start;
       return (s_info->positions_needed.small_bitmask & mask) == mask;
     }
 }
 
 
-static rtx get_stored_val (store_info *, machine_mode, HOST_WIDE_INT,
-			   HOST_WIDE_INT, basic_block, bool);
+static rtx get_stored_val (store_info *, machine_mode, poly_int64,
+			   poly_int64, basic_block, bool);
 
 
 /* BODY is an instruction pattern that belongs to INSN.  Return 1 if
@@ -1281,8 +1332,8 @@ static rtx get_stored_val (store_info *,
 record_store (rtx body, bb_info_t bb_info)
 {
   rtx mem, rhs, const_rhs, mem_addr;
-  HOST_WIDE_INT offset = 0;
-  HOST_WIDE_INT width = 0;
+  poly_int64 offset = 0;
+  poly_int64 width = 0;
   insn_info_t insn_info = bb_info->last_insn;
   store_info *store_info = NULL;
   int group_id;
@@ -1437,7 +1488,7 @@ record_store (rtx body, bb_info_t bb_inf
       group_info *group = rtx_group_vec[group_id];
       mem_addr = group->canon_base_addr;
     }
-  if (offset)
+  if (maybe_nonzero (offset))
     mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
 
   while (ptr)
@@ -1497,18 +1548,27 @@ record_store (rtx body, bb_info_t bb_inf
 		}
 	    }
 
+	  HOST_WIDE_INT begin_unneeded, const_s_width, const_width;
 	  if (known_subrange_p (s_info->offset, s_info->width, offset, width))
 	    /* The new store touches every byte that S_INFO does.  */
 	    set_all_positions_unneeded (s_info);
-	  else
+	  else if ((offset - s_info->offset).is_constant (&begin_unneeded)
+		   && s_info->width.is_constant (&const_s_width)
+		   && width.is_constant (&const_width))
 	    {
-	      HOST_WIDE_INT begin_unneeded = offset - s_info->offset;
-	      HOST_WIDE_INT end_unneeded = begin_unneeded + width;
+	      HOST_WIDE_INT end_unneeded = begin_unneeded + const_width;
 	      begin_unneeded = MAX (begin_unneeded, 0);
-	      end_unneeded = MIN (end_unneeded, s_info->width);
+	      end_unneeded = MIN (end_unneeded, const_s_width);
 	      for (i = begin_unneeded; i < end_unneeded; ++i)
 		set_position_unneeded (s_info, i);
 	    }
+	  else
+	    {
+	      /* We don't know which parts of S_INFO are needed and
+		 which aren't, so invalidate the RHS.  */
+	      s_info->rhs = NULL;
+	      s_info->const_rhs = NULL;
+	    }
 	}
       else if (s_info->rhs)
 	/* Need to see if it is possible for this store to overwrite
@@ -1554,7 +1614,14 @@ record_store (rtx body, bb_info_t bb_inf
   store_info->mem = mem;
   store_info->mem_addr = mem_addr;
   store_info->cse_base = base;
-  if (width > HOST_BITS_PER_WIDE_INT)
+  HOST_WIDE_INT const_width;
+  if (!width.is_constant (&const_width))
+    {
+      store_info->is_large = true;
+      store_info->positions_needed.large.count = 0;
+      store_info->positions_needed.large.bmap = NULL;
+    }
+  else if (const_width > HOST_BITS_PER_WIDE_INT)
     {
       store_info->is_large = true;
       store_info->positions_needed.large.count = 0;
@@ -1563,7 +1630,8 @@ record_store (rtx body, bb_info_t bb_inf
   else
     {
       store_info->is_large = false;
-      store_info->positions_needed.small_bitmask = lowpart_bitmask (width);
+      store_info->positions_needed.small_bitmask
+	= lowpart_bitmask (const_width);
     }
   store_info->group_id = group_id;
   store_info->offset = offset;
@@ -1598,10 +1666,10 @@ dump_insn_info (const char * start, insn
    shift.  */
 
 static rtx
-find_shift_sequence (int access_size,
+find_shift_sequence (poly_int64 access_size,
 		     store_info *store_info,
 		     machine_mode read_mode,
-		     int shift, bool speed, bool require_cst)
+		     poly_int64 shift, bool speed, bool require_cst)
 {
   machine_mode store_mode = GET_MODE (store_info->mem);
   scalar_int_mode new_mode;
@@ -1737,11 +1805,11 @@ look_for_hardregs (rtx x, const_rtx pat
 
 static rtx
 get_stored_val (store_info *store_info, machine_mode read_mode,
-		HOST_WIDE_INT read_offset, HOST_WIDE_INT read_width,
+		poly_int64 read_offset, poly_int64 read_width,
 		basic_block bb, bool require_cst)
 {
   machine_mode store_mode = GET_MODE (store_info->mem);
-  HOST_WIDE_INT gap;
+  poly_int64 gap;
   rtx read_reg;
 
   /* To get here the read is within the boundaries of the write so
@@ -1755,10 +1823,10 @@ get_stored_val (store_info *store_info,
   else
     gap = read_offset - store_info->offset;
 
-  if (gap != 0)
+  if (maybe_nonzero (gap))
     {
-      HOST_WIDE_INT shift = gap * BITS_PER_UNIT;
-      HOST_WIDE_INT access_size = GET_MODE_SIZE (read_mode) + gap;
+      poly_int64 shift = gap * BITS_PER_UNIT;
+      poly_int64 access_size = GET_MODE_SIZE (read_mode) + gap;
       read_reg = find_shift_sequence (access_size, store_info, read_mode,
 				      shift, optimize_bb_for_speed_p (bb),
 				      require_cst);
@@ -1977,8 +2045,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t
 {
   rtx mem = *loc, mem_addr;
   insn_info_t insn_info;
-  HOST_WIDE_INT offset = 0;
-  HOST_WIDE_INT width = 0;
+  poly_int64 offset = 0;
+  poly_int64 width = 0;
   cselib_val *base = NULL;
   int group_id;
   read_info_t read_info;
@@ -2027,7 +2095,7 @@ check_mem_read_rtx (rtx *loc, bb_info_t
       group_info *group = rtx_group_vec[group_id];
       mem_addr = group->canon_base_addr;
     }
-  if (offset)
+  if (maybe_nonzero (offset))
     mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
 
   if (group_id >= 0)
@@ -2039,7 +2107,7 @@ check_mem_read_rtx (rtx *loc, bb_info_t
 
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
-	  if (width == -1)
+	  if (!known_size_p (width))
 	    fprintf (dump_file, " processing const load gid=%d[BLK]\n",
 		     group_id);
 	  else
@@ -2073,7 +2141,7 @@ check_mem_read_rtx (rtx *loc, bb_info_t
 	    {
 	      /* This is a block mode load.  We may get lucky and
 		 canon_true_dependence may save the day.  */
-	      if (width == -1)
+	      if (!known_size_p (width))
 		remove
 		  = canon_true_dependence (store_info->mem,
 					   GET_MODE (store_info->mem),
@@ -2803,13 +2871,17 @@ scan_stores (store_info *store_info, bit
 {
   while (store_info)
     {
-      HOST_WIDE_INT i;
+      HOST_WIDE_INT i, offset, width;
       group_info *group_info
 	= rtx_group_vec[store_info->group_id];
-      if (group_info->process_globally)
+      /* We can (conservatively) ignore stores whose bounds aren't known;
+	 they simply don't generate new global dse opportunities.  */
+      if (group_info->process_globally
+	  && store_info->offset.is_constant (&offset)
+	  && store_info->width.is_constant (&width))
 	{
-	  HOST_WIDE_INT end = store_info->offset + store_info->width;
-	  for (i = store_info->offset; i < end; i++)
+	  HOST_WIDE_INT end = offset + width;
+	  for (i = offset; i < end; i++)
 	    {
 	      int index = get_bitmap_index (group_info, i);
 	      if (index != 0)
@@ -2869,7 +2941,12 @@ scan_reads (insn_info_t insn_info, bitma
 	    {
 	      if (i == read_info->group_id)
 		{
-		  if (!known_size_p (read_info->width))
+		  HOST_WIDE_INT offset, width;
+		  /* Reads with non-constant size kill all DSE opportunities
+		     in the group.  */
+		  if (!read_info->offset.is_constant (&offset)
+		      || !read_info->width.is_constant (&width)
+		      || !known_size_p (width))
 		    {
 		      /* Handle block mode reads.  */
 		      if (kill)
@@ -2881,8 +2958,8 @@ scan_reads (insn_info_t insn_info, bitma
 		      /* The groups are the same, just process the
 			 offsets.  */
 		      HOST_WIDE_INT j;
-		      HOST_WIDE_INT end = read_info->offset + read_info->width;
-		      for (j = read_info->offset; j < end; j++)
+		      HOST_WIDE_INT end = offset + width;
+		      for (j = offset; j < end; j++)
 			{
 			  int index = get_bitmap_index (group, j);
 			  if (index != 0)
@@ -3298,22 +3375,30 @@ dse_step5 (void)
 	      while (!store_info->is_set)
 		store_info = store_info->next;
 
-	      HOST_WIDE_INT i;
+	      HOST_WIDE_INT i, offset, width;
 	      group_info *group_info = rtx_group_vec[store_info->group_id];
 
-	      HOST_WIDE_INT end = store_info->offset + store_info->width;
-	      for (i = store_info->offset; i < end; i++)
+	      if (!store_info->offset.is_constant (&offset)
+		  || !store_info->width.is_constant (&width))
+		deleted = false;
+	      else
 		{
-		  int index = get_bitmap_index (group_info, i);
-
-		  if (dump_file && (dump_flags & TDF_DETAILS))
-		    fprintf (dump_file, "i = %d, index = %d\n", (int)i, index);
-		  if (index == 0 || !bitmap_bit_p (v, index))
+		  HOST_WIDE_INT end = offset + width;
+		  for (i = offset; i < end; i++)
 		    {
+		      int index = get_bitmap_index (group_info, i);
+
 		      if (dump_file && (dump_flags & TDF_DETAILS))
-			fprintf (dump_file, "failing at i = %d\n", (int)i);
-		      deleted = false;
-		      break;
+			fprintf (dump_file, "i = %d, index = %d\n",
+				 (int) i, index);
+		      if (index == 0 || !bitmap_bit_p (v, index))
+			{
+			  if (dump_file && (dump_flags & TDF_DETAILS))
+			    fprintf (dump_file, "failing at i = %d\n",
+				     (int) i);
+			  deleted = false;
+			  break;
+			}
 		    }
 		}
 	      if (deleted)

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [017/nnn] poly_int: rtx_addr_can_trap_p_1
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (15 preceding siblings ...)
  2017-10-23 17:07 ` [016/nnn] poly_int: dse.c Richard Sandiford
@ 2017-10-23 17:07 ` Richard Sandiford
  2017-11-18  4:46   ` Jeff Law
  2017-10-23 17:08 ` [020/nnn] poly_int: store_bit_field bitrange Richard Sandiford
                   ` (90 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:07 UTC (permalink / raw)
  To: gcc-patches

This patch changes the offset and size arguments of
rtx_addr_can_trap_p_1 from HOST_WIDE_INT to poly_int64.  It also
uses a size of -1 rather than 0 to represent an unknown size and
BLKmode rather than VOIDmode to represent an unknown mode.

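The containment test at the heart of the change can be sketched as
follows for the purely constant case.  access_can_trap_p and its
arguments are illustrative names only, and the sketch omits the special
cases the real function has (zero offsets, stack and frame bases,
strict-alignment checks).

  #include <cstdint>

  /* -1 now means "unknown size"; 0 is a legitimate size.  */
  static bool known_size_p (int64_t size) { return size >= 0; }

  /* Might an access of SIZE bytes at OFFSET into an object of
     DECL_SIZE bytes trap?  Return 0 only when it provably cannot.  */
  static int access_can_trap_p (int64_t offset, int64_t size,
                                int64_t decl_size)
  {
    if (offset < 0 || !known_size_p (size) || !known_size_p (decl_size))
      return 1;
    /* Constant-case equivalent of
       !known_subrange_p (offset, size, 0, decl_size).  */
    return offset + size > decl_size;
  }

With poly_int64 arguments the same condition is written with may_lt and
known_subrange_p, so that it holds for every runtime value of the
indeterminate rather than just for one constant.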

2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtlanal.c (rtx_addr_can_trap_p_1): Take the offset and size
	as poly_int64s rather than HOST_WIDE_INTs.  Use a size of -1
	rather than 0 to represent an unknown size.  Assert that the size
	is known when the mode isn't BLKmode.
	(may_trap_p_1): Use -1 for unknown sizes.
	(rtx_addr_can_trap_p): Likewise.  Pass BLKmode rather than VOIDmode.

Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:00:54.444001238 +0100
+++ gcc/rtlanal.c	2017-10-23 17:01:55.453690255 +0100
@@ -457,16 +457,17 @@ get_initial_register_offset (int from, i
    references on strict alignment machines.  */
 
 static int
-rtx_addr_can_trap_p_1 (const_rtx x, HOST_WIDE_INT offset, HOST_WIDE_INT size,
+rtx_addr_can_trap_p_1 (const_rtx x, poly_int64 offset, poly_int64 size,
 		       machine_mode mode, bool unaligned_mems)
 {
   enum rtx_code code = GET_CODE (x);
+  gcc_checking_assert (mode == BLKmode || known_size_p (size));
 
   /* The offset must be a multiple of the mode size if we are considering
      unaligned memory references on strict alignment machines.  */
-  if (STRICT_ALIGNMENT && unaligned_mems && GET_MODE_SIZE (mode) != 0)
+  if (STRICT_ALIGNMENT && unaligned_mems && mode != BLKmode)
     {
-      HOST_WIDE_INT actual_offset = offset;
+      poly_int64 actual_offset = offset;
 
 #ifdef SPARC_STACK_BOUNDARY_HACK
       /* ??? The SPARC port may claim a STACK_BOUNDARY higher than
@@ -477,7 +478,7 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
 	actual_offset -= STACK_POINTER_OFFSET;
 #endif
 
-      if (actual_offset % GET_MODE_SIZE (mode) != 0)
+      if (!multiple_p (actual_offset, GET_MODE_SIZE (mode)))
 	return 1;
     }
 
@@ -489,14 +490,14 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
       if (!CONSTANT_POOL_ADDRESS_P (x) && !SYMBOL_REF_FUNCTION_P (x))
 	{
 	  tree decl;
-	  HOST_WIDE_INT decl_size;
+	  poly_int64 decl_size;
 
-	  if (offset < 0)
+	  if (may_lt (offset, 0))
+	    return 1;
+	  if (known_zero (offset))
+	    return 0;
+	  if (!known_size_p (size))
 	    return 1;
-	  if (size == 0)
-	    size = GET_MODE_SIZE (mode);
-	  if (size == 0)
-	    return offset != 0;
 
 	  /* If the size of the access or of the symbol is unknown,
 	     assume the worst.  */
@@ -507,9 +508,10 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
 	  if (!decl)
 	    decl_size = -1;
 	  else if (DECL_P (decl) && DECL_SIZE_UNIT (decl))
-	    decl_size = (tree_fits_shwi_p (DECL_SIZE_UNIT (decl))
-			 ? tree_to_shwi (DECL_SIZE_UNIT (decl))
-			 : -1);
+	    {
+	      if (!poly_int_tree_p (DECL_SIZE_UNIT (decl), &decl_size))
+		decl_size = -1;
+	    }
 	  else if (TREE_CODE (decl) == STRING_CST)
 	    decl_size = TREE_STRING_LENGTH (decl);
 	  else if (TYPE_SIZE_UNIT (TREE_TYPE (decl)))
@@ -517,7 +519,7 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
 	  else
 	    decl_size = -1;
 
-	  return (decl_size <= 0 ? offset != 0 : offset + size > decl_size);
+	  return !known_subrange_p (offset, size, 0, decl_size);
         }
 
       return 0;
@@ -534,17 +536,14 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
 	 || (x == arg_pointer_rtx && fixed_regs[ARG_POINTER_REGNUM]))
 	{
 #ifdef RED_ZONE_SIZE
-	  HOST_WIDE_INT red_zone_size = RED_ZONE_SIZE;
+	  poly_int64 red_zone_size = RED_ZONE_SIZE;
 #else
-	  HOST_WIDE_INT red_zone_size = 0;
+	  poly_int64 red_zone_size = 0;
 #endif
-	  HOST_WIDE_INT stack_boundary = PREFERRED_STACK_BOUNDARY
-					 / BITS_PER_UNIT;
-	  HOST_WIDE_INT low_bound, high_bound;
-
-	  if (size == 0)
-	    size = GET_MODE_SIZE (mode);
-	  if (size == 0)
+	  poly_int64 stack_boundary = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
+	  poly_int64 low_bound, high_bound;
+
+	  if (!known_size_p (size))
 	    return 1;
 
 	  if (x == frame_pointer_rtx)
@@ -562,10 +561,10 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
 	    }
 	  else if (x == hard_frame_pointer_rtx)
 	    {
-	      HOST_WIDE_INT sp_offset
+	      poly_int64 sp_offset
 		= get_initial_register_offset (STACK_POINTER_REGNUM,
 					       HARD_FRAME_POINTER_REGNUM);
-	      HOST_WIDE_INT ap_offset
+	      poly_int64 ap_offset
 		= get_initial_register_offset (ARG_POINTER_REGNUM,
 					       HARD_FRAME_POINTER_REGNUM);
 
@@ -589,7 +588,7 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
 	    }
 	  else if (x == stack_pointer_rtx)
 	    {
-	      HOST_WIDE_INT ap_offset
+	      poly_int64 ap_offset
 		= get_initial_register_offset (ARG_POINTER_REGNUM,
 					       STACK_POINTER_REGNUM);
 
@@ -629,7 +628,8 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
 #endif
 	    }
 
-	  if (offset >= low_bound && offset <= high_bound - size)
+	  if (must_ge (offset, low_bound)
+	      && must_le (offset, high_bound - size))
 	    return 0;
 	  return 1;
 	}
@@ -649,7 +649,7 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
       if (XEXP (x, 0) == pic_offset_table_rtx
 	  && GET_CODE (XEXP (x, 1)) == CONST
 	  && GET_CODE (XEXP (XEXP (x, 1), 0)) == UNSPEC
-	  && offset == 0)
+	  && known_zero (offset))
 	return 0;
 
       /* - or it is an address that can't trap plus a constant integer.  */
@@ -686,7 +686,7 @@ rtx_addr_can_trap_p_1 (const_rtx x, HOST
 int
 rtx_addr_can_trap_p (const_rtx x)
 {
-  return rtx_addr_can_trap_p_1 (x, 0, 0, VOIDmode, false);
+  return rtx_addr_can_trap_p_1 (x, 0, -1, BLKmode, false);
 }
 
 /* Return true if X contains a MEM subrtx.  */
@@ -2796,7 +2796,7 @@ may_trap_p_1 (const_rtx x, unsigned flag
 	  code_changed
 	  || !MEM_NOTRAP_P (x))
 	{
-	  HOST_WIDE_INT size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : 0;
+	  HOST_WIDE_INT size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
 	  return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
 					GET_MODE (x), code_changed);
 	}

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [019/nnn] poly_int: lra frame offsets
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (17 preceding siblings ...)
  2017-10-23 17:08 ` [020/nnn] poly_int: store_bit_field bitrange Richard Sandiford
@ 2017-10-23 17:08 ` Richard Sandiford
  2017-12-06  0:16   ` Jeff Law
  2017-10-23 17:08 ` [018/nnn] poly_int: MEM_OFFSET and MEM_SIZE Richard Sandiford
                   ` (88 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:08 UTC (permalink / raw)
  To: gcc-patches

This patch makes LRA use poly_int64s rather than HOST_WIDE_INTs
to store frame offsets (including elimination offsets and sp_offset values).

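The knock-on effect on comparisons can be seen in lra_reg_val_equal_p
below.  As a rough sketch with made-up toy types (toy_offset,
toy_reg_info) rather than the real LRA structures: an equality on
offsets now has to hold for every runtime value of the indeterminate,
so a plain == becomes a must_eq-style test on all coefficients.

  #include <cstdint>

  /* Toy stand-in for a poly_int64 offset c + x*X with X >= 0.  */
  struct toy_offset { int64_t c, x; };

  /* Equal for every X?  */
  static bool must_eq (toy_offset a, toy_offset b)
  {
    return a.c == b.c && a.x == b.x;
  }

  struct toy_reg_info { int val; toy_offset offset; };

  /* Sketch of an lra_reg_val_equal_p-style check after the change.  */
  static bool reg_val_equal_p (const toy_reg_info &info, int val,
                               toy_offset offset)
  {
    return info.val == val && must_eq (info.offset, offset);
  }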

2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* lra-int.h (lra_reg): Change offset from int to poly_int64.
	(lra_insn_recog_data): Change sp_offset from HOST_WIDE_INT
	to poly_int64.
	(lra_eliminate_regs_1, eliminate_regs_in_insn): Change
	update_sp_offset from a HOST_WIDE_INT to a poly_int64.
	(lra_update_reg_val_offset, lra_reg_val_equal_p): Take the
	offset as a poly_int64 rather than an int.
	* lra-assigns.c (find_hard_regno_for_1): Handle poly_int64 offsets.
	(setup_live_pseudos_and_spill_after_risky_transforms): Likewise.
	* lra-constraints.c (equiv_address_substitution): Track offsets
	as poly_int64s.
	(emit_inc): Check poly_int_rtx_p instead of CONST_INT_P.
	(curr_insn_transform): Handle the new form of sp_offset.
	* lra-eliminations.c (lra_elim_table): Change previous_offset
	and offset from HOST_WIDE_INT to poly_int64.
	(print_elim_table, update_reg_eliminate): Update accordingly.
	(self_elim_offsets): Change from HOST_WIDE_INT to poly_int64_pod.
	(get_elimination): Update accordingly.
	(form_sum): Check poly_int_rtx_p instead of CONST_INT_P.
	(lra_eliminate_regs_1, eliminate_regs_in_insn): Change
	update_sp_offset from a HOST_WIDE_INT to a poly_int64.  Handle
	poly_int64 offsets generally.
	(curr_sp_change): Change from HOST_WIDE_INT to poly_int64.
	(mark_not_eliminable, init_elimination): Update accordingly.
	(remove_reg_equal_offset_note): Return a bool and pass the new
	offset back by pointer as a poly_int64.
	* lra-remat.c (change_sp_offset): Take sp_offset as a poly_int64
	rather than a HOST_WIDE_INT.
	(do_remat): Track offsets as poly_int64s.
	* lra.c (lra_update_insn_recog_data, setup_sp_offset): Likewise.

Index: gcc/lra-int.h
===================================================================
--- gcc/lra-int.h	2017-10-23 16:52:19.836152258 +0100
+++ gcc/lra-int.h	2017-10-23 17:01:59.910337542 +0100
@@ -106,7 +106,7 @@ struct lra_reg
      they do not conflict.  */
   int val;
   /* Offset from relative eliminate register to pesudo reg.  */
-  int offset;
+  poly_int64 offset;
   /* These members are set up in lra-lives.c and updated in
      lra-coalesce.c.  */
   /* The biggest size mode in which each pseudo reg is referred in
@@ -213,7 +213,7 @@ struct lra_insn_recog_data
      insn.  */
   int used_insn_alternative;
   /* SP offset before the insn relative to one at the func start.  */
-  HOST_WIDE_INT sp_offset;
+  poly_int64 sp_offset;
   /* The insn itself.  */
   rtx_insn *insn;
   /* Common data for insns with the same ICODE.  Asm insns (their
@@ -406,8 +406,8 @@ extern bool lra_remat (void);
 extern void lra_debug_elim_table (void);
 extern int lra_get_elimination_hard_regno (int);
 extern rtx lra_eliminate_regs_1 (rtx_insn *, rtx, machine_mode,
-				 bool, bool, HOST_WIDE_INT, bool);
-extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, HOST_WIDE_INT);
+				 bool, bool, poly_int64, bool);
+extern void eliminate_regs_in_insn (rtx_insn *insn, bool, bool, poly_int64);
 extern void lra_eliminate (bool, bool);
 
 extern void lra_eliminate_reg_if_possible (rtx *);
@@ -493,7 +493,7 @@ lra_get_insn_recog_data (rtx_insn *insn)
 
 /* Update offset from pseudos with VAL by INCR.  */
 static inline void
-lra_update_reg_val_offset (int val, int incr)
+lra_update_reg_val_offset (int val, poly_int64 incr)
 {
   int i;
 
@@ -506,10 +506,10 @@ lra_update_reg_val_offset (int val, int
 
 /* Return true if register content is equal to VAL with OFFSET.  */
 static inline bool
-lra_reg_val_equal_p (int regno, int val, int offset)
+lra_reg_val_equal_p (int regno, int val, poly_int64 offset)
 {
   if (lra_reg_info[regno].val == val
-      && lra_reg_info[regno].offset == offset)
+      && must_eq (lra_reg_info[regno].offset, offset))
     return true;
 
   return false;
Index: gcc/lra-assigns.c
===================================================================
--- gcc/lra-assigns.c	2017-10-23 16:52:19.836152258 +0100
+++ gcc/lra-assigns.c	2017-10-23 17:01:59.909338965 +0100
@@ -485,7 +485,8 @@ find_hard_regno_for_1 (int regno, int *c
   int hr, conflict_hr, nregs;
   machine_mode biggest_mode;
   unsigned int k, conflict_regno;
-  int offset, val, biggest_nregs, nregs_diff;
+  poly_int64 offset;
+  int val, biggest_nregs, nregs_diff;
   enum reg_class rclass;
   bitmap_iterator bi;
   bool *rclass_intersect_p;
@@ -1147,7 +1148,8 @@ setup_live_pseudos_and_spill_after_risky
 {
   int p, i, j, n, regno, hard_regno;
   unsigned int k, conflict_regno;
-  int val, offset;
+  poly_int64 offset;
+  int val;
   HARD_REG_SET conflict_set;
   machine_mode mode;
   lra_live_range_t r;
Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2017-10-23 16:52:19.836152258 +0100
+++ gcc/lra-constraints.c	2017-10-23 17:01:59.910337542 +0100
@@ -3084,7 +3084,8 @@ can_add_disp_p (struct address_info *ad)
 equiv_address_substitution (struct address_info *ad)
 {
   rtx base_reg, new_base_reg, index_reg, new_index_reg, *base_term, *index_term;
-  HOST_WIDE_INT disp, scale;
+  poly_int64 disp;
+  HOST_WIDE_INT scale;
   bool change_p;
 
   base_term = strip_subreg (ad->base_term);
@@ -3115,6 +3116,7 @@ equiv_address_substitution (struct addre
     }
   if (base_reg != new_base_reg)
     {
+      poly_int64 offset;
       if (REG_P (new_base_reg))
 	{
 	  *base_term = new_base_reg;
@@ -3122,10 +3124,10 @@ equiv_address_substitution (struct addre
 	}
       else if (GET_CODE (new_base_reg) == PLUS
 	       && REG_P (XEXP (new_base_reg, 0))
-	       && CONST_INT_P (XEXP (new_base_reg, 1))
+	       && poly_int_rtx_p (XEXP (new_base_reg, 1), &offset)
 	       && can_add_disp_p (ad))
 	{
-	  disp += INTVAL (XEXP (new_base_reg, 1));
+	  disp += offset;
 	  *base_term = XEXP (new_base_reg, 0);
 	  change_p = true;
 	}
@@ -3134,6 +3136,7 @@ equiv_address_substitution (struct addre
     }
   if (index_reg != new_index_reg)
     {
+      poly_int64 offset;
       if (REG_P (new_index_reg))
 	{
 	  *index_term = new_index_reg;
@@ -3141,16 +3144,16 @@ equiv_address_substitution (struct addre
 	}
       else if (GET_CODE (new_index_reg) == PLUS
 	       && REG_P (XEXP (new_index_reg, 0))
-	       && CONST_INT_P (XEXP (new_index_reg, 1))
+	       && poly_int_rtx_p (XEXP (new_index_reg, 1), &offset)
 	       && can_add_disp_p (ad)
 	       && (scale = get_index_scale (ad)))
 	{
-	  disp += INTVAL (XEXP (new_index_reg, 1)) * scale;
+	  disp += offset * scale;
 	  *index_term = XEXP (new_index_reg, 0);
 	  change_p = true;
 	}
     }
-  if (disp != 0)
+  if (maybe_nonzero (disp))
     {
       if (ad->disp != NULL)
 	*ad->disp = plus_constant (GET_MODE (*ad->inner), *ad->disp, disp);
@@ -3629,9 +3632,10 @@ emit_inc (enum reg_class new_rclass, rtx
 	 register.  */
       if (plus_p)
 	{
-	  if (CONST_INT_P (inc))
+	  poly_int64 offset;
+	  if (poly_int_rtx_p (inc, &offset))
 	    emit_insn (gen_add2_insn (result,
-				      gen_int_mode (-INTVAL (inc),
+				      gen_int_mode (-offset,
 						    GET_MODE (result))));
 	  else
 	    emit_insn (gen_sub2_insn (result, inc));
@@ -3999,10 +4003,13 @@ curr_insn_transform (bool check_only_p)
       if (INSN_CODE (curr_insn) >= 0
           && (p = get_insn_name (INSN_CODE (curr_insn))) != NULL)
         fprintf (lra_dump_file, " {%s}", p);
-      if (curr_id->sp_offset != 0)
-        fprintf (lra_dump_file, " (sp_off=%" HOST_WIDE_INT_PRINT "d)",
-		 curr_id->sp_offset);
-       fprintf (lra_dump_file, "\n");
+      if (maybe_nonzero (curr_id->sp_offset))
+	{
+	  fprintf (lra_dump_file, " (sp_off=");
+	  print_dec (curr_id->sp_offset, lra_dump_file);
+	  fprintf (lra_dump_file, ")");
+	}
+      fprintf (lra_dump_file, "\n");
     }
 
   /* Right now, for any pair of operands I and J that are required to
Index: gcc/lra-eliminations.c
===================================================================
--- gcc/lra-eliminations.c	2017-10-23 16:52:19.836152258 +0100
+++ gcc/lra-eliminations.c	2017-10-23 17:01:59.910337542 +0100
@@ -79,9 +79,9 @@ struct lra_elim_table
   int to;
   /* Difference between values of the two hard registers above on
      previous iteration.  */
-  HOST_WIDE_INT previous_offset;
+  poly_int64 previous_offset;
   /* Difference between the values on the current iteration.  */
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
   /* Nonzero if this elimination can be done.  */
   bool can_eliminate;
   /* CAN_ELIMINATE since the last check.  */
@@ -120,10 +120,14 @@ print_elim_table (FILE *f)
   struct lra_elim_table *ep;
 
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
-    fprintf (f, "%s eliminate %d to %d (offset=" HOST_WIDE_INT_PRINT_DEC
-	     ", prev_offset=" HOST_WIDE_INT_PRINT_DEC ")\n",
-	     ep->can_eliminate ? "Can" : "Can't",
-	     ep->from, ep->to, ep->offset, ep->previous_offset);
+    {
+      fprintf (f, "%s eliminate %d to %d (offset=",
+	       ep->can_eliminate ? "Can" : "Can't", ep->from, ep->to);
+      print_dec (ep->offset, f);
+      fprintf (f, ", prev_offset=");
+      print_dec (ep->previous_offset, f);
+      fprintf (f, ")\n");
+    }
 }
 
 /* Print info about elimination table to stderr.  */
@@ -161,7 +165,7 @@ setup_can_eliminate (struct lra_elim_tab
 /* Offsets should be used to restore original offsets for eliminable
    hard register which just became not eliminable.  Zero,
    otherwise.  */
-static HOST_WIDE_INT self_elim_offsets[FIRST_PSEUDO_REGISTER];
+static poly_int64_pod self_elim_offsets[FIRST_PSEUDO_REGISTER];
 
 /* Map: hard regno -> RTL presentation.	 RTL presentations of all
    potentially eliminable hard registers are stored in the map.	 */
@@ -193,6 +197,7 @@ setup_elimination_map (void)
 form_sum (rtx x, rtx y)
 {
   machine_mode mode = GET_MODE (x);
+  poly_int64 offset;
 
   if (mode == VOIDmode)
     mode = GET_MODE (y);
@@ -200,10 +205,10 @@ form_sum (rtx x, rtx y)
   if (mode == VOIDmode)
     mode = Pmode;
 
-  if (CONST_INT_P (x))
-    return plus_constant (mode, y, INTVAL (x));
-  else if (CONST_INT_P (y))
-    return plus_constant (mode, x, INTVAL (y));
+  if (poly_int_rtx_p (x, &offset))
+    return plus_constant (mode, y, offset);
+  else if (poly_int_rtx_p (y, &offset))
+    return plus_constant (mode, x, offset);
   else if (CONSTANT_P (x))
     std::swap (x, y);
 
@@ -252,14 +257,14 @@ get_elimination (rtx reg)
 {
   int hard_regno;
   struct lra_elim_table *ep;
-  HOST_WIDE_INT offset;
 
   lra_assert (REG_P (reg));
   if ((hard_regno = REGNO (reg)) < 0 || hard_regno >= FIRST_PSEUDO_REGISTER)
     return NULL;
   if ((ep = elimination_map[hard_regno]) != NULL)
     return ep->from_rtx != reg ? NULL : ep;
-  if ((offset = self_elim_offsets[hard_regno]) == 0)
+  poly_int64 offset = self_elim_offsets[hard_regno];
+  if (known_zero (offset))
     return NULL;
   /* This is an iteration to restore offsets just after HARD_REGNO
      stopped to be eliminable.	*/
@@ -325,7 +330,7 @@ move_plus_up (rtx x)
 rtx
 lra_eliminate_regs_1 (rtx_insn *insn, rtx x, machine_mode mem_mode,
 		      bool subst_p, bool update_p,
-		      HOST_WIDE_INT update_sp_offset, bool full_p)
+		      poly_int64 update_sp_offset, bool full_p)
 {
   enum rtx_code code = GET_CODE (x);
   struct lra_elim_table *ep;
@@ -335,7 +340,8 @@ lra_eliminate_regs_1 (rtx_insn *insn, rt
   int copied = 0;
 
   lra_assert (!update_p || !full_p);
-  lra_assert (update_sp_offset == 0 || (!subst_p && update_p && !full_p));
+  lra_assert (known_zero (update_sp_offset)
+	      || (!subst_p && update_p && !full_p));
   if (! current_function_decl)
     return x;
 
@@ -360,7 +366,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rt
 	{
 	  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
-	  if (update_sp_offset != 0)
+	  if (maybe_nonzero (update_sp_offset))
 	    {
 	      if (ep->to_rtx == stack_pointer_rtx)
 		return plus_constant (Pmode, to, update_sp_offset);
@@ -387,20 +393,21 @@ lra_eliminate_regs_1 (rtx_insn *insn, rt
 	{
 	  if ((ep = get_elimination (XEXP (x, 0))) != NULL)
 	    {
-	      HOST_WIDE_INT offset;
+	      poly_int64 offset, curr_offset;
 	      rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
 	      if (! update_p && ! full_p)
 		return gen_rtx_PLUS (Pmode, to, XEXP (x, 1));
 	      
-	      if (update_sp_offset != 0)
+	      if (maybe_nonzero (update_sp_offset))
 		offset = ep->to_rtx == stack_pointer_rtx ? update_sp_offset : 0;
 	      else
 		offset = (update_p
 			  ? ep->offset - ep->previous_offset : ep->offset);
 	      if (full_p && insn != NULL_RTX && ep->to_rtx == stack_pointer_rtx)
 		offset -= lra_get_insn_recog_data (insn)->sp_offset;
-	      if (CONST_INT_P (XEXP (x, 1)) && INTVAL (XEXP (x, 1)) == -offset)
+	      if (poly_int_rtx_p (XEXP (x, 1), &curr_offset)
+		  && must_eq (curr_offset, -offset))
 		return to;
 	      else
 		return gen_rtx_PLUS (Pmode, to,
@@ -449,7 +456,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rt
 	{
 	  rtx to = subst_p ? ep->to_rtx : ep->from_rtx;
 
-	  if (update_sp_offset != 0)
+	  if (maybe_nonzero (update_sp_offset))
 	    {
 	      if (ep->to_rtx == stack_pointer_rtx)
 		return plus_constant (Pmode,
@@ -464,7 +471,7 @@ lra_eliminate_regs_1 (rtx_insn *insn, rt
 				  * INTVAL (XEXP (x, 1)));
 	  else if (full_p)
 	    {
-	      HOST_WIDE_INT offset = ep->offset;
+	      poly_int64 offset = ep->offset;
 
 	      if (insn != NULL_RTX && ep->to_rtx == stack_pointer_rtx)
 		offset -= lra_get_insn_recog_data (insn)->sp_offset;
@@ -711,7 +718,7 @@ lra_eliminate_regs (rtx x, machine_mode
 /* Stack pointer offset before the current insn relative to one at the
    func start.  RTL insns can change SP explicitly.  We keep the
    changes from one insn to another through this variable.  */
-static HOST_WIDE_INT curr_sp_change;
+static poly_int64 curr_sp_change;
 
 /* Scan rtx X for references to elimination source or target registers
    in contexts that would prevent the elimination from happening.
@@ -725,6 +732,7 @@ mark_not_eliminable (rtx x, machine_mode
   struct lra_elim_table *ep;
   int i, j;
   const char *fmt;
+  poly_int64 offset = 0;
 
   switch (code)
     {
@@ -738,7 +746,7 @@ mark_not_eliminable (rtx x, machine_mode
 	  && ((code != PRE_MODIFY && code != POST_MODIFY)
 	      || (GET_CODE (XEXP (x, 1)) == PLUS
 		  && XEXP (x, 0) == XEXP (XEXP (x, 1), 0)
-		  && CONST_INT_P (XEXP (XEXP (x, 1), 1)))))
+		  && poly_int_rtx_p (XEXP (XEXP (x, 1), 1), &offset))))
 	{
 	  int size = GET_MODE_SIZE (mem_mode);
 	  
@@ -752,7 +760,7 @@ mark_not_eliminable (rtx x, machine_mode
 	  else if (code == PRE_INC || code == POST_INC)
 	    curr_sp_change += size;
 	  else if (code == PRE_MODIFY || code == POST_MODIFY)
-	    curr_sp_change += INTVAL (XEXP (XEXP (x, 1), 1));
+	    curr_sp_change += offset;
 	}
       else if (REG_P (XEXP (x, 0))
 	       && REGNO (XEXP (x, 0)) >= FIRST_PSEUDO_REGISTER)
@@ -802,9 +810,9 @@ mark_not_eliminable (rtx x, machine_mode
       if (SET_DEST (x) == stack_pointer_rtx
 	  && GET_CODE (SET_SRC (x)) == PLUS
 	  && XEXP (SET_SRC (x), 0) == SET_DEST (x)
-	  && CONST_INT_P (XEXP (SET_SRC (x), 1)))
+	  && poly_int_rtx_p (XEXP (SET_SRC (x), 1), &offset))
 	{
-	  curr_sp_change += INTVAL (XEXP (SET_SRC (x), 1));
+	  curr_sp_change += offset;
 	  return;
 	}
       if (! REG_P (SET_DEST (x))
@@ -859,11 +867,11 @@ mark_not_eliminable (rtx x, machine_mode
 
 #ifdef HARD_FRAME_POINTER_REGNUM
 
-/* Find offset equivalence note for reg WHAT in INSN and return the
-   found elmination offset.  If the note is not found, return NULL.
-   Remove the found note.  */
-static rtx
-remove_reg_equal_offset_note (rtx_insn *insn, rtx what)
+/* Search INSN's reg notes to see whether the destination is equal to
+   WHAT + C for some constant C.  Return true if so, storing C in
+   *OFFSET_OUT and removing the reg note.  */
+static bool
+remove_reg_equal_offset_note (rtx_insn *insn, rtx what, poly_int64 *offset_out)
 {
   rtx link, *link_loc;
 
@@ -873,12 +881,12 @@ remove_reg_equal_offset_note (rtx_insn *
     if (REG_NOTE_KIND (link) == REG_EQUAL
 	&& GET_CODE (XEXP (link, 0)) == PLUS
 	&& XEXP (XEXP (link, 0), 0) == what
-	&& CONST_INT_P (XEXP (XEXP (link, 0), 1)))
+	&& poly_int_rtx_p (XEXP (XEXP (link, 0), 1), offset_out))
       {
 	*link_loc = XEXP (link, 1);
-	return XEXP (XEXP (link, 0), 1);
+	return true;
       }
-  return NULL_RTX;
+  return false;
 }
 
 #endif
@@ -899,7 +907,7 @@ remove_reg_equal_offset_note (rtx_insn *
 
 void
 eliminate_regs_in_insn (rtx_insn *insn, bool replace_p, bool first_p,
-			HOST_WIDE_INT update_sp_offset)
+			poly_int64 update_sp_offset)
 {
   int icode = recog_memoized (insn);
   rtx old_set = single_set (insn);
@@ -940,28 +948,21 @@ eliminate_regs_in_insn (rtx_insn *insn,
 		 nonlocal goto.  */
 	      {
 		rtx src = SET_SRC (old_set);
-		rtx off = remove_reg_equal_offset_note (insn, ep->to_rtx);
-		
+		poly_int64 offset = 0;
+
 		/* We should never process such insn with non-zero
 		   UPDATE_SP_OFFSET.  */
-		lra_assert (update_sp_offset == 0);
+		lra_assert (known_zero (update_sp_offset));
 		
-		if (off != NULL_RTX
-		    || src == ep->to_rtx
-		    || (GET_CODE (src) == PLUS
-			&& XEXP (src, 0) == ep->to_rtx
-			&& CONST_INT_P (XEXP (src, 1))))
+		if (remove_reg_equal_offset_note (insn, ep->to_rtx, &offset)
+		    || strip_offset (src, &offset) == ep->to_rtx)
 		  {
-		    HOST_WIDE_INT offset;
-		    
 		    if (replace_p)
 		      {
 			SET_DEST (old_set) = ep->to_rtx;
 			lra_update_insn_recog_data (insn);
 			return;
 		      }
-		    offset = (off != NULL_RTX ? INTVAL (off)
-			      : src == ep->to_rtx ? 0 : INTVAL (XEXP (src, 1)));
 		    offset -= (ep->offset - ep->previous_offset);
 		    src = plus_constant (Pmode, ep->to_rtx, offset);
 		    
@@ -997,13 +998,13 @@ eliminate_regs_in_insn (rtx_insn *insn,
      currently support: a single set with the source or a REG_EQUAL
      note being a PLUS of an eliminable register and a constant.  */
   plus_src = plus_cst_src = 0;
+  poly_int64 offset = 0;
   if (old_set && REG_P (SET_DEST (old_set)))
     {
       if (GET_CODE (SET_SRC (old_set)) == PLUS)
 	plus_src = SET_SRC (old_set);
       /* First see if the source is of the form (plus (...) CST).  */
-      if (plus_src
-	  && CONST_INT_P (XEXP (plus_src, 1)))
+      if (plus_src && poly_int_rtx_p (XEXP (plus_src, 1), &offset))
 	plus_cst_src = plus_src;
       /* Check that the first operand of the PLUS is a hard reg or
 	 the lowpart subreg of one.  */
@@ -1021,7 +1022,6 @@ eliminate_regs_in_insn (rtx_insn *insn,
   if (plus_cst_src)
     {
       rtx reg = XEXP (plus_cst_src, 0);
-      HOST_WIDE_INT offset = INTVAL (XEXP (plus_cst_src, 1));
 
       if (GET_CODE (reg) == SUBREG)
 	reg = SUBREG_REG (reg);
@@ -1032,7 +1032,7 @@ eliminate_regs_in_insn (rtx_insn *insn,
 
 	  if (! replace_p)
 	    {
-	      if (update_sp_offset == 0)
+	      if (known_zero (update_sp_offset))
 		offset += (ep->offset - ep->previous_offset);
 	      if (ep->to_rtx == stack_pointer_rtx)
 		{
@@ -1051,7 +1051,7 @@ eliminate_regs_in_insn (rtx_insn *insn,
 	     the cost of the insn by replacing a simple REG with (plus
 	     (reg sp) CST).  So try only when we already had a PLUS
 	     before.  */
-	  if (offset == 0 || plus_src)
+	  if (known_zero (offset) || plus_src)
 	    {
 	      rtx new_src = plus_constant (GET_MODE (to_rtx), to_rtx, offset);
 
@@ -1239,7 +1239,7 @@ update_reg_eliminate (bitmap insns_with_
 	      if (lra_dump_file != NULL)
 		fprintf (lra_dump_file, "    Using elimination %d to %d now\n",
 			 ep1->from, ep1->to);
-	      lra_assert (ep1->previous_offset == 0);
+	      lra_assert (known_zero (ep1->previous_offset));
 	      ep1->previous_offset = ep->offset;
 	    }
 	  else
@@ -1251,7 +1251,7 @@ update_reg_eliminate (bitmap insns_with_
 		fprintf (lra_dump_file, "    %d is not eliminable at all\n",
 			 ep->from);
 	      self_elim_offsets[ep->from] = -ep->offset;
-	      if (ep->offset != 0)
+	      if (maybe_nonzero (ep->offset))
 		bitmap_ior_into (insns_with_changed_offsets,
 				 &lra_reg_info[ep->from].insn_bitmap);
 	    }
@@ -1271,7 +1271,7 @@ update_reg_eliminate (bitmap insns_with_
 	   the usage for pseudos.  */
         if (ep->from != ep->to)
 	  SET_HARD_REG_BIT (temp_hard_reg_set, ep->to);
-	if (ep->previous_offset != ep->offset)
+	if (may_ne (ep->previous_offset, ep->offset))
 	  {
 	    bitmap_ior_into (insns_with_changed_offsets,
 			     &lra_reg_info[ep->from].insn_bitmap);
@@ -1357,13 +1357,13 @@ init_elimination (void)
 	    if (NONDEBUG_INSN_P (insn))
 	      {
 		mark_not_eliminable (PATTERN (insn), VOIDmode);
-		if (curr_sp_change != 0
+		if (maybe_nonzero (curr_sp_change)
 		    && find_reg_note (insn, REG_LABEL_OPERAND, NULL_RTX))
 		  stop_to_sp_elimination_p = true;
 	      }
 	  }
       if (! frame_pointer_needed
-	  && (curr_sp_change != 0 || stop_to_sp_elimination_p)
+	  && (maybe_nonzero (curr_sp_change) || stop_to_sp_elimination_p)
 	  && bb->succs && bb->succs->length () != 0)
 	for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 	  if (ep->to == STACK_POINTER_REGNUM)
Index: gcc/lra-remat.c
===================================================================
--- gcc/lra-remat.c	2017-10-23 16:52:19.836152258 +0100
+++ gcc/lra-remat.c	2017-10-23 17:01:59.910337542 +0100
@@ -994,7 +994,7 @@ calculate_global_remat_bb_data (void)
 
 /* Setup sp offset attribute to SP_OFFSET for all INSNS.  */
 static void
-change_sp_offset (rtx_insn *insns, HOST_WIDE_INT sp_offset)
+change_sp_offset (rtx_insn *insns, poly_int64 sp_offset)
 {
   for (rtx_insn *insn = insns; insn != NULL; insn = NEXT_INSN (insn))
     eliminate_regs_in_insn (insn, false, false, sp_offset);
@@ -1118,7 +1118,7 @@ do_remat (void)
 	  int i, hard_regno, nregs;
 	  int dst_hard_regno, dst_nregs;
 	  rtx_insn *remat_insn = NULL;
-	  HOST_WIDE_INT cand_sp_offset = 0;
+	  poly_int64 cand_sp_offset = 0;
 	  if (cand != NULL)
 	    {
 	      lra_insn_recog_data_t cand_id
@@ -1241,8 +1241,8 @@ do_remat (void)
 
 	  if (remat_insn != NULL)
 	    {
-	      HOST_WIDE_INT sp_offset_change = cand_sp_offset - id->sp_offset;
-	      if (sp_offset_change != 0)
+	      poly_int64 sp_offset_change = cand_sp_offset - id->sp_offset;
+	      if (maybe_nonzero (sp_offset_change))
 		change_sp_offset (remat_insn, sp_offset_change);
 	      update_scratch_ops (remat_insn);
 	      lra_process_new_insns (insn, remat_insn, NULL,
Index: gcc/lra.c
===================================================================
--- gcc/lra.c	2017-10-23 16:52:19.836152258 +0100
+++ gcc/lra.c	2017-10-23 17:01:59.911336118 +0100
@@ -1163,7 +1163,7 @@ lra_update_insn_recog_data (rtx_insn *in
   int n;
   unsigned int uid = INSN_UID (insn);
   struct lra_static_insn_data *insn_static_data;
-  HOST_WIDE_INT sp_offset = 0;
+  poly_int64 sp_offset = 0;
 
   check_and_expand_insn_recog_data (uid);
   if ((data = lra_insn_recog_data[uid]) != NULL
@@ -1805,8 +1805,8 @@ push_insns (rtx_insn *from, rtx_insn *to
 setup_sp_offset (rtx_insn *from, rtx_insn *last)
 {
   rtx_insn *before = next_nonnote_insn_bb (last);
-  HOST_WIDE_INT offset = (before == NULL_RTX || ! INSN_P (before)
-			  ? 0 : lra_get_insn_recog_data (before)->sp_offset);
+  poly_int64 offset = (before == NULL_RTX || ! INSN_P (before)
+		       ? 0 : lra_get_insn_recog_data (before)->sp_offset);
 
   for (rtx_insn *insn = from; insn != NEXT_INSN (last); insn = NEXT_INSN (insn))
     lra_get_insn_recog_data (insn)->sp_offset = offset;

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [018/nnn] poly_int: MEM_OFFSET and MEM_SIZE
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (18 preceding siblings ...)
  2017-10-23 17:08 ` [019/nnn] poly_int: lra frame offsets Richard Sandiford
@ 2017-10-23 17:08 ` Richard Sandiford
  2017-12-06 18:27   ` Jeff Law
  2017-10-23 17:09 ` [023/nnn] poly_int: store_field & co Richard Sandiford
                   ` (87 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:08 UTC (permalink / raw)
  To: gcc-patches

This patch changes the MEM_OFFSET and MEM_SIZE memory attributes
from HOST_WIDE_INT to poly_int64.  Most of it is mechanical,
but there is one nonobvious change in widen_memory_access.
Previously the main while loop broke with:

      /* Similarly for the decl.  */
      else if (DECL_P (attrs.expr)
               && DECL_SIZE_UNIT (attrs.expr)
               && TREE_CODE (DECL_SIZE_UNIT (attrs.expr)) == INTEGER_CST
               && compare_tree_int (DECL_SIZE_UNIT (attrs.expr), size) >= 0
               && (! attrs.offset_known_p || attrs.offset >= 0))
        break;

but it seemed wrong to optimistically assume the best case
when the offset isn't known (and thus might be negative).
As it happens, the "! attrs.offset_known_p" condition was
always false, because we'd already nullified attrs.expr in
that case:

  /* If we don't know what offset we were at within the expression, then
     we can't know if we've overstepped the bounds.  */
  if (! attrs.offset_known_p)
    attrs.expr = NULL_TREE;

The patch therefore drops "! attrs.offset_known_p ||" when
converting the offset check to the may/must interface.
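
For illustration only (a hedged sketch, not part of the patch, and the
helper name is made up), the conservative form of this check on a
poly_int64 offset is the must_ variant:

  /* Sketch: true only if OFFSET is known to be nonnegative for every
     runtime value of the indeterminates; a may_ test here would
     optimistically keep attrs.expr even when the access could start
     before the object.  */
  static bool
  offset_known_nonnegative_p (poly_int64 offset)
  {
    return must_ge (offset, 0);
  }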


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (mem_attrs): Add a default constructor.  Change size and
	offset from HOST_WIDE_INT to poly_int64.
	* emit-rtl.h (set_mem_offset, set_mem_size, adjust_address_1)
	(adjust_automodify_address_1, set_mem_attributes_minus_bitpos)
	(widen_memory_access): Take the sizes and offsets as poly_int64s
	rather than HOST_WIDE_INTs.
	* alias.c (ao_ref_from_mem): Handle the new form of MEM_OFFSET.
	(offset_overlap_p): Take poly_int64s rather than HOST_WIDE_INTs
	and ints.
	(adjust_offset_for_component_ref): Change the offset from a
	HOST_WIDE_INT to a poly_int64.
	(nonoverlapping_memrefs_p): Track polynomial offsets and sizes.
	* cfgcleanup.c (merge_memattrs): Update after mem_attrs changes.
	* dce.c (find_call_stack_args): Likewise.
	* dse.c (record_store): Likewise.
	* dwarf2out.c (tls_mem_loc_descriptor, dw_sra_loc_expr): Likewise.
	* print-rtl.c (rtx_writer::print_rtx): Likewise.
	* read-rtl-function.c (test_loading_mem): Likewise.
	* rtlanal.c (may_trap_p_1): Likewise.
	* simplify-rtx.c (delegitimize_mem_from_attrs): Likewise.
	* var-tracking.c (int_mem_offset, track_expr_p): Likewise.
	* emit-rtl.c (mem_attrs_eq_p, get_mem_align_offset): Likewise.
	(mem_attrs::mem_attrs): New function.
	(set_mem_attributes_minus_bitpos): Change bitpos from a
	HOST_WIDE_INT to poly_int64.
	(set_mem_alias_set, set_mem_addr_space, set_mem_align, set_mem_expr)
	(clear_mem_offset, clear_mem_size, change_address)
	(get_spill_slot_decl, set_mem_attrs_for_spill): Directly
	initialize mem_attrs.
	(set_mem_offset, set_mem_size, adjust_address_1)
	(adjust_automodify_address_1, offset_address, widen_memory_access):
	Likewise.  Take poly_int64s rather than HOST_WIDE_INT.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 17:01:43.314993320 +0100
+++ gcc/rtl.h	2017-10-23 17:01:56.777802803 +0100
@@ -147,6 +147,8 @@ struct addr_diff_vec_flags
    they cannot be modified in place.  */
 struct GTY(()) mem_attrs
 {
+  mem_attrs ();
+
   /* The expression that the MEM accesses, or null if not known.
      This expression might be larger than the memory reference itself.
      (In other words, the MEM might access only part of the object.)  */
@@ -154,11 +156,11 @@ struct GTY(()) mem_attrs
 
   /* The offset of the memory reference from the start of EXPR.
      Only valid if OFFSET_KNOWN_P.  */
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
 
   /* The size of the memory reference in bytes.  Only valid if
      SIZE_KNOWN_P.  */
-  HOST_WIDE_INT size;
+  poly_int64 size;
 
   /* The alias set of the memory reference.  */
   alias_set_type alias;
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-10-23 17:00:54.440004873 +0100
+++ gcc/emit-rtl.h	2017-10-23 17:01:56.777802803 +0100
@@ -333,13 +333,13 @@ extern void set_mem_addr_space (rtx, add
 extern void set_mem_expr (rtx, tree);
 
 /* Set the offset for MEM to OFFSET.  */
-extern void set_mem_offset (rtx, HOST_WIDE_INT);
+extern void set_mem_offset (rtx, poly_int64);
 
 /* Clear the offset recorded for MEM.  */
 extern void clear_mem_offset (rtx);
 
 /* Set the size for MEM to SIZE.  */
-extern void set_mem_size (rtx, HOST_WIDE_INT);
+extern void set_mem_size (rtx, poly_int64);
 
 /* Clear the size recorded for MEM.  */
 extern void clear_mem_size (rtx);
@@ -488,10 +488,10 @@ #define adjust_automodify_address(MEMREF
 #define adjust_automodify_address_nv(MEMREF, MODE, ADDR, OFFSET) \
   adjust_automodify_address_1 (MEMREF, MODE, ADDR, OFFSET, 0)
 
-extern rtx adjust_address_1 (rtx, machine_mode, HOST_WIDE_INT, int, int,
-			     int, HOST_WIDE_INT);
+extern rtx adjust_address_1 (rtx, machine_mode, poly_int64, int, int,
+			     int, poly_int64);
 extern rtx adjust_automodify_address_1 (rtx, machine_mode, rtx,
-					HOST_WIDE_INT, int);
+					poly_int64, int);
 
 /* Return a memory reference like MEMREF, but whose address is changed by
    adding OFFSET, an RTX, to it.  POW2 is the highest power of two factor
@@ -506,7 +506,7 @@ extern void set_mem_attributes (rtx, tre
 /* Similar, except that BITPOS has not yet been applied to REF, so if
    we alter MEM_OFFSET according to T then we should subtract BITPOS
    expecting that it'll be added back in later.  */
-extern void set_mem_attributes_minus_bitpos (rtx, tree, int, HOST_WIDE_INT);
+extern void set_mem_attributes_minus_bitpos (rtx, tree, int, poly_int64);
 
 /* Return OFFSET if XEXP (MEM, 0) - OFFSET is known to be ALIGN
    bits aligned for 0 <= OFFSET < ALIGN / BITS_PER_UNIT, or
@@ -515,7 +515,7 @@ extern int get_mem_align_offset (rtx, un
 
 /* Return a memory reference like MEMREF, but with its mode widened to
    MODE and adjusted by OFFSET.  */
-extern rtx widen_memory_access (rtx, machine_mode, HOST_WIDE_INT);
+extern rtx widen_memory_access (rtx, machine_mode, poly_int64);
 
 extern void maybe_set_max_label_num (rtx_code_label *x);
 
Index: gcc/alias.c
===================================================================
--- gcc/alias.c	2017-10-23 17:01:52.303181137 +0100
+++ gcc/alias.c	2017-10-23 17:01:56.772809920 +0100
@@ -330,7 +330,7 @@ ao_ref_from_mem (ao_ref *ref, const_rtx
 
   /* If MEM_OFFSET/MEM_SIZE get us outside of ref->offset/ref->max_size
      drop ref->ref.  */
-  if (MEM_OFFSET (mem) < 0
+  if (may_lt (MEM_OFFSET (mem), 0)
       || (ref->max_size_known_p ()
 	  && may_gt ((MEM_OFFSET (mem) + MEM_SIZE (mem)) * BITS_PER_UNIT,
 		     ref->max_size)))
@@ -2329,12 +2329,15 @@ addr_side_effect_eval (rtx addr, int siz
    absolute value of the sizes as the actual sizes.  */
 
 static inline bool
-offset_overlap_p (HOST_WIDE_INT c, int xsize, int ysize)
+offset_overlap_p (poly_int64 c, poly_int64 xsize, poly_int64 ysize)
 {
-  return (xsize == 0 || ysize == 0
-	  || (c >= 0
-	      ? (abs (xsize) > c)
-	      : (abs (ysize) > -c)));
+  if (known_zero (xsize) || known_zero (ysize))
+    return true;
+
+  if (may_ge (c, 0))
+    return may_gt (may_lt (xsize, 0) ? -xsize : xsize, c);
+  else
+    return may_gt (may_lt (ysize, 0) ? -ysize : ysize, -c);
 }
 
 /* Return one if X and Y (memory addresses) reference the
@@ -2665,7 +2668,7 @@ decl_for_component_ref (tree x)
 
 static void
 adjust_offset_for_component_ref (tree x, bool *known_p,
-				 HOST_WIDE_INT *offset)
+				 poly_int64 *offset)
 {
   if (!*known_p)
     return;
@@ -2706,8 +2709,8 @@ nonoverlapping_memrefs_p (const_rtx x, c
   rtx rtlx, rtly;
   rtx basex, basey;
   bool moffsetx_known_p, moffsety_known_p;
-  HOST_WIDE_INT moffsetx = 0, moffsety = 0;
-  HOST_WIDE_INT offsetx = 0, offsety = 0, sizex, sizey;
+  poly_int64 moffsetx = 0, moffsety = 0;
+  poly_int64 offsetx = 0, offsety = 0, sizex, sizey;
 
   /* Unless both have exprs, we can't tell anything.  */
   if (exprx == 0 || expry == 0)
@@ -2809,12 +2812,10 @@ nonoverlapping_memrefs_p (const_rtx x, c
      we can avoid overlap is if we can deduce that they are nonoverlapping
      pieces of that decl, which is very rare.  */
   basex = MEM_P (rtlx) ? XEXP (rtlx, 0) : rtlx;
-  if (GET_CODE (basex) == PLUS && CONST_INT_P (XEXP (basex, 1)))
-    offsetx = INTVAL (XEXP (basex, 1)), basex = XEXP (basex, 0);
+  basex = strip_offset_and_add (basex, &offsetx);
 
   basey = MEM_P (rtly) ? XEXP (rtly, 0) : rtly;
-  if (GET_CODE (basey) == PLUS && CONST_INT_P (XEXP (basey, 1)))
-    offsety = INTVAL (XEXP (basey, 1)), basey = XEXP (basey, 0);
+  basey = strip_offset_and_add (basey, &offsety);
 
   /* If the bases are different, we know they do not overlap if both
      are constants or if one is a constant and the other a pointer into the
@@ -2835,10 +2836,10 @@ nonoverlapping_memrefs_p (const_rtx x, c
      declarations are necessarily different
     (i.e. compare_base_decls (exprx, expry) == -1)  */
 
-  sizex = (!MEM_P (rtlx) ? (int) GET_MODE_SIZE (GET_MODE (rtlx))
+  sizex = (!MEM_P (rtlx) ? poly_int64 (GET_MODE_SIZE (GET_MODE (rtlx)))
 	   : MEM_SIZE_KNOWN_P (rtlx) ? MEM_SIZE (rtlx)
 	   : -1);
-  sizey = (!MEM_P (rtly) ? (int) GET_MODE_SIZE (GET_MODE (rtly))
+  sizey = (!MEM_P (rtly) ? poly_int64 (GET_MODE_SIZE (GET_MODE (rtly)))
 	   : MEM_SIZE_KNOWN_P (rtly) ? MEM_SIZE (rtly)
 	   : -1);
 
@@ -2857,16 +2858,7 @@ nonoverlapping_memrefs_p (const_rtx x, c
   if (MEM_SIZE_KNOWN_P (y) && moffsety_known_p)
     sizey = MEM_SIZE (y);
 
-  /* Put the values of the memref with the lower offset in X's values.  */
-  if (offsetx > offsety)
-    {
-      std::swap (offsetx, offsety);
-      std::swap (sizex, sizey);
-    }
-
-  /* If we don't know the size of the lower-offset value, we can't tell
-     if they conflict.  Otherwise, we do the test.  */
-  return sizex >= 0 && offsety >= offsetx + sizex;
+  return !ranges_may_overlap_p (offsetx, sizex, offsety, sizey);
 }
 
 /* Helper for true_dependence and canon_true_dependence.
Index: gcc/cfgcleanup.c
===================================================================
--- gcc/cfgcleanup.c	2017-10-23 16:52:19.902212938 +0100
+++ gcc/cfgcleanup.c	2017-10-23 17:01:56.772809920 +0100
@@ -873,8 +873,6 @@ merge_memattrs (rtx x, rtx y)
 	MEM_ATTRS (x) = 0;
       else
 	{
-	  HOST_WIDE_INT mem_size;
-
 	  if (MEM_ALIAS_SET (x) != MEM_ALIAS_SET (y))
 	    {
 	      set_mem_alias_set (x, 0);
@@ -890,20 +888,23 @@ merge_memattrs (rtx x, rtx y)
 	    }
 	  else if (MEM_OFFSET_KNOWN_P (x) != MEM_OFFSET_KNOWN_P (y)
 		   || (MEM_OFFSET_KNOWN_P (x)
-		       && MEM_OFFSET (x) != MEM_OFFSET (y)))
+		       && may_ne (MEM_OFFSET (x), MEM_OFFSET (y))))
 	    {
 	      clear_mem_offset (x);
 	      clear_mem_offset (y);
 	    }
 
-	  if (MEM_SIZE_KNOWN_P (x) && MEM_SIZE_KNOWN_P (y))
-	    {
-	      mem_size = MAX (MEM_SIZE (x), MEM_SIZE (y));
-	      set_mem_size (x, mem_size);
-	      set_mem_size (y, mem_size);
-	    }
+	  if (!MEM_SIZE_KNOWN_P (x))
+	    clear_mem_size (y);
+	  else if (!MEM_SIZE_KNOWN_P (y))
+	    clear_mem_size (x);
+	  else if (must_le (MEM_SIZE (x), MEM_SIZE (y)))
+	    set_mem_size (x, MEM_SIZE (y));
+	  else if (must_le (MEM_SIZE (y), MEM_SIZE (x)))
+	    set_mem_size (y, MEM_SIZE (x));
 	  else
 	    {
+	      /* The sizes aren't ordered, so we can't merge them.  */
 	      clear_mem_size (x);
 	      clear_mem_size (y);
 	    }
Index: gcc/dce.c
===================================================================
--- gcc/dce.c	2017-10-23 16:52:19.902212938 +0100
+++ gcc/dce.c	2017-10-23 17:01:56.772809920 +0100
@@ -293,9 +293,8 @@ find_call_stack_args (rtx_call_insn *cal
       {
 	rtx mem = XEXP (XEXP (p, 0), 0), addr;
 	HOST_WIDE_INT off = 0, size;
-	if (!MEM_SIZE_KNOWN_P (mem))
+	if (!MEM_SIZE_KNOWN_P (mem) || !MEM_SIZE (mem).is_constant (&size))
 	  return false;
-	size = MEM_SIZE (mem);
 	addr = XEXP (mem, 0);
 	if (GET_CODE (addr) == PLUS
 	    && REG_P (XEXP (addr, 0))
@@ -360,7 +359,9 @@ find_call_stack_args (rtx_call_insn *cal
 	&& MEM_P (XEXP (XEXP (p, 0), 0)))
       {
 	rtx mem = XEXP (XEXP (p, 0), 0), addr;
-	HOST_WIDE_INT off = 0, byte;
+	HOST_WIDE_INT off = 0, byte, size;
+	/* Checked in the previous iteration.  */
+	size = MEM_SIZE (mem).to_constant ();
 	addr = XEXP (mem, 0);
 	if (GET_CODE (addr) == PLUS
 	    && REG_P (XEXP (addr, 0))
@@ -386,7 +387,7 @@ find_call_stack_args (rtx_call_insn *cal
 	    set = single_set (DF_REF_INSN (defs->ref));
 	    off += INTVAL (XEXP (SET_SRC (set), 1));
 	  }
-	for (byte = off; byte < off + MEM_SIZE (mem); byte++)
+	for (byte = off; byte < off + size; byte++)
 	  {
 	    if (!bitmap_set_bit (sp_bytes, byte - min_sp_off))
 	      gcc_unreachable ();
@@ -469,8 +470,10 @@ find_call_stack_args (rtx_call_insn *cal
 	    break;
 	}
 
+      HOST_WIDE_INT size;
       if (!MEM_SIZE_KNOWN_P (mem)
-	  || !check_argument_store (MEM_SIZE (mem), off, min_sp_off,
+	  || !MEM_SIZE (mem).is_constant (&size)
+	  || !check_argument_store (size, off, min_sp_off,
 				    max_sp_off, sp_bytes))
 	break;
 
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-10-23 17:01:54.249406896 +0100
+++ gcc/dse.c	2017-10-23 17:01:56.773808497 +0100
@@ -1365,6 +1365,7 @@ record_store (rtx body, bb_info_t bb_inf
   /* At this point we know mem is a mem. */
   if (GET_MODE (mem) == BLKmode)
     {
+      HOST_WIDE_INT const_size;
       if (GET_CODE (XEXP (mem, 0)) == SCRATCH)
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
@@ -1376,8 +1377,11 @@ record_store (rtx body, bb_info_t bb_inf
       /* Handle (set (mem:BLK (addr) [... S36 ...]) (const_int 0))
 	 as memset (addr, 0, 36);  */
       else if (!MEM_SIZE_KNOWN_P (mem)
-	       || MEM_SIZE (mem) <= 0
-	       || MEM_SIZE (mem) > MAX_OFFSET
+	       || may_le (MEM_SIZE (mem), 0)
+	       /* This is a limit on the bitmap size, which is only relevant
+		  for constant-sized MEMs.  */
+	       || (MEM_SIZE (mem).is_constant (&const_size)
+		   && const_size > MAX_OFFSET)
 	       || GET_CODE (body) != SET
 	       || !CONST_INT_P (SET_SRC (body)))
 	{
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 17:01:45.056510879 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:01:56.775805650 +0100
@@ -13754,7 +13754,7 @@ tls_mem_loc_descriptor (rtx mem)
   if (loc_result == NULL)
     return NULL;
 
-  if (MEM_OFFSET (mem))
+  if (maybe_nonzero (MEM_OFFSET (mem)))
     loc_descr_plus_const (&loc_result, MEM_OFFSET (mem));
 
   return loc_result;
@@ -16320,8 +16320,10 @@ dw_sra_loc_expr (tree decl, rtx loc)
 	     adjustment.  */
 	  if (MEM_P (varloc))
 	    {
-	      unsigned HOST_WIDE_INT memsize
-		= MEM_SIZE (varloc) * BITS_PER_UNIT;
+	      unsigned HOST_WIDE_INT memsize;
+	      if (!poly_uint64 (MEM_SIZE (varloc)).is_constant (&memsize))
+		goto discard_descr;
+	      memsize *= BITS_PER_UNIT;
 	      if (memsize != bitsize)
 		{
 		  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	2017-10-23 17:01:43.314993320 +0100
+++ gcc/print-rtl.c	2017-10-23 17:01:56.777802803 +0100
@@ -884,10 +884,16 @@ rtx_writer::print_rtx (const_rtx in_rtx)
 	fputc (' ', m_outfile);
 
       if (MEM_OFFSET_KNOWN_P (in_rtx))
-	fprintf (m_outfile, "+" HOST_WIDE_INT_PRINT_DEC, MEM_OFFSET (in_rtx));
+	{
+	  fprintf (m_outfile, "+");
+	  print_poly_int (m_outfile, MEM_OFFSET (in_rtx));
+	}
 
       if (MEM_SIZE_KNOWN_P (in_rtx))
-	fprintf (m_outfile, " S" HOST_WIDE_INT_PRINT_DEC, MEM_SIZE (in_rtx));
+	{
+	  fprintf (m_outfile, " S");
+	  print_poly_int (m_outfile, MEM_SIZE (in_rtx));
+	}
 
       if (MEM_ALIGN (in_rtx) != 1)
 	fprintf (m_outfile, " A%u", MEM_ALIGN (in_rtx));
Index: gcc/read-rtl-function.c
===================================================================
--- gcc/read-rtl-function.c	2017-10-23 16:52:19.902212938 +0100
+++ gcc/read-rtl-function.c	2017-10-23 17:01:56.777802803 +0100
@@ -2143,9 +2143,9 @@ test_loading_mem ()
   ASSERT_EQ (42, MEM_ALIAS_SET (mem1));
   /* "+17".  */
   ASSERT_TRUE (MEM_OFFSET_KNOWN_P (mem1));
-  ASSERT_EQ (17, MEM_OFFSET (mem1));
+  ASSERT_MUST_EQ (17, MEM_OFFSET (mem1));
   /* "S8".  */
-  ASSERT_EQ (8, MEM_SIZE (mem1));
+  ASSERT_MUST_EQ (8, MEM_SIZE (mem1));
   /* "A128.  */
   ASSERT_EQ (128, MEM_ALIGN (mem1));
   /* "AS5.  */
@@ -2159,9 +2159,9 @@ test_loading_mem ()
   ASSERT_EQ (43, MEM_ALIAS_SET (mem2));
   /* "+18".  */
   ASSERT_TRUE (MEM_OFFSET_KNOWN_P (mem2));
-  ASSERT_EQ (18, MEM_OFFSET (mem2));
+  ASSERT_MUST_EQ (18, MEM_OFFSET (mem2));
   /* "S9".  */
-  ASSERT_EQ (9, MEM_SIZE (mem2));
+  ASSERT_MUST_EQ (9, MEM_SIZE (mem2));
   /* "AS6.  */
   ASSERT_EQ (6, MEM_ADDR_SPACE (mem2));
 }
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:01:55.453690255 +0100
+++ gcc/rtlanal.c	2017-10-23 17:01:56.778801380 +0100
@@ -2796,7 +2796,7 @@ may_trap_p_1 (const_rtx x, unsigned flag
 	  code_changed
 	  || !MEM_NOTRAP_P (x))
 	{
-	  HOST_WIDE_INT size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
+	  poly_int64 size = MEM_SIZE_KNOWN_P (x) ? MEM_SIZE (x) : -1;
 	  return rtx_addr_can_trap_p_1 (XEXP (x, 0), 0, size,
 					GET_MODE (x), code_changed);
 	}
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 17:00:54.445000329 +0100
+++ gcc/simplify-rtx.c	2017-10-23 17:01:56.778801380 +0100
@@ -289,7 +289,7 @@ delegitimize_mem_from_attrs (rtx x)
     {
       tree decl = MEM_EXPR (x);
       machine_mode mode = GET_MODE (x);
-      HOST_WIDE_INT offset = 0;
+      poly_int64 offset = 0;
 
       switch (TREE_CODE (decl))
 	{
@@ -346,6 +346,7 @@ delegitimize_mem_from_attrs (rtx x)
 	  if (MEM_P (newx))
 	    {
 	      rtx n = XEXP (newx, 0), o = XEXP (x, 0);
+	      poly_int64 n_offset, o_offset;
 
 	      /* Avoid creating a new MEM needlessly if we already had
 		 the same address.  We do if there's no OFFSET and the
@@ -353,21 +354,14 @@ delegitimize_mem_from_attrs (rtx x)
 		 form (plus NEWX OFFSET), or the NEWX is of the form
 		 (plus Y (const_int Z)) and X is that with the offset
 		 added: (plus Y (const_int Z+OFFSET)).  */
-	      if (!((offset == 0
-		     || (GET_CODE (o) == PLUS
-			 && GET_CODE (XEXP (o, 1)) == CONST_INT
-			 && (offset == INTVAL (XEXP (o, 1))
-			     || (GET_CODE (n) == PLUS
-				 && GET_CODE (XEXP (n, 1)) == CONST_INT
-				 && (INTVAL (XEXP (n, 1)) + offset
-				     == INTVAL (XEXP (o, 1)))
-				 && (n = XEXP (n, 0))))
-			 && (o = XEXP (o, 0))))
+	      n = strip_offset (n, &n_offset);
+	      o = strip_offset (o, &o_offset);
+	      if (!(must_eq (o_offset, n_offset + offset)
 		    && rtx_equal_p (o, n)))
 		x = adjust_address_nv (newx, mode, offset);
 	    }
 	  else if (GET_MODE (x) == GET_MODE (newx)
-		   && offset == 0)
+		   && known_zero (offset))
 	    x = newx;
 	}
     }
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c	2017-10-23 17:01:43.315991896 +0100
+++ gcc/var-tracking.c	2017-10-23 17:01:56.779799956 +0100
@@ -395,8 +395,9 @@ #define VTI(BB) ((variable_tracking_info
 static inline HOST_WIDE_INT
 int_mem_offset (const_rtx mem)
 {
-  if (MEM_OFFSET_KNOWN_P (mem))
-    return MEM_OFFSET (mem);
+  HOST_WIDE_INT offset;
+  if (MEM_OFFSET_KNOWN_P (mem) && MEM_OFFSET (mem).is_constant (&offset))
+    return offset;
   return 0;
 }
 
@@ -5256,7 +5257,7 @@ track_expr_p (tree expr, bool need_rtl)
 	  && !tracked_record_parameter_p (realdecl))
 	return 0;
       if (MEM_SIZE_KNOWN_P (decl_rtl)
-	  && MEM_SIZE (decl_rtl) > MAX_VAR_PARTS)
+	  && may_gt (MEM_SIZE (decl_rtl), MAX_VAR_PARTS))
 	return 0;
     }
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 17:01:43.313994743 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:01:56.776804226 +0100
@@ -386,9 +386,9 @@ mem_attrs_eq_p (const struct mem_attrs *
     return false;
   return (p->alias == q->alias
 	  && p->offset_known_p == q->offset_known_p
-	  && (!p->offset_known_p || p->offset == q->offset)
+	  && (!p->offset_known_p || must_eq (p->offset, q->offset))
 	  && p->size_known_p == q->size_known_p
-	  && (!p->size_known_p || p->size == q->size)
+	  && (!p->size_known_p || must_eq (p->size, q->size))
 	  && p->align == q->align
 	  && p->addrspace == q->addrspace
 	  && (p->expr == q->expr
@@ -1789,6 +1789,17 @@ operand_subword_force (rtx op, unsigned
   return result;
 }
 \f
+mem_attrs::mem_attrs ()
+  : expr (NULL_TREE),
+    offset (0),
+    size (0),
+    alias (0),
+    align (0),
+    addrspace (ADDR_SPACE_GENERIC),
+    offset_known_p (false),
+    size_known_p (false)
+{}
+
 /* Returns 1 if both MEM_EXPR can be considered equal
    and 0 otherwise.  */
 
@@ -1815,7 +1826,7 @@ mem_expr_equal_p (const_tree expr1, cons
 get_mem_align_offset (rtx mem, unsigned int align)
 {
   tree expr;
-  unsigned HOST_WIDE_INT offset;
+  poly_uint64 offset;
 
   /* This function can't use
      if (!MEM_EXPR (mem) || !MEM_OFFSET_KNOWN_P (mem)
@@ -1857,12 +1868,13 @@ get_mem_align_offset (rtx mem, unsigned
 	  tree byte_offset = component_ref_field_offset (expr);
 	  tree bit_offset = DECL_FIELD_BIT_OFFSET (field);
 
+	  poly_uint64 suboffset;
 	  if (!byte_offset
-	      || !tree_fits_uhwi_p (byte_offset)
+	      || !poly_int_tree_p (byte_offset, &suboffset)
 	      || !tree_fits_uhwi_p (bit_offset))
 	    return -1;
 
-	  offset += tree_to_uhwi (byte_offset);
+	  offset += suboffset;
 	  offset += tree_to_uhwi (bit_offset) / BITS_PER_UNIT;
 
 	  if (inner == NULL_TREE)
@@ -1886,7 +1898,10 @@ get_mem_align_offset (rtx mem, unsigned
   else
     return -1;
 
-  return offset & ((align / BITS_PER_UNIT) - 1);
+  HOST_WIDE_INT misalign;
+  if (!known_misalignment (offset, align / BITS_PER_UNIT, &misalign))
+    return -1;
+  return misalign;
 }
 
 /* Given REF (a MEM) and T, either the type of X or the expression
@@ -1896,9 +1911,9 @@ get_mem_align_offset (rtx mem, unsigned
 
 void
 set_mem_attributes_minus_bitpos (rtx ref, tree t, int objectp,
-				 HOST_WIDE_INT bitpos)
+				 poly_int64 bitpos)
 {
-  HOST_WIDE_INT apply_bitpos = 0;
+  poly_int64 apply_bitpos = 0;
   tree type;
   struct mem_attrs attrs, *defattrs, *refattrs;
   addr_space_t as;
@@ -1919,8 +1934,6 @@ set_mem_attributes_minus_bitpos (rtx ref
      set_mem_attributes.  */
   gcc_assert (!DECL_P (t) || ref != DECL_RTL_IF_SET (t));
 
-  memset (&attrs, 0, sizeof (attrs));
-
   /* Get the alias set from the expression or type (perhaps using a
      front-end routine) and use it.  */
   attrs.alias = get_alias_set (t);
@@ -2090,10 +2103,9 @@ set_mem_attributes_minus_bitpos (rtx ref
 	    {
 	      attrs.expr = t2;
 	      attrs.offset_known_p = false;
-	      if (tree_fits_uhwi_p (off_tree))
+	      if (poly_int_tree_p (off_tree, &attrs.offset))
 		{
 		  attrs.offset_known_p = true;
-		  attrs.offset = tree_to_uhwi (off_tree);
 		  apply_bitpos = bitpos;
 		}
 	    }
@@ -2114,27 +2126,29 @@ set_mem_attributes_minus_bitpos (rtx ref
       unsigned int obj_align;
       unsigned HOST_WIDE_INT obj_bitpos;
       get_object_alignment_1 (t, &obj_align, &obj_bitpos);
-      obj_bitpos = (obj_bitpos - bitpos) & (obj_align - 1);
-      if (obj_bitpos != 0)
-	obj_align = least_bit_hwi (obj_bitpos);
+      unsigned int diff_align = known_alignment (obj_bitpos - bitpos);
+      if (diff_align != 0)
+	obj_align = MIN (obj_align, diff_align);
       attrs.align = MAX (attrs.align, obj_align);
     }
 
-  if (tree_fits_uhwi_p (new_size))
+  poly_uint64 const_size;
+  if (poly_int_tree_p (new_size, &const_size))
     {
       attrs.size_known_p = true;
-      attrs.size = tree_to_uhwi (new_size);
+      attrs.size = const_size;
     }
 
   /* If we modified OFFSET based on T, then subtract the outstanding
      bit position offset.  Similarly, increase the size of the accessed
      object to contain the negative offset.  */
-  if (apply_bitpos)
+  if (maybe_nonzero (apply_bitpos))
     {
       gcc_assert (attrs.offset_known_p);
-      attrs.offset -= apply_bitpos / BITS_PER_UNIT;
+      poly_int64 bytepos = bits_to_bytes_round_down (apply_bitpos);
+      attrs.offset -= bytepos;
       if (attrs.size_known_p)
-	attrs.size += apply_bitpos / BITS_PER_UNIT;
+	attrs.size += bytepos;
     }
 
   /* Now set the attributes we computed above.  */
@@ -2153,11 +2167,9 @@ set_mem_attributes (rtx ref, tree t, int
 void
 set_mem_alias_set (rtx mem, alias_set_type set)
 {
-  struct mem_attrs attrs;
-
   /* If the new and old alias sets don't conflict, something is wrong.  */
   gcc_checking_assert (alias_sets_conflict_p (set, MEM_ALIAS_SET (mem)));
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.alias = set;
   set_mem_attrs (mem, &attrs);
 }
@@ -2167,9 +2179,7 @@ set_mem_alias_set (rtx mem, alias_set_ty
 void
 set_mem_addr_space (rtx mem, addr_space_t addrspace)
 {
-  struct mem_attrs attrs;
-
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.addrspace = addrspace;
   set_mem_attrs (mem, &attrs);
 }
@@ -2179,9 +2189,7 @@ set_mem_addr_space (rtx mem, addr_space_
 void
 set_mem_align (rtx mem, unsigned int align)
 {
-  struct mem_attrs attrs;
-
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.align = align;
   set_mem_attrs (mem, &attrs);
 }
@@ -2191,9 +2199,7 @@ set_mem_align (rtx mem, unsigned int ali
 void
 set_mem_expr (rtx mem, tree expr)
 {
-  struct mem_attrs attrs;
-
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.expr = expr;
   set_mem_attrs (mem, &attrs);
 }
@@ -2201,11 +2207,9 @@ set_mem_expr (rtx mem, tree expr)
 /* Set the offset of MEM to OFFSET.  */
 
 void
-set_mem_offset (rtx mem, HOST_WIDE_INT offset)
+set_mem_offset (rtx mem, poly_int64 offset)
 {
-  struct mem_attrs attrs;
-
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.offset_known_p = true;
   attrs.offset = offset;
   set_mem_attrs (mem, &attrs);
@@ -2216,9 +2220,7 @@ set_mem_offset (rtx mem, HOST_WIDE_INT o
 void
 clear_mem_offset (rtx mem)
 {
-  struct mem_attrs attrs;
-
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.offset_known_p = false;
   set_mem_attrs (mem, &attrs);
 }
@@ -2226,11 +2228,9 @@ clear_mem_offset (rtx mem)
 /* Set the size of MEM to SIZE.  */
 
 void
-set_mem_size (rtx mem, HOST_WIDE_INT size)
+set_mem_size (rtx mem, poly_int64 size)
 {
-  struct mem_attrs attrs;
-
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.size_known_p = true;
   attrs.size = size;
   set_mem_attrs (mem, &attrs);
@@ -2241,9 +2241,7 @@ set_mem_size (rtx mem, HOST_WIDE_INT siz
 void
 clear_mem_size (rtx mem)
 {
-  struct mem_attrs attrs;
-
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.size_known_p = false;
   set_mem_attrs (mem, &attrs);
 }
@@ -2306,9 +2304,9 @@ change_address (rtx memref, machine_mode
 {
   rtx new_rtx = change_address_1 (memref, mode, addr, 1, false);
   machine_mode mmode = GET_MODE (new_rtx);
-  struct mem_attrs attrs, *defattrs;
+  struct mem_attrs *defattrs;
 
-  attrs = *get_mem_attrs (memref);
+  mem_attrs attrs (*get_mem_attrs (memref));
   defattrs = mode_mem_attrs[(int) mmode];
   attrs.expr = NULL_TREE;
   attrs.offset_known_p = false;
@@ -2343,15 +2341,14 @@ change_address (rtx memref, machine_mode
    has no inherent size.  */
 
 rtx
-adjust_address_1 (rtx memref, machine_mode mode, HOST_WIDE_INT offset,
+adjust_address_1 (rtx memref, machine_mode mode, poly_int64 offset,
 		  int validate, int adjust_address, int adjust_object,
-		  HOST_WIDE_INT size)
+		  poly_int64 size)
 {
   rtx addr = XEXP (memref, 0);
   rtx new_rtx;
   scalar_int_mode address_mode;
-  int pbits;
-  struct mem_attrs attrs = *get_mem_attrs (memref), *defattrs;
+  struct mem_attrs attrs (*get_mem_attrs (memref)), *defattrs;
   unsigned HOST_WIDE_INT max_align;
 #ifdef POINTERS_EXTEND_UNSIGNED
   scalar_int_mode pointer_mode
@@ -2368,8 +2365,10 @@ adjust_address_1 (rtx memref, machine_mo
     size = defattrs->size;
 
   /* If there are no changes, just return the original memory reference.  */
-  if (mode == GET_MODE (memref) && !offset
-      && (size == 0 || (attrs.size_known_p && attrs.size == size))
+  if (mode == GET_MODE (memref)
+      && known_zero (offset)
+      && (known_zero (size)
+	  || (attrs.size_known_p && must_eq (attrs.size, size)))
       && (!validate || memory_address_addr_space_p (mode, addr,
 						    attrs.addrspace)))
     return memref;
@@ -2382,22 +2381,17 @@ adjust_address_1 (rtx memref, machine_mo
   /* Convert a possibly large offset to a signed value within the
      range of the target address space.  */
   address_mode = get_address_mode (memref);
-  pbits = GET_MODE_BITSIZE (address_mode);
-  if (HOST_BITS_PER_WIDE_INT > pbits)
-    {
-      int shift = HOST_BITS_PER_WIDE_INT - pbits;
-      offset = (((HOST_WIDE_INT) ((unsigned HOST_WIDE_INT) offset << shift))
-		>> shift);
-    }
+  offset = trunc_int_for_mode (offset, address_mode);
 
   if (adjust_address)
     {
       /* If MEMREF is a LO_SUM and the offset is within the alignment of the
 	 object, we can merge it into the LO_SUM.  */
-      if (GET_MODE (memref) != BLKmode && GET_CODE (addr) == LO_SUM
-	  && offset >= 0
-	  && (unsigned HOST_WIDE_INT) offset
-	      < GET_MODE_ALIGNMENT (GET_MODE (memref)) / BITS_PER_UNIT)
+      if (GET_MODE (memref) != BLKmode
+	  && GET_CODE (addr) == LO_SUM
+	  && known_in_range_p (offset,
+			       0, (GET_MODE_ALIGNMENT (GET_MODE (memref))
+				   / BITS_PER_UNIT)))
 	addr = gen_rtx_LO_SUM (address_mode, XEXP (addr, 0),
 			       plus_constant (address_mode,
 					      XEXP (addr, 1), offset));
@@ -2408,7 +2402,7 @@ adjust_address_1 (rtx memref, machine_mo
       else if (POINTERS_EXTEND_UNSIGNED > 0
 	       && GET_CODE (addr) == ZERO_EXTEND
 	       && GET_MODE (XEXP (addr, 0)) == pointer_mode
-	       && trunc_int_for_mode (offset, pointer_mode) == offset)
+	       && must_eq (trunc_int_for_mode (offset, pointer_mode), offset))
 	addr = gen_rtx_ZERO_EXTEND (address_mode,
 				    plus_constant (pointer_mode,
 						   XEXP (addr, 0), offset));
@@ -2421,7 +2415,7 @@ adjust_address_1 (rtx memref, machine_mo
 
   /* If the address is a REG, change_address_1 rightfully returns memref,
      but this would destroy memref's MEM_ATTRS.  */
-  if (new_rtx == memref && offset != 0)
+  if (new_rtx == memref && maybe_nonzero (offset))
     new_rtx = copy_rtx (new_rtx);
 
   /* Conservatively drop the object if we don't know where we start from.  */
@@ -2438,7 +2432,7 @@ adjust_address_1 (rtx memref, machine_mo
       attrs.offset += offset;
 
       /* Drop the object if the new left end is not within its bounds.  */
-      if (adjust_object && attrs.offset < 0)
+      if (adjust_object && may_lt (attrs.offset, 0))
 	{
 	  attrs.expr = NULL_TREE;
 	  attrs.alias = 0;
@@ -2448,16 +2442,16 @@ adjust_address_1 (rtx memref, machine_mo
   /* Compute the new alignment by taking the MIN of the alignment and the
      lowest-order set bit in OFFSET, but don't change the alignment if OFFSET
      if zero.  */
-  if (offset != 0)
+  if (maybe_nonzero (offset))
     {
-      max_align = least_bit_hwi (offset) * BITS_PER_UNIT;
+      max_align = known_alignment (offset) * BITS_PER_UNIT;
       attrs.align = MIN (attrs.align, max_align);
     }
 
-  if (size)
+  if (maybe_nonzero (size))
     {
       /* Drop the object if the new right end is not within its bounds.  */
-      if (adjust_object && (offset + size) > attrs.size)
+      if (adjust_object && may_gt (offset + size, attrs.size))
 	{
 	  attrs.expr = NULL_TREE;
 	  attrs.alias = 0;
@@ -2485,7 +2479,7 @@ adjust_address_1 (rtx memref, machine_mo
 
 rtx
 adjust_automodify_address_1 (rtx memref, machine_mode mode, rtx addr,
-			     HOST_WIDE_INT offset, int validate)
+			     poly_int64 offset, int validate)
 {
   memref = change_address_1 (memref, VOIDmode, addr, validate, false);
   return adjust_address_1 (memref, mode, offset, validate, 0, 0, 0);
@@ -2500,9 +2494,9 @@ offset_address (rtx memref, rtx offset,
 {
   rtx new_rtx, addr = XEXP (memref, 0);
   machine_mode address_mode;
-  struct mem_attrs attrs, *defattrs;
+  struct mem_attrs *defattrs;
 
-  attrs = *get_mem_attrs (memref);
+  mem_attrs attrs (*get_mem_attrs (memref));
   address_mode = get_address_mode (memref);
   new_rtx = simplify_gen_binary (PLUS, address_mode, addr, offset);
 
@@ -2570,17 +2564,16 @@ replace_equiv_address_nv (rtx memref, rt
    operations plus masking logic.  */
 
 rtx
-widen_memory_access (rtx memref, machine_mode mode, HOST_WIDE_INT offset)
+widen_memory_access (rtx memref, machine_mode mode, poly_int64 offset)
 {
   rtx new_rtx = adjust_address_1 (memref, mode, offset, 1, 1, 0, 0);
-  struct mem_attrs attrs;
   unsigned int size = GET_MODE_SIZE (mode);
 
   /* If there are no changes, just return the original memory reference.  */
   if (new_rtx == memref)
     return new_rtx;
 
-  attrs = *get_mem_attrs (new_rtx);
+  mem_attrs attrs (*get_mem_attrs (new_rtx));
 
   /* If we don't know what offset we were at within the expression, then
      we can't know if we've overstepped the bounds.  */
@@ -2602,28 +2595,30 @@ widen_memory_access (rtx memref, machine
 
 	  /* Is the field at least as large as the access?  If so, ok,
 	     otherwise strip back to the containing structure.  */
-	  if (TREE_CODE (DECL_SIZE_UNIT (field)) == INTEGER_CST
-	      && compare_tree_int (DECL_SIZE_UNIT (field), size) >= 0
-	      && attrs.offset >= 0)
+	  if (poly_int_tree_p (DECL_SIZE_UNIT (field))
+	      && must_ge (wi::to_poly_offset (DECL_SIZE_UNIT (field)), size)
+	      && must_ge (attrs.offset, 0))
 	    break;
 
-	  if (! tree_fits_uhwi_p (offset))
+	  poly_uint64 suboffset;
+	  if (!poly_int_tree_p (offset, &suboffset))
 	    {
 	      attrs.expr = NULL_TREE;
 	      break;
 	    }
 
 	  attrs.expr = TREE_OPERAND (attrs.expr, 0);
-	  attrs.offset += tree_to_uhwi (offset);
+	  attrs.offset += suboffset;
 	  attrs.offset += (tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field))
 			   / BITS_PER_UNIT);
 	}
       /* Similarly for the decl.  */
       else if (DECL_P (attrs.expr)
 	       && DECL_SIZE_UNIT (attrs.expr)
-	       && TREE_CODE (DECL_SIZE_UNIT (attrs.expr)) == INTEGER_CST
-	       && compare_tree_int (DECL_SIZE_UNIT (attrs.expr), size) >= 0
-	       && (! attrs.offset_known_p || attrs.offset >= 0))
+	       && poly_int_tree_p (DECL_SIZE_UNIT (attrs.expr))
+	       && must_ge (wi::to_poly_offset (DECL_SIZE_UNIT (attrs.expr)),
+			   size)
+	       && must_ge (attrs.offset, 0))
 	break;
       else
 	{
@@ -2654,7 +2649,6 @@ get_spill_slot_decl (bool force_build_p)
 {
   tree d = spill_slot_decl;
   rtx rd;
-  struct mem_attrs attrs;
 
   if (d || !force_build_p)
     return d;
@@ -2668,7 +2662,7 @@ get_spill_slot_decl (bool force_build_p)
 
   rd = gen_rtx_MEM (BLKmode, frame_pointer_rtx);
   MEM_NOTRAP_P (rd) = 1;
-  attrs = *mode_mem_attrs[(int) BLKmode];
+  mem_attrs attrs (*mode_mem_attrs[(int) BLKmode]);
   attrs.alias = new_alias_set ();
   attrs.expr = d;
   set_mem_attrs (rd, &attrs);
@@ -2686,10 +2680,9 @@ get_spill_slot_decl (bool force_build_p)
 void
 set_mem_attrs_for_spill (rtx mem)
 {
-  struct mem_attrs attrs;
   rtx addr;
 
-  attrs = *get_mem_attrs (mem);
+  mem_attrs attrs (*get_mem_attrs (mem));
   attrs.expr = get_spill_slot_decl (true);
   attrs.alias = MEM_ALIAS_SET (DECL_RTL (attrs.expr));
   attrs.addrspace = ADDR_SPACE_GENERIC;
@@ -2699,10 +2692,7 @@ set_mem_attrs_for_spill (rtx mem)
      with perhaps the plus missing for offset = 0.  */
   addr = XEXP (mem, 0);
   attrs.offset_known_p = true;
-  attrs.offset = 0;
-  if (GET_CODE (addr) == PLUS
-      && CONST_INT_P (XEXP (addr, 1)))
-    attrs.offset = INTVAL (XEXP (addr, 1));
+  strip_offset (addr, &attrs.offset);
 
   set_mem_attrs (mem, &attrs);
   MEM_NOTRAP_P (mem) = 1;

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [020/nnn] poly_int: store_bit_field bitrange
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (16 preceding siblings ...)
  2017-10-23 17:07 ` [017/nnn] poly_int: rtx_addr_can_trap_p_1 Richard Sandiford
@ 2017-10-23 17:08 ` Richard Sandiford
  2017-12-05 23:43   ` Jeff Law
  2017-10-23 17:08 ` [019/nnn] poly_int: lra frame offsets Richard Sandiford
                   ` (89 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:08 UTC (permalink / raw)
  To: gcc-patches

This patch changes the bitnum and bitsize arguments to
store_bit_field from unsigned HOST_WIDE_INTs to poly_uint64s.
The later part of store_bit_field_1 still needs to operate
on constant bit positions and sizes, so the patch splits
it out into a subfunction (store_integral_bit_field).
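
For illustration only (a hedged sketch, not the exact patch; the variable
names and the early return are made up), the hand-off from the poly_uint64
interface to the constant-only worker relies on is_constant:

  /* Sketch: the constant-only logic can run once both values are known
     at compile time; otherwise fall back as the caller requires.  */
  unsigned HOST_WIDE_INT ibitsize, ibitnum;
  if (!bitsize.is_constant (&ibitsize) || !bitnum.is_constant (&ibitnum))
    return false;
  /* ...the original constant-only code, now in store_integral_bit_field,
     operates on ibitsize and ibitnum from here on...  */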


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expmed.h (store_bit_field): Take bitsize and bitnum as
	poly_uint64s rather than unsigned HOST_WIDE_INTs.
	* expmed.c (simple_mem_bitfield_p): Likewise.  Add a parameter
	that returns the byte size.
	(store_bit_field_1): Take bitsize and bitnum as
	poly_uint64s rather than unsigned HOST_WIDE_INTs.  Update call
	to simple_mem_bitfield_p.  Split the part that can only handle
	constant bitsize and bitnum out into...
	(store_integral_bit_field): ...this new function.
	(store_bit_field): Take bitsize and bitnum as poly_uint64s rather
	than unsigned HOST_WIDE_INTs.
	(extract_bit_field_1): Update call to simple_mem_bitfield_p.

Index: gcc/expmed.h
===================================================================
--- gcc/expmed.h	2017-10-23 17:00:54.441003964 +0100
+++ gcc/expmed.h	2017-10-23 17:02:01.542011677 +0100
@@ -718,8 +718,7 @@ extern rtx expand_divmod (int, enum tree
 			  rtx, int);
 #endif
 
-extern void store_bit_field (rtx, unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
+extern void store_bit_field (rtx, poly_uint64, poly_uint64,
 			     unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
 			     machine_mode, rtx, bool);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 17:00:57.771973825 +0100
+++ gcc/expmed.c	2017-10-23 17:02:01.542011677 +0100
@@ -46,6 +46,12 @@ struct target_expmed default_target_expm
 struct target_expmed *this_target_expmed = &default_target_expmed;
 #endif
 
+static bool store_integral_bit_field (rtx, opt_scalar_int_mode,
+				      unsigned HOST_WIDE_INT,
+				      unsigned HOST_WIDE_INT,
+				      unsigned HOST_WIDE_INT,
+				      unsigned HOST_WIDE_INT,
+				      machine_mode, rtx, bool, bool);
 static void store_fixed_bit_field (rtx, opt_scalar_int_mode,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
@@ -562,17 +568,18 @@ strict_volatile_bitfield_p (rtx op0, uns
 }
 
 /* Return true if OP is a memory and if a bitfield of size BITSIZE at
-   bit number BITNUM can be treated as a simple value of mode MODE.  */
+   bit number BITNUM can be treated as a simple value of mode MODE.
+   Store the byte offset in *BYTENUM if so.  */
 
 static bool
-simple_mem_bitfield_p (rtx op0, unsigned HOST_WIDE_INT bitsize,
-		       unsigned HOST_WIDE_INT bitnum, machine_mode mode)
+simple_mem_bitfield_p (rtx op0, poly_uint64 bitsize, poly_uint64 bitnum,
+		       machine_mode mode, poly_uint64 *bytenum)
 {
   return (MEM_P (op0)
-	  && bitnum % BITS_PER_UNIT == 0
-	  && bitsize == GET_MODE_BITSIZE (mode)
+	  && multiple_p (bitnum, BITS_PER_UNIT, bytenum)
+	  && must_eq (bitsize, GET_MODE_BITSIZE (mode))
 	  && (!targetm.slow_unaligned_access (mode, MEM_ALIGN (op0))
-	      || (bitnum % GET_MODE_ALIGNMENT (mode) == 0
+	      || (multiple_p (bitnum, GET_MODE_ALIGNMENT (mode))
 		  && MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (mode))));
 }
 \f
@@ -717,15 +724,13 @@ store_bit_field_using_insv (const extrac
    return false instead.  */
 
 static bool
-store_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum,
+store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 		   unsigned HOST_WIDE_INT bitregion_start,
 		   unsigned HOST_WIDE_INT bitregion_end,
 		   machine_mode fieldmode,
 		   rtx value, bool reverse, bool fallback_p)
 {
   rtx op0 = str_rtx;
-  rtx orig_value;
 
   while (GET_CODE (op0) == SUBREG)
     {
@@ -736,23 +741,23 @@ store_bit_field_1 (rtx str_rtx, unsigned
   /* No action is needed if the target is a register and if the field
      lies completely outside that register.  This can occur if the source
      code contains an out-of-bounds access to a small array.  */
-  if (REG_P (op0) && bitnum >= GET_MODE_BITSIZE (GET_MODE (op0)))
+  if (REG_P (op0) && must_ge (bitnum, GET_MODE_BITSIZE (GET_MODE (op0))))
     return true;
 
   /* Use vec_set patterns for inserting parts of vectors whenever
      available.  */
   machine_mode outermode = GET_MODE (op0);
   scalar_mode innermode = GET_MODE_INNER (outermode);
+  poly_uint64 pos;
   if (VECTOR_MODE_P (outermode)
       && !MEM_P (op0)
       && optab_handler (vec_set_optab, outermode) != CODE_FOR_nothing
       && fieldmode == innermode
-      && bitsize == GET_MODE_BITSIZE (innermode)
-      && !(bitnum % GET_MODE_BITSIZE (innermode)))
+      && must_eq (bitsize, GET_MODE_BITSIZE (innermode))
+      && multiple_p (bitnum, GET_MODE_BITSIZE (innermode), &pos))
     {
       struct expand_operand ops[3];
       enum insn_code icode = optab_handler (vec_set_optab, outermode);
-      int pos = bitnum / GET_MODE_BITSIZE (innermode);
 
       create_fixed_operand (&ops[0], op0);
       create_input_operand (&ops[1], value, innermode);
@@ -764,16 +769,16 @@ store_bit_field_1 (rtx str_rtx, unsigned
   /* If the target is a register, overwriting the entire object, or storing
      a full-word or multi-word field can be done with just a SUBREG.  */
   if (!MEM_P (op0)
-      && bitsize == GET_MODE_BITSIZE (fieldmode)
-      && ((bitsize == GET_MODE_BITSIZE (GET_MODE (op0)) && bitnum == 0)
-	  || (bitsize % BITS_PER_WORD == 0 && bitnum % BITS_PER_WORD == 0)))
+      && must_eq (bitsize, GET_MODE_BITSIZE (fieldmode)))
     {
       /* Use the subreg machinery either to narrow OP0 to the required
 	 words or to cope with mode punning between equal-sized modes.
 	 In the latter case, use subreg on the rhs side, not lhs.  */
       rtx sub;
-
-      if (bitsize == GET_MODE_BITSIZE (GET_MODE (op0)))
+      HOST_WIDE_INT regnum;
+      HOST_WIDE_INT regsize = REGMODE_NATURAL_SIZE (GET_MODE (op0));
+      if (known_zero (bitnum)
+	  && must_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0))))
 	{
 	  sub = simplify_gen_subreg (GET_MODE (op0), value, fieldmode, 0);
 	  if (sub)
@@ -784,10 +789,11 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	      return true;
 	    }
 	}
-      else
+      else if (constant_multiple_p (bitnum, regsize * BITS_PER_UNIT, &regnum)
+	       && multiple_p (bitsize, regsize * BITS_PER_UNIT))
 	{
 	  sub = simplify_gen_subreg (fieldmode, op0, GET_MODE (op0),
-				     bitnum / BITS_PER_UNIT);
+				     regnum * regsize);
 	  if (sub)
 	    {
 	      if (reverse)
@@ -801,15 +807,23 @@ store_bit_field_1 (rtx str_rtx, unsigned
   /* If the target is memory, storing any naturally aligned field can be
      done with a simple store.  For targets that support fast unaligned
      memory, any naturally sized, unit aligned field can be done directly.  */
-  if (simple_mem_bitfield_p (op0, bitsize, bitnum, fieldmode))
+  poly_uint64 bytenum;
+  if (simple_mem_bitfield_p (op0, bitsize, bitnum, fieldmode, &bytenum))
     {
-      op0 = adjust_bitfield_address (op0, fieldmode, bitnum / BITS_PER_UNIT);
+      op0 = adjust_bitfield_address (op0, fieldmode, bytenum);
       if (reverse)
 	value = flip_storage_order (fieldmode, value);
       emit_move_insn (op0, value);
       return true;
     }
 
+  /* It's possible we'll need to handle other cases here for
+     polynomial bitnum and bitsize.  */
+
+  /* From here on we need to be looking at a fixed-size insertion.  */
+  unsigned HOST_WIDE_INT ibitsize = bitsize.to_constant ();
+  unsigned HOST_WIDE_INT ibitnum = bitnum.to_constant ();
+
   /* Make sure we are playing with integral modes.  Pun with subregs
      if we aren't.  This must come after the entire register case above,
      since that case is valid for any mode.  The following cases are only
@@ -825,12 +839,31 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	op0 = gen_lowpart (op0_mode.require (), op0);
     }
 
+  return store_integral_bit_field (op0, op0_mode, ibitsize, ibitnum,
+				   bitregion_start, bitregion_end,
+				   fieldmode, value, reverse, fallback_p);
+}
+
+/* Subroutine of store_bit_field_1, with the same arguments, except
+   that BITSIZE and BITNUM are constant.  Handle cases specific to
+   integral modes.  If OP0_MODE is defined, it is the mode of OP0,
+   otherwise OP0 is a BLKmode MEM.  */
+
+static bool
+store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
+			  unsigned HOST_WIDE_INT bitsize,
+			  unsigned HOST_WIDE_INT bitnum,
+			  unsigned HOST_WIDE_INT bitregion_start,
+			  unsigned HOST_WIDE_INT bitregion_end,
+			  machine_mode fieldmode,
+			  rtx value, bool reverse, bool fallback_p)
+{
   /* Storing an lsb-aligned field in a register
      can be done with a movstrict instruction.  */
 
   if (!MEM_P (op0)
       && !reverse
-      && lowpart_bit_field_p (bitnum, bitsize, GET_MODE (op0))
+      && lowpart_bit_field_p (bitnum, bitsize, op0_mode.require ())
       && bitsize == GET_MODE_BITSIZE (fieldmode)
       && optab_handler (movstrict_optab, fieldmode) != CODE_FOR_nothing)
     {
@@ -882,10 +915,13 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	 subwords to extract.  Note that fieldmode will often (always?) be
 	 VOIDmode, because that is what store_field uses to indicate that this
 	 is a bit field, but passing VOIDmode to operand_subword_force
-	 is not allowed.  */
-      fieldmode = GET_MODE (value);
-      if (fieldmode == VOIDmode)
-	fieldmode = smallest_int_mode_for_size (nwords * BITS_PER_WORD);
+	 is not allowed.
+
+	 The mode must be fixed-size, since insertions into variable-sized
+	 objects are meant to be handled before calling this function.  */
+      fixed_size_mode value_mode = as_a <fixed_size_mode> (GET_MODE (value));
+      if (value_mode == VOIDmode)
+	value_mode = smallest_int_mode_for_size (nwords * BITS_PER_WORD);
 
       last = get_last_insn ();
       for (i = 0; i < nwords; i++)
@@ -893,7 +929,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	  /* If I is 0, use the low-order word in both field and target;
 	     if I is 1, use the next to lowest word; and so on.  */
 	  unsigned int wordnum = (backwards
-				  ? GET_MODE_SIZE (fieldmode) / UNITS_PER_WORD
+				  ? GET_MODE_SIZE (value_mode) / UNITS_PER_WORD
 				  - i - 1
 				  : i);
 	  unsigned int bit_offset = (backwards ^ reverse
@@ -901,7 +937,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
 					    * BITS_PER_WORD,
 					    0)
 				     : (int) i * BITS_PER_WORD);
-	  rtx value_word = operand_subword_force (value, wordnum, fieldmode);
+	  rtx value_word = operand_subword_force (value, wordnum, value_mode);
 	  unsigned HOST_WIDE_INT new_bitsize =
 	    MIN (BITS_PER_WORD, bitsize - i * BITS_PER_WORD);
 
@@ -935,7 +971,7 @@ store_bit_field_1 (rtx str_rtx, unsigned
      integer of the corresponding size.  This can occur on a machine
      with 64 bit registers that uses SFmode for float.  It can also
      occur for unaligned float or complex fields.  */
-  orig_value = value;
+  rtx orig_value = value;
   scalar_int_mode value_mode;
   if (GET_MODE (value) == VOIDmode)
     /* By this point we've dealt with values that are bigger than a word,
@@ -1043,41 +1079,43 @@ store_bit_field_1 (rtx str_rtx, unsigned
    If REVERSE is true, the store is to be done in reverse order.  */
 
 void
-store_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		 unsigned HOST_WIDE_INT bitnum,
+store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
 		 unsigned HOST_WIDE_INT bitregion_start,
 		 unsigned HOST_WIDE_INT bitregion_end,
 		 machine_mode fieldmode,
 		 rtx value, bool reverse)
 {
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
+  unsigned HOST_WIDE_INT ibitsize = 0, ibitnum = 0;
   scalar_int_mode int_mode;
-  if (is_a <scalar_int_mode> (fieldmode, &int_mode)
-      && strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, int_mode,
+  if (bitsize.is_constant (&ibitsize)
+      && bitnum.is_constant (&ibitnum)
+      && is_a <scalar_int_mode> (fieldmode, &int_mode)
+      && strict_volatile_bitfield_p (str_rtx, ibitsize, ibitnum, int_mode,
 				     bitregion_start, bitregion_end))
     {
       /* Storing of a full word can be done with a simple store.
 	 We know here that the field can be accessed with one single
 	 instruction.  For targets that support unaligned memory,
 	 an unaligned access may be necessary.  */
-      if (bitsize == GET_MODE_BITSIZE (int_mode))
+      if (ibitsize == GET_MODE_BITSIZE (int_mode))
 	{
 	  str_rtx = adjust_bitfield_address (str_rtx, int_mode,
-					     bitnum / BITS_PER_UNIT);
+					     ibitnum / BITS_PER_UNIT);
 	  if (reverse)
 	    value = flip_storage_order (int_mode, value);
-	  gcc_assert (bitnum % BITS_PER_UNIT == 0);
+	  gcc_assert (ibitnum % BITS_PER_UNIT == 0);
 	  emit_move_insn (str_rtx, value);
 	}
       else
 	{
 	  rtx temp;
 
-	  str_rtx = narrow_bit_field_mem (str_rtx, int_mode, bitsize, bitnum,
-					  &bitnum);
-	  gcc_assert (bitnum + bitsize <= GET_MODE_BITSIZE (int_mode));
+	  str_rtx = narrow_bit_field_mem (str_rtx, int_mode, ibitsize,
+					  ibitnum, &ibitnum);
+	  gcc_assert (ibitnum + ibitsize <= GET_MODE_BITSIZE (int_mode));
 	  temp = copy_to_reg (str_rtx);
-	  if (!store_bit_field_1 (temp, bitsize, bitnum, 0, 0,
+	  if (!store_bit_field_1 (temp, ibitsize, ibitnum, 0, 0,
 				  int_mode, value, reverse, true))
 	    gcc_unreachable ();
 
@@ -1094,19 +1132,21 @@ store_bit_field (rtx str_rtx, unsigned H
     {
       scalar_int_mode best_mode;
       machine_mode addr_mode = VOIDmode;
-      HOST_WIDE_INT offset, size;
+      HOST_WIDE_INT offset;
 
       gcc_assert ((bitregion_start % BITS_PER_UNIT) == 0);
 
       offset = bitregion_start / BITS_PER_UNIT;
       bitnum -= bitregion_start;
-      size = (bitnum + bitsize + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
+      poly_int64 size = bits_to_bytes_round_up (bitnum + bitsize);
       bitregion_end -= bitregion_start;
       bitregion_start = 0;
-      if (get_best_mode (bitsize, bitnum,
-			 bitregion_start, bitregion_end,
-			 MEM_ALIGN (str_rtx), INT_MAX,
-			 MEM_VOLATILE_P (str_rtx), &best_mode))
+      if (bitsize.is_constant (&ibitsize)
+	  && bitnum.is_constant (&ibitnum)
+	  && get_best_mode (ibitsize, ibitnum,
+			    bitregion_start, bitregion_end,
+			    MEM_ALIGN (str_rtx), INT_MAX,
+			    MEM_VOLATILE_P (str_rtx), &best_mode))
 	addr_mode = best_mode;
       str_rtx = adjust_bitfield_address_size (str_rtx, addr_mode,
 					      offset, size);
@@ -1738,9 +1778,10 @@ extract_bit_field_1 (rtx str_rtx, unsign
 
   /* Extraction of a full MODE1 value can be done with a load as long as
      the field is on a byte boundary and is sufficiently aligned.  */
-  if (simple_mem_bitfield_p (op0, bitsize, bitnum, mode1))
+  poly_uint64 bytenum;
+  if (simple_mem_bitfield_p (op0, bitsize, bitnum, mode1, &bytenum))
     {
-      op0 = adjust_bitfield_address (op0, mode1, bitnum / BITS_PER_UNIT);
+      op0 = adjust_bitfield_address (op0, mode1, bytenum);
       if (reverse)
 	op0 = flip_storage_order (mode1, op0);
       return convert_extracted_bit_field (op0, mode, tmode, unsignedp);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [022/nnn] poly_int: C++ bitfield regions
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (21 preceding siblings ...)
  2017-10-23 17:09 ` [021/nnn] poly_int: extract_bit_field bitrange Richard Sandiford
@ 2017-10-23 17:09 ` Richard Sandiford
  2017-12-05 23:39   ` Jeff Law
  2017-10-23 17:10 ` [025/nnn] poly_int: SUBREG_BYTE Richard Sandiford
                   ` (84 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:09 UTC (permalink / raw)
  To: gcc-patches

This patch changes the C++ memory model bitregion_start/end values from
compile-time constants to poly_ints.  Although it's unlikely that the
region size will need to be polynomial in practice, the offset could be,
given future language extensions.
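
(As a rough illustration, not taken verbatim from the patch: a bit
region check such as:

  if (bitregion_end != 0
      && bitpos + unit > bitregion_end + 1)

becomes:

  if (maybe_nonzero (bitregion_end)
      && may_gt (bitpos + unit, bitregion_end + 1))

so that the conservative "may" answer is used whenever the region
bounds involve a runtime indeterminate.)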


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expmed.h (store_bit_field): Change bitregion_start and
	bitregion_end from unsigned HOST_WIDE_INT to poly_uint64.
	* expmed.c (adjust_bit_field_mem_for_reg, strict_volatile_bitfield_p)
	(store_bit_field_1, store_integral_bit_field, store_bit_field)
	(store_fixed_bit_field, store_split_bit_field): Likewise.
	* expr.c (store_constructor_field, store_field): Likewise.
	(optimize_bitfield_assignment_op): Likewise.  Make the same change
	to bitsize and bitpos.
	* machmode.h (bit_field_mode_iterator): Change m_bitregion_start
	and m_bitregion_end from HOST_WIDE_INT to poly_int64.  Make the
	same change in the constructor arguments.
	(get_best_mode): Change bitregion_start and bitregion_end from
	unsigned HOST_WIDE_INT to poly_uint64.
	* stor-layout.c (bit_field_mode_iterator::bit_field_mode_iterator):
	Change bitregion_start and bitregion_end from HOST_WIDE_INT to
	poly_int64.
	(bit_field_mode_iterator::next_mode): Update for new types
	of m_bitregion_start and m_bitregion_end.
	(get_best_mode): Change bitregion_start and bitregion_end from
	unsigned HOST_WIDE_INT to poly_uint64.

Index: gcc/expmed.h
===================================================================
--- gcc/expmed.h	2017-10-23 17:11:50.109574423 +0100
+++ gcc/expmed.h	2017-10-23 17:11:54.533863145 +0100
@@ -719,8 +719,7 @@ extern rtx expand_divmod (int, enum tree
 #endif
 
 extern void store_bit_field (rtx, poly_uint64, poly_uint64,
-			     unsigned HOST_WIDE_INT,
-			     unsigned HOST_WIDE_INT,
+			     poly_uint64, poly_uint64,
 			     machine_mode, rtx, bool);
 extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
 			      machine_mode, machine_mode, bool, rtx *);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 17:11:50.109574423 +0100
+++ gcc/expmed.c	2017-10-23 17:11:54.533863145 +0100
@@ -49,14 +49,12 @@ struct target_expmed *this_target_expmed
 static bool store_integral_bit_field (rtx, opt_scalar_int_mode,
 				      unsigned HOST_WIDE_INT,
 				      unsigned HOST_WIDE_INT,
-				      unsigned HOST_WIDE_INT,
-				      unsigned HOST_WIDE_INT,
+				      poly_uint64, poly_uint64,
 				      machine_mode, rtx, bool, bool);
 static void store_fixed_bit_field (rtx, opt_scalar_int_mode,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   poly_uint64, poly_uint64,
 				   rtx, scalar_int_mode, bool);
 static void store_fixed_bit_field_1 (rtx, scalar_int_mode,
 				     unsigned HOST_WIDE_INT,
@@ -65,8 +63,7 @@ static void store_fixed_bit_field_1 (rtx
 static void store_split_bit_field (rtx, opt_scalar_int_mode,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
-				   unsigned HOST_WIDE_INT,
+				   poly_uint64, poly_uint64,
 				   rtx, scalar_int_mode, bool);
 static rtx extract_integral_bit_field (rtx, opt_scalar_int_mode,
 				       unsigned HOST_WIDE_INT,
@@ -471,8 +468,8 @@ narrow_bit_field_mem (rtx mem, opt_scala
 adjust_bit_field_mem_for_reg (enum extraction_pattern pattern,
 			      rtx op0, HOST_WIDE_INT bitsize,
 			      HOST_WIDE_INT bitnum,
-			      unsigned HOST_WIDE_INT bitregion_start,
-			      unsigned HOST_WIDE_INT bitregion_end,
+			      poly_uint64 bitregion_start,
+			      poly_uint64 bitregion_end,
 			      machine_mode fieldmode,
 			      unsigned HOST_WIDE_INT *new_bitnum)
 {
@@ -536,8 +533,8 @@ lowpart_bit_field_p (poly_uint64 bitnum,
 strict_volatile_bitfield_p (rtx op0, unsigned HOST_WIDE_INT bitsize,
 			    unsigned HOST_WIDE_INT bitnum,
 			    scalar_int_mode fieldmode,
-			    unsigned HOST_WIDE_INT bitregion_start,
-			    unsigned HOST_WIDE_INT bitregion_end)
+			    poly_uint64 bitregion_start,
+			    poly_uint64 bitregion_end)
 {
   unsigned HOST_WIDE_INT modesize = GET_MODE_BITSIZE (fieldmode);
 
@@ -564,9 +561,10 @@ strict_volatile_bitfield_p (rtx op0, uns
     return false;
 
   /* Check for cases where the C++ memory model applies.  */
-  if (bitregion_end != 0
-      && (bitnum - bitnum % modesize < bitregion_start
-	  || bitnum - bitnum % modesize + modesize - 1 > bitregion_end))
+  if (maybe_nonzero (bitregion_end)
+      && (may_lt (bitnum - bitnum % modesize, bitregion_start)
+	  || may_gt (bitnum - bitnum % modesize + modesize - 1,
+		     bitregion_end)))
     return false;
 
   return true;
@@ -730,8 +728,7 @@ store_bit_field_using_insv (const extrac
 
 static bool
 store_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
-		   unsigned HOST_WIDE_INT bitregion_start,
-		   unsigned HOST_WIDE_INT bitregion_end,
+		   poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 		   machine_mode fieldmode,
 		   rtx value, bool reverse, bool fallback_p)
 {
@@ -858,8 +855,8 @@ store_bit_field_1 (rtx str_rtx, poly_uin
 store_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 			  unsigned HOST_WIDE_INT bitsize,
 			  unsigned HOST_WIDE_INT bitnum,
-			  unsigned HOST_WIDE_INT bitregion_start,
-			  unsigned HOST_WIDE_INT bitregion_end,
+			  poly_uint64 bitregion_start,
+			  poly_uint64 bitregion_end,
 			  machine_mode fieldmode,
 			  rtx value, bool reverse, bool fallback_p)
 {
@@ -1085,8 +1082,7 @@ store_integral_bit_field (rtx op0, opt_s
 
 void
 store_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
-		 unsigned HOST_WIDE_INT bitregion_start,
-		 unsigned HOST_WIDE_INT bitregion_end,
+		 poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 		 machine_mode fieldmode,
 		 rtx value, bool reverse)
 {
@@ -1133,15 +1129,12 @@ store_bit_field (rtx str_rtx, poly_uint6
   /* Under the C++0x memory model, we must not touch bits outside the
      bit region.  Adjust the address to start at the beginning of the
      bit region.  */
-  if (MEM_P (str_rtx) && bitregion_start > 0)
+  if (MEM_P (str_rtx) && maybe_nonzero (bitregion_start))
     {
       scalar_int_mode best_mode;
       machine_mode addr_mode = VOIDmode;
-      HOST_WIDE_INT offset;
-
-      gcc_assert ((bitregion_start % BITS_PER_UNIT) == 0);
 
-      offset = bitregion_start / BITS_PER_UNIT;
+      poly_uint64 offset = exact_div (bitregion_start, BITS_PER_UNIT);
       bitnum -= bitregion_start;
       poly_int64 size = bits_to_bytes_round_up (bitnum + bitsize);
       bitregion_end -= bitregion_start;
@@ -1174,8 +1167,7 @@ store_bit_field (rtx str_rtx, poly_uint6
 store_fixed_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 		       unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitnum,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 		       rtx value, scalar_int_mode value_mode, bool reverse)
 {
   /* There is a case not handled here:
@@ -1330,8 +1322,7 @@ store_fixed_bit_field_1 (rtx op0, scalar
 store_split_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
 		       unsigned HOST_WIDE_INT bitsize,
 		       unsigned HOST_WIDE_INT bitpos,
-		       unsigned HOST_WIDE_INT bitregion_start,
-		       unsigned HOST_WIDE_INT bitregion_end,
+		       poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 		       rtx value, scalar_int_mode value_mode, bool reverse)
 {
   unsigned int unit, total_bits, bitsdone = 0;
@@ -1379,9 +1370,9 @@ store_split_bit_field (rtx op0, opt_scal
 	 UNIT close to the end of the region as needed.  If op0 is a REG
 	 or SUBREG of REG, don't do this, as there can't be data races
 	 on a register and we can expand shorter code in some cases.  */
-      if (bitregion_end
+      if (maybe_nonzero (bitregion_end)
 	  && unit > BITS_PER_UNIT
-	  && bitpos + bitsdone - thispos + unit > bitregion_end + 1
+	  && may_gt (bitpos + bitsdone - thispos + unit, bitregion_end + 1)
 	  && !REG_P (op0)
 	  && (GET_CODE (op0) != SUBREG || !REG_P (SUBREG_REG (op0))))
 	{
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:11:43.725043907 +0100
+++ gcc/expr.c	2017-10-23 17:11:54.535862371 +0100
@@ -79,13 +79,9 @@ static void emit_block_move_via_loop (rt
 static void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
 static rtx_insn *compress_float_constant (rtx, rtx);
 static rtx get_subtarget (rtx);
-static void store_constructor_field (rtx, unsigned HOST_WIDE_INT,
-				     HOST_WIDE_INT, unsigned HOST_WIDE_INT,
-				     unsigned HOST_WIDE_INT, machine_mode,
-				     tree, int, alias_set_type, bool);
 static void store_constructor (tree, rtx, int, HOST_WIDE_INT, bool);
 static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
-			unsigned HOST_WIDE_INT, unsigned HOST_WIDE_INT,
+			poly_uint64, poly_uint64,
 			machine_mode, tree, alias_set_type, bool, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -4611,10 +4607,10 @@ get_subtarget (rtx x)
    and there's nothing else to do.  */
 
 static bool
-optimize_bitfield_assignment_op (unsigned HOST_WIDE_INT bitsize,
-				 unsigned HOST_WIDE_INT bitpos,
-				 unsigned HOST_WIDE_INT bitregion_start,
-				 unsigned HOST_WIDE_INT bitregion_end,
+optimize_bitfield_assignment_op (poly_uint64 pbitsize,
+				 poly_uint64 pbitpos,
+				 poly_uint64 pbitregion_start,
+				 poly_uint64 pbitregion_end,
 				 machine_mode mode1, rtx str_rtx,
 				 tree to, tree src, bool reverse)
 {
@@ -4626,7 +4622,12 @@ optimize_bitfield_assignment_op (unsigne
   gimple *srcstmt;
   enum tree_code code;
 
+  unsigned HOST_WIDE_INT bitsize, bitpos, bitregion_start, bitregion_end;
   if (mode1 != VOIDmode
+      || !pbitsize.is_constant (&bitsize)
+      || !pbitpos.is_constant (&bitpos)
+      || !pbitregion_start.is_constant (&bitregion_start)
+      || !pbitregion_end.is_constant (&bitregion_end)
       || bitsize >= BITS_PER_WORD
       || str_bitsize > BITS_PER_WORD
       || TREE_SIDE_EFFECTS (to)
@@ -6082,8 +6083,8 @@ all_zeros_p (const_tree exp)
 static void
 store_constructor_field (rtx target, unsigned HOST_WIDE_INT bitsize,
 			 HOST_WIDE_INT bitpos,
-			 unsigned HOST_WIDE_INT bitregion_start,
-			 unsigned HOST_WIDE_INT bitregion_end,
+			 poly_uint64 bitregion_start,
+			 poly_uint64 bitregion_end,
 			 machine_mode mode,
 			 tree exp, int cleared,
 			 alias_set_type alias_set, bool reverse)
@@ -6762,8 +6763,7 @@ store_constructor (tree exp, rtx target,
 
 static rtx
 store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
-	     unsigned HOST_WIDE_INT bitregion_start,
-	     unsigned HOST_WIDE_INT bitregion_end,
+	     poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 	     machine_mode mode, tree exp,
 	     alias_set_type alias_set, bool nontemporal,  bool reverse)
 {
Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-10-23 17:11:43.725043907 +0100
+++ gcc/machmode.h	2017-10-23 17:11:54.535862371 +0100
@@ -760,7 +760,7 @@ mode_for_int_vector (machine_mode mode)
 {
 public:
   bit_field_mode_iterator (HOST_WIDE_INT, HOST_WIDE_INT,
-			   HOST_WIDE_INT, HOST_WIDE_INT,
+			   poly_int64, poly_int64,
 			   unsigned int, bool);
   bool next_mode (scalar_int_mode *);
   bool prefer_smaller_modes ();
@@ -771,8 +771,8 @@ mode_for_int_vector (machine_mode mode)
      for invalid input such as gcc.dg/pr48335-8.c.  */
   HOST_WIDE_INT m_bitsize;
   HOST_WIDE_INT m_bitpos;
-  HOST_WIDE_INT m_bitregion_start;
-  HOST_WIDE_INT m_bitregion_end;
+  poly_int64 m_bitregion_start;
+  poly_int64 m_bitregion_end;
   unsigned int m_align;
   bool m_volatilep;
   int m_count;
@@ -780,8 +780,7 @@ mode_for_int_vector (machine_mode mode)
 
 /* Find the best mode to use to access a bit field.  */
 
-extern bool get_best_mode (int, int, unsigned HOST_WIDE_INT,
-			   unsigned HOST_WIDE_INT, unsigned int,
+extern bool get_best_mode (int, int, poly_uint64, poly_uint64, unsigned int,
 			   unsigned HOST_WIDE_INT, bool, scalar_int_mode *);
 
 /* Determine alignment, 1<=result<=BIGGEST_ALIGNMENT.  */
Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c	2017-10-23 17:11:43.725043907 +0100
+++ gcc/stor-layout.c	2017-10-23 17:11:54.535862371 +0100
@@ -2747,15 +2747,15 @@ fixup_unsigned_type (tree type)
 
 bit_field_mode_iterator
 ::bit_field_mode_iterator (HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
-			   HOST_WIDE_INT bitregion_start,
-			   HOST_WIDE_INT bitregion_end,
+			   poly_int64 bitregion_start,
+			   poly_int64 bitregion_end,
 			   unsigned int align, bool volatilep)
 : m_mode (NARROWEST_INT_MODE), m_bitsize (bitsize),
   m_bitpos (bitpos), m_bitregion_start (bitregion_start),
   m_bitregion_end (bitregion_end), m_align (align),
   m_volatilep (volatilep), m_count (0)
 {
-  if (!m_bitregion_end)
+  if (known_zero (m_bitregion_end))
     {
       /* We can assume that any aligned chunk of ALIGN bits that overlaps
 	 the bitfield is mapped and won't trap, provided that ALIGN isn't
@@ -2765,8 +2765,8 @@ fixup_unsigned_type (tree type)
 	= MIN (align, MAX (BIGGEST_ALIGNMENT, BITS_PER_WORD));
       if (bitsize <= 0)
 	bitsize = 1;
-      m_bitregion_end = bitpos + bitsize + units - 1;
-      m_bitregion_end -= m_bitregion_end % units + 1;
+      HOST_WIDE_INT end = bitpos + bitsize + units - 1;
+      m_bitregion_end = end - end % units - 1;
     }
 }
 
@@ -2803,10 +2803,11 @@ bit_field_mode_iterator::next_mode (scal
 
       /* Stop if the mode goes outside the bitregion.  */
       HOST_WIDE_INT start = m_bitpos - substart;
-      if (m_bitregion_start && start < m_bitregion_start)
+      if (maybe_nonzero (m_bitregion_start)
+	  && may_lt (start, m_bitregion_start))
 	break;
       HOST_WIDE_INT end = start + unit;
-      if (end > m_bitregion_end + 1)
+      if (may_gt (end, m_bitregion_end + 1))
 	break;
 
       /* Stop if the mode requires too much alignment.  */
@@ -2862,8 +2863,7 @@ bit_field_mode_iterator::prefer_smaller_
 
 bool
 get_best_mode (int bitsize, int bitpos,
-	       unsigned HOST_WIDE_INT bitregion_start,
-	       unsigned HOST_WIDE_INT bitregion_end,
+	       poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 	       unsigned int align,
 	       unsigned HOST_WIDE_INT largest_mode_bitsize, bool volatilep,
 	       scalar_int_mode *best_mode)

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [021/nnn] poly_int: extract_bit_field bitrange
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (20 preceding siblings ...)
  2017-10-23 17:09 ` [023/nnn] poly_int: store_field & co Richard Sandiford
@ 2017-10-23 17:09 ` Richard Sandiford
  2017-12-05 23:46   ` Jeff Law
  2017-10-23 17:09 ` [022/nnn] poly_int: C++ bitfield regions Richard Sandiford
                   ` (85 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:09 UTC (permalink / raw)
  To: gcc-patches

Similar to the previous store_bit_field patch, but for extractions
rather than insertions.  The patch splits out the extraction-as-subreg
handling into a new function (extract_bit_field_as_subreg), both for
ease of writing and because a later patch will add another caller.

The simplify_gen_subreg overload is temporary; it goes away
in a later patch.
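
(Illustrative sketch, not lifted verbatim from the patch: the old test
that an extraction stayed within a single vector element:

  (bitnum + bitsize - 1) / GET_MODE_BITSIZE (innermode)
    == bitnum / GET_MODE_BITSIZE (innermode)

is replaced by requiring the size to match the element exactly and the
position to be an exact element multiple:

  poly_uint64 pos;
  if (must_eq (bitsize, GET_MODE_BITSIZE (innermode))
      && multiple_p (bitnum, GET_MODE_BITSIZE (innermode), &pos))
    /* pos is the index of the element to extract.  */

since the division-based form cannot be evaluated when bitnum involves
a runtime indeterminate.)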


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (simplify_gen_subreg): Add a temporary overload that
	accepts poly_uint64 offsets.
	* expmed.h (extract_bit_field): Take bitsize and bitnum as
	poly_uint64s rather than unsigned HOST_WIDE_INTs.
	* expmed.c (lowpart_bit_field_p): Likewise.
	(extract_bit_field_as_subreg): New function, split out from...
	(extract_bit_field_1): ...here.  Take bitsize and bitnum as
	poly_uint64s rather than unsigned HOST_WIDE_INTs.  For vector
	extractions, check that BITSIZE matches the size of the extracted
	value and that BITNUM is an exact multiple of that size.
	If all else fails, try forcing the value into memory if
	BITNUM is variable, and adjusting the address so that the
	offset is constant.  Split the part that can only handle constant
	bitsize and bitnum out into...
	(extract_integral_bit_field): ...this new function.
	(extract_bit_field): Take bitsize and bitnum as poly_uint64s
	rather than unsigned HOST_WIDE_INTs.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 17:11:43.774024962 +0100
+++ gcc/rtl.h	2017-10-23 17:11:50.109574423 +0100
@@ -3267,6 +3267,12 @@ extern rtx simplify_subreg (machine_mode
 			    unsigned int);
 extern rtx simplify_gen_subreg (machine_mode, rtx, machine_mode,
 				unsigned int);
+inline rtx
+simplify_gen_subreg (machine_mode omode, rtx x, machine_mode imode,
+		     poly_uint64 offset)
+{
+  return simplify_gen_subreg (omode, x, imode, offset.to_constant ());
+}
 extern rtx lowpart_subreg (machine_mode, rtx, machine_mode);
 extern rtx simplify_replace_fn_rtx (rtx, const_rtx,
 				    rtx (*fn) (rtx, const_rtx, void *), void *);
Index: gcc/expmed.h
===================================================================
--- gcc/expmed.h	2017-10-23 17:11:43.774024962 +0100
+++ gcc/expmed.h	2017-10-23 17:11:50.109574423 +0100
@@ -722,8 +722,7 @@ extern void store_bit_field (rtx, poly_u
 			     unsigned HOST_WIDE_INT,
 			     unsigned HOST_WIDE_INT,
 			     machine_mode, rtx, bool);
-extern rtx extract_bit_field (rtx, unsigned HOST_WIDE_INT,
-			      unsigned HOST_WIDE_INT, int, rtx,
+extern rtx extract_bit_field (rtx, poly_uint64, poly_uint64, int, rtx,
 			      machine_mode, machine_mode, bool, rtx *);
 extern rtx extract_low_bits (machine_mode, machine_mode, rtx);
 extern rtx expand_mult (machine_mode, rtx, rtx, rtx, int);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 17:11:43.774024962 +0100
+++ gcc/expmed.c	2017-10-23 17:11:50.109574423 +0100
@@ -68,6 +68,10 @@ static void store_split_bit_field (rtx,
 				   unsigned HOST_WIDE_INT,
 				   unsigned HOST_WIDE_INT,
 				   rtx, scalar_int_mode, bool);
+static rtx extract_integral_bit_field (rtx, opt_scalar_int_mode,
+				       unsigned HOST_WIDE_INT,
+				       unsigned HOST_WIDE_INT, int, rtx,
+				       machine_mode, machine_mode, bool, bool);
 static rtx extract_fixed_bit_field (machine_mode, rtx, opt_scalar_int_mode,
 				    unsigned HOST_WIDE_INT,
 				    unsigned HOST_WIDE_INT, rtx, int, bool);
@@ -509,17 +513,17 @@ adjust_bit_field_mem_for_reg (enum extra
    offset is then BITNUM / BITS_PER_UNIT.  */
 
 static bool
-lowpart_bit_field_p (unsigned HOST_WIDE_INT bitnum,
-		     unsigned HOST_WIDE_INT bitsize,
+lowpart_bit_field_p (poly_uint64 bitnum, poly_uint64 bitsize,
 		     machine_mode struct_mode)
 {
-  unsigned HOST_WIDE_INT regsize = REGMODE_NATURAL_SIZE (struct_mode);
+  poly_uint64 regsize = REGMODE_NATURAL_SIZE (struct_mode);
   if (BYTES_BIG_ENDIAN)
-    return (bitnum % BITS_PER_UNIT == 0
-	    && (bitnum + bitsize == GET_MODE_BITSIZE (struct_mode)
-		|| (bitnum + bitsize) % (regsize * BITS_PER_UNIT) == 0));
+    return (multiple_p (bitnum, BITS_PER_UNIT)
+	    && (must_eq (bitnum + bitsize, GET_MODE_BITSIZE (struct_mode))
+		|| multiple_p (bitnum + bitsize,
+			       regsize * BITS_PER_UNIT)));
   else
-    return bitnum % (regsize * BITS_PER_UNIT) == 0;
+    return multiple_p (bitnum, regsize * BITS_PER_UNIT);
 }
 
 /* Return true if -fstrict-volatile-bitfields applies to an access of OP0
@@ -1574,16 +1578,33 @@ extract_bit_field_using_extv (const extr
   return NULL_RTX;
 }
 
+/* See whether it would be valid to extract the part of OP0 described
+   by BITNUM and BITSIZE into a value of mode MODE using a subreg
+   operation.  Return the subreg if so, otherwise return null.  */
+
+static rtx
+extract_bit_field_as_subreg (machine_mode mode, rtx op0,
+			     poly_uint64 bitsize, poly_uint64 bitnum)
+{
+  poly_uint64 bytenum;
+  if (multiple_p (bitnum, BITS_PER_UNIT, &bytenum)
+      && must_eq (bitsize, GET_MODE_BITSIZE (mode))
+      && lowpart_bit_field_p (bitnum, bitsize, GET_MODE (op0))
+      && TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op0)))
+    return simplify_gen_subreg (mode, op0, GET_MODE (op0), bytenum);
+  return NULL_RTX;
+}
+
 /* A subroutine of extract_bit_field, with the same arguments.
    If FALLBACK_P is true, fall back to extract_fixed_bit_field
    if we can find no other means of implementing the operation.
    if FALLBACK_P is false, return NULL instead.  */
 
 static rtx
-extract_bit_field_1 (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		     unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
-		     machine_mode mode, machine_mode tmode,
-		     bool reverse, bool fallback_p, rtx *alt_rtl)
+extract_bit_field_1 (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
+		     int unsignedp, rtx target, machine_mode mode,
+		     machine_mode tmode, bool reverse, bool fallback_p,
+		     rtx *alt_rtl)
 {
   rtx op0 = str_rtx;
   machine_mode mode1;
@@ -1600,13 +1621,13 @@ extract_bit_field_1 (rtx str_rtx, unsign
   /* If we have an out-of-bounds access to a register, just return an
      uninitialized register of the required mode.  This can occur if the
      source code contains an out-of-bounds access to a small array.  */
-  if (REG_P (op0) && bitnum >= GET_MODE_BITSIZE (GET_MODE (op0)))
+  if (REG_P (op0) && must_ge (bitnum, GET_MODE_BITSIZE (GET_MODE (op0))))
     return gen_reg_rtx (tmode);
 
   if (REG_P (op0)
       && mode == GET_MODE (op0)
-      && bitnum == 0
-      && bitsize == GET_MODE_BITSIZE (GET_MODE (op0)))
+      && known_zero (bitnum)
+      && must_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0))))
     {
       if (reverse)
 	op0 = flip_storage_order (mode, op0);
@@ -1618,6 +1639,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
   if (VECTOR_MODE_P (GET_MODE (op0))
       && !MEM_P (op0)
       && VECTOR_MODE_P (tmode)
+      && must_eq (bitsize, GET_MODE_SIZE (tmode))
       && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (tmode))
     {
       machine_mode new_mode = GET_MODE (op0);
@@ -1633,18 +1655,17 @@ extract_bit_field_1 (rtx str_rtx, unsign
 	      || !targetm.vector_mode_supported_p (new_mode))
 	    new_mode = VOIDmode;
 	}
+      poly_uint64 pos;
       if (new_mode != VOIDmode
 	  && (convert_optab_handler (vec_extract_optab, new_mode, tmode)
 	      != CODE_FOR_nothing)
-	  && ((bitnum + bitsize - 1) / GET_MODE_BITSIZE (tmode)
-	      == bitnum / GET_MODE_BITSIZE (tmode)))
+	  && multiple_p (bitnum, GET_MODE_BITSIZE (tmode), &pos))
 	{
 	  struct expand_operand ops[3];
 	  machine_mode outermode = new_mode;
 	  machine_mode innermode = tmode;
 	  enum insn_code icode
 	    = convert_optab_handler (vec_extract_optab, outermode, innermode);
-	  unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
 
 	  if (new_mode != GET_MODE (op0))
 	    op0 = gen_lowpart (new_mode, op0);
@@ -1697,17 +1718,17 @@ extract_bit_field_1 (rtx str_rtx, unsign
      available.  */
   machine_mode outermode = GET_MODE (op0);
   scalar_mode innermode = GET_MODE_INNER (outermode);
+  poly_uint64 pos;
   if (VECTOR_MODE_P (outermode)
       && !MEM_P (op0)
       && (convert_optab_handler (vec_extract_optab, outermode, innermode)
 	  != CODE_FOR_nothing)
-      && ((bitnum + bitsize - 1) / GET_MODE_BITSIZE (innermode)
-	  == bitnum / GET_MODE_BITSIZE (innermode)))
+      && must_eq (bitsize, GET_MODE_BITSIZE (innermode))
+      && multiple_p (bitnum, GET_MODE_BITSIZE (innermode), &pos))
     {
       struct expand_operand ops[3];
       enum insn_code icode
 	= convert_optab_handler (vec_extract_optab, outermode, innermode);
-      unsigned HOST_WIDE_INT pos = bitnum / GET_MODE_BITSIZE (innermode);
 
       create_output_operand (&ops[0], target, innermode);
       ops[0].target = 1;
@@ -1765,14 +1786,9 @@ extract_bit_field_1 (rtx str_rtx, unsign
   /* Extraction of a full MODE1 value can be done with a subreg as long
      as the least significant bit of the value is the least significant
      bit of either OP0 or a word of OP0.  */
-  if (!MEM_P (op0)
-      && !reverse
-      && lowpart_bit_field_p (bitnum, bitsize, op0_mode.require ())
-      && bitsize == GET_MODE_BITSIZE (mode1)
-      && TRULY_NOOP_TRUNCATION_MODES_P (mode1, op0_mode.require ()))
+  if (!MEM_P (op0) && !reverse)
     {
-      rtx sub = simplify_gen_subreg (mode1, op0, op0_mode.require (),
-				     bitnum / BITS_PER_UNIT);
+      rtx sub = extract_bit_field_as_subreg (mode1, op0, bitsize, bitnum);
       if (sub)
 	return convert_extracted_bit_field (sub, mode, tmode, unsignedp);
     }
@@ -1788,6 +1804,39 @@ extract_bit_field_1 (rtx str_rtx, unsign
       return convert_extracted_bit_field (op0, mode, tmode, unsignedp);
     }
 
+  /* If we have a memory source and a non-constant bit offset, restrict
+     the memory to the referenced bytes.  This is a worst-case fallback
+     but is useful for things like vector booleans.  */
+  if (MEM_P (op0) && !bitnum.is_constant ())
+    {
+      bytenum = bits_to_bytes_round_down (bitnum);
+      bitnum = num_trailing_bits (bitnum);
+      poly_uint64 bytesize = bits_to_bytes_round_up (bitnum + bitsize);
+      op0 = adjust_bitfield_address_size (op0, BLKmode, bytenum, bytesize);
+      op0_mode = opt_scalar_int_mode ();
+    }
+
+  /* It's possible we'll need to handle other cases here for
+     polynomial bitnum and bitsize.  */
+
+  /* From here on we need to be looking at a fixed-size insertion.  */
+  return extract_integral_bit_field (op0, op0_mode, bitsize.to_constant (),
+				     bitnum.to_constant (), unsignedp,
+				     target, mode, tmode, reverse, fallback_p);
+}
+
+/* Subroutine of extract_bit_field_1, with the same arguments, except
+   that BITSIZE and BITNUM are constant.  Handle cases specific to
+   integral modes.  If OP0_MODE is defined, it is the mode of OP0,
+   otherwise OP0 is a BLKmode MEM.  */
+
+static rtx
+extract_integral_bit_field (rtx op0, opt_scalar_int_mode op0_mode,
+			    unsigned HOST_WIDE_INT bitsize,
+			    unsigned HOST_WIDE_INT bitnum, int unsignedp,
+			    rtx target, machine_mode mode, machine_mode tmode,
+			    bool reverse, bool fallback_p)
+{
   /* Handle fields bigger than a word.  */
 
   if (bitsize > BITS_PER_WORD)
@@ -1807,12 +1856,16 @@ extract_bit_field_1 (rtx str_rtx, unsign
 
       /* In case we're about to clobber a base register or something 
 	 (see gcc.c-torture/execute/20040625-1.c).   */
-      if (reg_mentioned_p (target, str_rtx))
+      if (reg_mentioned_p (target, op0))
 	target = gen_reg_rtx (mode);
 
       /* Indicate for flow that the entire target reg is being set.  */
       emit_clobber (target);
 
+      /* The mode must be fixed-size, since extract_bit_field_1 handles
+	 extractions from variable-sized objects before calling this
+	 function.  */
+      unsigned int target_size = GET_MODE_SIZE (GET_MODE (target));
       last = get_last_insn ();
       for (i = 0; i < nwords; i++)
 	{
@@ -1820,9 +1873,7 @@ extract_bit_field_1 (rtx str_rtx, unsign
 	     if I is 1, use the next to lowest word; and so on.  */
 	  /* Word number in TARGET to use.  */
 	  unsigned int wordnum
-	    = (backwards
-	       ? GET_MODE_SIZE (GET_MODE (target)) / UNITS_PER_WORD - i - 1
-	       : i);
+	    = (backwards ? target_size / UNITS_PER_WORD - i - 1 : i);
 	  /* Offset from start of field in OP0.  */
 	  unsigned int bit_offset = (backwards ^ reverse
 				     ? MAX ((int) bitsize - ((int) i + 1)
@@ -1851,11 +1902,11 @@ extract_bit_field_1 (rtx str_rtx, unsign
 	{
 	  /* Unless we've filled TARGET, the upper regs in a multi-reg value
 	     need to be zero'd out.  */
-	  if (GET_MODE_SIZE (GET_MODE (target)) > nwords * UNITS_PER_WORD)
+	  if (target_size > nwords * UNITS_PER_WORD)
 	    {
 	      unsigned int i, total_words;
 
-	      total_words = GET_MODE_SIZE (GET_MODE (target)) / UNITS_PER_WORD;
+	      total_words = target_size / UNITS_PER_WORD;
 	      for (i = nwords; i < total_words; i++)
 		emit_move_insn
 		  (operand_subword (target,
@@ -1993,10 +2044,9 @@ extract_bit_field_1 (rtx str_rtx, unsign
    if they are equally easy.  */
 
 rtx
-extract_bit_field (rtx str_rtx, unsigned HOST_WIDE_INT bitsize,
-		   unsigned HOST_WIDE_INT bitnum, int unsignedp, rtx target,
-		   machine_mode mode, machine_mode tmode, bool reverse,
-		   rtx *alt_rtl)
+extract_bit_field (rtx str_rtx, poly_uint64 bitsize, poly_uint64 bitnum,
+		   int unsignedp, rtx target, machine_mode mode,
+		   machine_mode tmode, bool reverse, rtx *alt_rtl)
 {
   machine_mode mode1;
 
@@ -2008,28 +2058,34 @@ extract_bit_field (rtx str_rtx, unsigned
   else
     mode1 = tmode;
 
+  unsigned HOST_WIDE_INT ibitsize, ibitnum;
   scalar_int_mode int_mode;
-  if (is_a <scalar_int_mode> (mode1, &int_mode)
-      && strict_volatile_bitfield_p (str_rtx, bitsize, bitnum, int_mode, 0, 0))
+  if (bitsize.is_constant (&ibitsize)
+      && bitnum.is_constant (&ibitnum)
+      && is_a <scalar_int_mode> (mode1, &int_mode)
+      && strict_volatile_bitfield_p (str_rtx, ibitsize, ibitnum,
+				     int_mode, 0, 0))
     {
       /* Extraction of a full INT_MODE value can be done with a simple load.
 	 We know here that the field can be accessed with one single
 	 instruction.  For targets that support unaligned memory,
 	 an unaligned access may be necessary.  */
-      if (bitsize == GET_MODE_BITSIZE (int_mode))
+      if (ibitsize == GET_MODE_BITSIZE (int_mode))
 	{
 	  rtx result = adjust_bitfield_address (str_rtx, int_mode,
-						bitnum / BITS_PER_UNIT);
+						ibitnum / BITS_PER_UNIT);
 	  if (reverse)
 	    result = flip_storage_order (int_mode, result);
-	  gcc_assert (bitnum % BITS_PER_UNIT == 0);
+	  gcc_assert (ibitnum % BITS_PER_UNIT == 0);
 	  return convert_extracted_bit_field (result, mode, tmode, unsignedp);
 	}
 
-      str_rtx = narrow_bit_field_mem (str_rtx, int_mode, bitsize, bitnum,
-				      &bitnum);
-      gcc_assert (bitnum + bitsize <= GET_MODE_BITSIZE (int_mode));
+      str_rtx = narrow_bit_field_mem (str_rtx, int_mode, ibitsize, ibitnum,
+				      &ibitnum);
+      gcc_assert (ibitnum + ibitsize <= GET_MODE_BITSIZE (int_mode));
       str_rtx = copy_to_reg (str_rtx);
+      return extract_bit_field_1 (str_rtx, ibitsize, ibitnum, unsignedp,
+				  target, mode, tmode, reverse, true, alt_rtl);
     }
 
   return extract_bit_field_1 (str_rtx, bitsize, bitnum, unsignedp,

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [023/nnn] poly_int: store_field & co
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (19 preceding siblings ...)
  2017-10-23 17:08 ` [018/nnn] poly_int: MEM_OFFSET and MEM_SIZE Richard Sandiford
@ 2017-10-23 17:09 ` Richard Sandiford
  2017-12-05 23:49   ` Jeff Law
  2017-10-23 17:09 ` [021/nnn] poly_int: extract_bit_field bitrange Richard Sandiford
                   ` (86 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:09 UTC (permalink / raw)
  To: gcc-patches

This patch makes store_field and related routines use poly_ints
for bit positions and sizes.  It keeps the existing choices
between signed and unsigned types (there is a mixture of both).
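
(One representative change, shown here for illustration: a test that
used a negative bitsize as the "size unknown" sentinel:

  if (bitsize >= 0 && mode != BLKmode
      && GET_MODE_BITSIZE (mode) > bitsize)

becomes:

  if (known_size_p (bitsize)
      && mode != BLKmode
      && may_gt (GET_MODE_BITSIZE (mode), bitsize))

where known_size_p tests for a known size and may_gt gives the
conservative answer when the comparison cannot be decided at compile
time.)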


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.c (store_constructor_field): Change bitsize from a
	unsigned HOST_WIDE_INT to a poly_uint64 and bitpos from a
	HOST_WIDE_INT to a poly_int64.
	(store_constructor): Change size from a HOST_WIDE_INT to
	a poly_int64.
	(store_field): Likewise bitsize and bitpos.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:11:54.535862371 +0100
+++ gcc/expr.c	2017-10-23 17:11:55.989300194 +0100
@@ -79,9 +79,8 @@ static void emit_block_move_via_loop (rt
 static void clear_by_pieces (rtx, unsigned HOST_WIDE_INT, unsigned int);
 static rtx_insn *compress_float_constant (rtx, rtx);
 static rtx get_subtarget (rtx);
-static void store_constructor (tree, rtx, int, HOST_WIDE_INT, bool);
-static rtx store_field (rtx, HOST_WIDE_INT, HOST_WIDE_INT,
-			poly_uint64, poly_uint64,
+static void store_constructor (tree, rtx, int, poly_int64, bool);
+static rtx store_field (rtx, poly_int64, poly_int64, poly_uint64, poly_uint64,
 			machine_mode, tree, alias_set_type, bool, bool);
 
 static unsigned HOST_WIDE_INT highest_pow2_factor_for_target (const_tree, const_tree);
@@ -6081,31 +6080,34 @@ all_zeros_p (const_tree exp)
    clear a substructure if the outer structure has already been cleared.  */
 
 static void
-store_constructor_field (rtx target, unsigned HOST_WIDE_INT bitsize,
-			 HOST_WIDE_INT bitpos,
+store_constructor_field (rtx target, poly_uint64 bitsize, poly_int64 bitpos,
 			 poly_uint64 bitregion_start,
 			 poly_uint64 bitregion_end,
 			 machine_mode mode,
 			 tree exp, int cleared,
 			 alias_set_type alias_set, bool reverse)
 {
+  poly_int64 bytepos;
+  poly_uint64 bytesize;
   if (TREE_CODE (exp) == CONSTRUCTOR
       /* We can only call store_constructor recursively if the size and
 	 bit position are on a byte boundary.  */
-      && bitpos % BITS_PER_UNIT == 0
-      && (bitsize > 0 && bitsize % BITS_PER_UNIT == 0)
+      && multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
+      && maybe_nonzero (bitsize)
+      && multiple_p (bitsize, BITS_PER_UNIT, &bytesize)
       /* If we have a nonzero bitpos for a register target, then we just
 	 let store_field do the bitfield handling.  This is unlikely to
 	 generate unnecessary clear instructions anyways.  */
-      && (bitpos == 0 || MEM_P (target)))
+      && (known_zero (bitpos) || MEM_P (target)))
     {
       if (MEM_P (target))
-	target
-	  = adjust_address (target,
-			    GET_MODE (target) == BLKmode
-			    || 0 != (bitpos
-				     % GET_MODE_ALIGNMENT (GET_MODE (target)))
-			    ? BLKmode : VOIDmode, bitpos / BITS_PER_UNIT);
+	{
+	  machine_mode target_mode = GET_MODE (target);
+	  if (target_mode != BLKmode
+	      && !multiple_p (bitpos, GET_MODE_ALIGNMENT (target_mode)))
+	    target_mode = BLKmode;
+	  target = adjust_address (target, target_mode, bytepos);
+	}
 
 
       /* Update the alias set, if required.  */
@@ -6116,8 +6118,7 @@ store_constructor_field (rtx target, uns
 	  set_mem_alias_set (target, alias_set);
 	}
 
-      store_constructor (exp, target, cleared, bitsize / BITS_PER_UNIT,
-			 reverse);
+      store_constructor (exp, target, cleared, bytesize, reverse);
     }
   else
     store_field (target, bitsize, bitpos, bitregion_start, bitregion_end, mode,
@@ -6151,12 +6152,12 @@ fields_length (const_tree type)
    If REVERSE is true, the store is to be done in reverse order.  */
 
 static void
-store_constructor (tree exp, rtx target, int cleared, HOST_WIDE_INT size,
+store_constructor (tree exp, rtx target, int cleared, poly_int64 size,
 		   bool reverse)
 {
   tree type = TREE_TYPE (exp);
   HOST_WIDE_INT exp_size = int_size_in_bytes (type);
-  HOST_WIDE_INT bitregion_end = size > 0 ? size * BITS_PER_UNIT - 1 : 0;
+  poly_int64 bitregion_end = must_gt (size, 0) ? size * BITS_PER_UNIT - 1 : 0;
 
   switch (TREE_CODE (type))
     {
@@ -6171,7 +6172,7 @@ store_constructor (tree exp, rtx target,
 	reverse = TYPE_REVERSE_STORAGE_ORDER (type);
 
 	/* If size is zero or the target is already cleared, do nothing.  */
-	if (size == 0 || cleared)
+	if (known_zero (size) || cleared)
 	  cleared = 1;
 	/* We either clear the aggregate or indicate the value is dead.  */
 	else if ((TREE_CODE (type) == UNION_TYPE
@@ -6200,14 +6201,14 @@ store_constructor (tree exp, rtx target,
 	   the whole structure first.  Don't do this if TARGET is a
 	   register whose mode size isn't equal to SIZE since
 	   clear_storage can't handle this case.  */
-	else if (size > 0
+	else if (known_size_p (size)
 		 && (((int) CONSTRUCTOR_NELTS (exp) != fields_length (type))
 		     || mostly_zeros_p (exp))
 		 && (!REG_P (target)
-		     || ((HOST_WIDE_INT) GET_MODE_SIZE (GET_MODE (target))
-			 == size)))
+		     || must_eq (GET_MODE_SIZE (GET_MODE (target)), size)))
 	  {
-	    clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
+	    clear_storage (target, gen_int_mode (size, Pmode),
+			   BLOCK_OP_NORMAL);
 	    cleared = 1;
 	  }
 
@@ -6388,12 +6389,13 @@ store_constructor (tree exp, rtx target,
 	      need_to_clear = 1;
 	  }
 
-	if (need_to_clear && size > 0)
+	if (need_to_clear && may_gt (size, 0))
 	  {
 	    if (REG_P (target))
-	      emit_move_insn (target,  CONST0_RTX (GET_MODE (target)));
+	      emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
 	    else
-	      clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
+	      clear_storage (target, gen_int_mode (size, Pmode),
+			     BLOCK_OP_NORMAL);
 	    cleared = 1;
 	  }
 
@@ -6407,7 +6409,7 @@ store_constructor (tree exp, rtx target,
 	FOR_EACH_CONSTRUCTOR_ELT (CONSTRUCTOR_ELTS (exp), i, index, value)
 	  {
 	    machine_mode mode;
-	    HOST_WIDE_INT bitsize;
+	    poly_int64 bitsize;
 	    HOST_WIDE_INT bitpos;
 	    rtx xtarget = target;
 
@@ -6500,7 +6502,8 @@ store_constructor (tree exp, rtx target,
 		    xtarget = adjust_address (xtarget, mode, 0);
 		    if (TREE_CODE (value) == CONSTRUCTOR)
 		      store_constructor (value, xtarget, cleared,
-					 bitsize / BITS_PER_UNIT, reverse);
+					 exact_div (bitsize, BITS_PER_UNIT),
+					 reverse);
 		    else
 		      store_expr (value, xtarget, 0, false, reverse);
 
@@ -6669,12 +6672,13 @@ store_constructor (tree exp, rtx target,
 	    need_to_clear = (count < n_elts || 4 * zero_count >= 3 * count);
 	  }
 
-	if (need_to_clear && size > 0 && !vector)
+	if (need_to_clear && may_gt (size, 0) && !vector)
 	  {
 	    if (REG_P (target))
 	      emit_move_insn (target, CONST0_RTX (mode));
 	    else
-	      clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
+	      clear_storage (target, gen_int_mode (size, Pmode),
+			     BLOCK_OP_NORMAL);
 	    cleared = 1;
 	  }
 
@@ -6762,7 +6766,7 @@ store_constructor (tree exp, rtx target,
    If REVERSE is true, the store is to be done in reverse order.  */
 
 static rtx
-store_field (rtx target, HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+store_field (rtx target, poly_int64 bitsize, poly_int64 bitpos,
 	     poly_uint64 bitregion_start, poly_uint64 bitregion_end,
 	     machine_mode mode, tree exp,
 	     alias_set_type alias_set, bool nontemporal,  bool reverse)
@@ -6773,7 +6777,7 @@ store_field (rtx target, HOST_WIDE_INT b
   /* If we have nothing to store, do nothing unless the expression has
      side-effects.  Don't do that for zero sized addressable lhs of
      calls.  */
-  if (bitsize == 0
+  if (known_zero (bitsize)
       && (!TREE_ADDRESSABLE (TREE_TYPE (exp))
 	  || TREE_CODE (exp) != CALL_EXPR))
     return expand_expr (exp, const0_rtx, VOIDmode, EXPAND_NORMAL);
@@ -6782,7 +6786,7 @@ store_field (rtx target, HOST_WIDE_INT b
     {
       /* We're storing into a struct containing a single __complex.  */
 
-      gcc_assert (!bitpos);
+      gcc_assert (known_zero (bitpos));
       return store_expr (exp, target, 0, nontemporal, reverse);
     }
 
@@ -6790,6 +6794,7 @@ store_field (rtx target, HOST_WIDE_INT b
      is a bit field, we cannot use addressing to access it.
      Use bit-field techniques or SUBREG to store in it.  */
 
+  poly_int64 decl_bitsize;
   if (mode == VOIDmode
       || (mode != BLKmode && ! direct_store[(int) mode]
 	  && GET_MODE_CLASS (mode) != MODE_COMPLEX_INT
@@ -6800,21 +6805,22 @@ store_field (rtx target, HOST_WIDE_INT b
 	 store it as a bit field.  */
       || (mode != BLKmode
 	  && ((((MEM_ALIGN (target) < GET_MODE_ALIGNMENT (mode))
-		|| bitpos % GET_MODE_ALIGNMENT (mode))
+		|| !multiple_p (bitpos, GET_MODE_ALIGNMENT (mode)))
 	       && targetm.slow_unaligned_access (mode, MEM_ALIGN (target)))
-	      || (bitpos % BITS_PER_UNIT != 0)))
-      || (bitsize >= 0 && mode != BLKmode
-	  && GET_MODE_BITSIZE (mode) > bitsize)
+	      || !multiple_p (bitpos, BITS_PER_UNIT)))
+      || (known_size_p (bitsize)
+	  && mode != BLKmode
+	  && may_gt (GET_MODE_BITSIZE (mode), bitsize))
       /* If the RHS and field are a constant size and the size of the
 	 RHS isn't the same size as the bitfield, we must use bitfield
 	 operations.  */
-      || (bitsize >= 0
-	  && TREE_CODE (TYPE_SIZE (TREE_TYPE (exp))) == INTEGER_CST
-	  && compare_tree_int (TYPE_SIZE (TREE_TYPE (exp)), bitsize) != 0
+      || (known_size_p (bitsize)
+	  && poly_int_tree_p (TYPE_SIZE (TREE_TYPE (exp)))
+	  && may_ne (wi::to_poly_offset (TYPE_SIZE (TREE_TYPE (exp))), bitsize)
 	  /* Except for initialization of full bytes from a CONSTRUCTOR, which
 	     we will handle specially below.  */
 	  && !(TREE_CODE (exp) == CONSTRUCTOR
-	       && bitsize % BITS_PER_UNIT == 0)
+	       && multiple_p (bitsize, BITS_PER_UNIT))
 	  /* And except for bitwise copying of TREE_ADDRESSABLE types,
 	     where the FIELD_DECL has the right bitsize, but TREE_TYPE (exp)
 	     includes some extra padding.  store_expr / expand_expr will in
@@ -6825,14 +6831,14 @@ store_field (rtx target, HOST_WIDE_INT b
 	     get_base_address needs to live in memory.  */
 	  && (!TREE_ADDRESSABLE (TREE_TYPE (exp))
 	      || TREE_CODE (exp) != COMPONENT_REF
-	      || TREE_CODE (DECL_SIZE (TREE_OPERAND (exp, 1))) != INTEGER_CST
-	      || (bitsize % BITS_PER_UNIT != 0)
-	      || (bitpos % BITS_PER_UNIT != 0)
-	      || (compare_tree_int (DECL_SIZE (TREE_OPERAND (exp, 1)), bitsize)
-		  != 0)))
+	      || !multiple_p (bitsize, BITS_PER_UNIT)
+	      || !multiple_p (bitpos, BITS_PER_UNIT)
+	      || !poly_int_tree_p (DECL_SIZE (TREE_OPERAND (exp, 1)),
+				   &decl_bitsize)
+	      || may_ne (decl_bitsize, bitsize)))
       /* If we are expanding a MEM_REF of a non-BLKmode non-addressable
          decl we must use bitfield operations.  */
-      || (bitsize >= 0
+      || (known_size_p (bitsize)
 	  && TREE_CODE (exp) == MEM_REF
 	  && TREE_CODE (TREE_OPERAND (exp, 0)) == ADDR_EXPR
 	  && DECL_P (TREE_OPERAND (TREE_OPERAND (exp, 0), 0))
@@ -6853,17 +6859,23 @@ store_field (rtx target, HOST_WIDE_INT b
 	  tree type = TREE_TYPE (exp);
 	  if (INTEGRAL_TYPE_P (type)
 	      && TYPE_PRECISION (type) < GET_MODE_BITSIZE (TYPE_MODE (type))
-	      && bitsize == TYPE_PRECISION (type))
+	      && must_eq (bitsize, TYPE_PRECISION (type)))
 	    {
 	      tree op = gimple_assign_rhs1 (nop_def);
 	      type = TREE_TYPE (op);
-	      if (INTEGRAL_TYPE_P (type) && TYPE_PRECISION (type) >= bitsize)
+	      if (INTEGRAL_TYPE_P (type)
+		  && must_ge (TYPE_PRECISION (type), bitsize))
 		exp = op;
 	    }
 	}
 
       temp = expand_normal (exp);
 
+      /* We don't support variable-sized BLKmode bitfields, since our
+	 handling of BLKmode is bound up with the ability to break
+	 things into words.  */
+      gcc_assert (mode != BLKmode || bitsize.is_constant ());
+
       /* Handle calls that return values in multiple non-contiguous locations.
 	 The Irix 6 ABI has examples of this.  */
       if (GET_CODE (temp) == PARALLEL)
@@ -6904,9 +6916,11 @@ store_field (rtx target, HOST_WIDE_INT b
 	  if (reverse)
 	    temp = flip_storage_order (temp_mode, temp);
 
-	  if (bitsize < size
+	  gcc_checking_assert (must_le (bitsize, size));
+	  if (may_lt (bitsize, size)
 	      && reverse ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN
-	      && !(mode == BLKmode && bitsize > BITS_PER_WORD))
+	      /* Use of to_constant for BLKmode was checked above.  */
+	      && !(mode == BLKmode && bitsize.to_constant () > BITS_PER_WORD))
 	    temp = expand_shift (RSHIFT_EXPR, temp_mode, temp,
 				 size - bitsize, NULL_RTX, 1);
 	}
@@ -6923,16 +6937,16 @@ store_field (rtx target, HOST_WIDE_INT b
 	  && (GET_MODE (target) == BLKmode
 	      || (MEM_P (target)
 		  && GET_MODE_CLASS (GET_MODE (target)) == MODE_INT
-		  && (bitpos % BITS_PER_UNIT) == 0
-		  && (bitsize % BITS_PER_UNIT) == 0)))
+		  && multiple_p (bitpos, BITS_PER_UNIT)
+		  && multiple_p (bitsize, BITS_PER_UNIT))))
 	{
-	  gcc_assert (MEM_P (target) && MEM_P (temp)
-		      && (bitpos % BITS_PER_UNIT) == 0);
+	  gcc_assert (MEM_P (target) && MEM_P (temp));
+	  poly_int64 bytepos = exact_div (bitpos, BITS_PER_UNIT);
+	  poly_int64 bytesize = bits_to_bytes_round_up (bitsize);
 
-	  target = adjust_address (target, VOIDmode, bitpos / BITS_PER_UNIT);
+	  target = adjust_address (target, VOIDmode, bytepos);
 	  emit_block_move (target, temp,
-			   GEN_INT ((bitsize + BITS_PER_UNIT - 1)
-				    / BITS_PER_UNIT),
+			   gen_int_mode (bytesize, Pmode),
 			   BLOCK_OP_NORMAL);
 
 	  return const0_rtx;
@@ -6940,7 +6954,7 @@ store_field (rtx target, HOST_WIDE_INT b
 
       /* If the mode of TEMP is still BLKmode and BITSIZE not larger than the
 	 word size, we need to load the value (see again store_bit_field).  */
-      if (GET_MODE (temp) == BLKmode && bitsize <= BITS_PER_WORD)
+      if (GET_MODE (temp) == BLKmode && must_le (bitsize, BITS_PER_WORD))
 	{
 	  scalar_int_mode temp_mode = smallest_int_mode_for_size (bitsize);
 	  temp = extract_bit_field (temp, bitsize, 0, 1, NULL_RTX, temp_mode,
@@ -6957,7 +6971,8 @@ store_field (rtx target, HOST_WIDE_INT b
   else
     {
       /* Now build a reference to just the desired component.  */
-      rtx to_rtx = adjust_address (target, mode, bitpos / BITS_PER_UNIT);
+      rtx to_rtx = adjust_address (target, mode,
+				   exact_div (bitpos, BITS_PER_UNIT));
 
       if (to_rtx == target)
 	to_rtx = copy_rtx (to_rtx);
@@ -6967,10 +6982,10 @@ store_field (rtx target, HOST_WIDE_INT b
 
       /* Above we avoided using bitfield operations for storing a CONSTRUCTOR
 	 into a target smaller than its type; handle that case now.  */
-      if (TREE_CODE (exp) == CONSTRUCTOR && bitsize >= 0)
+      if (TREE_CODE (exp) == CONSTRUCTOR && known_size_p (bitsize))
 	{
-	  gcc_assert (bitsize % BITS_PER_UNIT == 0);
-	  store_constructor (exp, to_rtx, 0, bitsize / BITS_PER_UNIT, reverse);
+	  poly_int64 bytesize = exact_div (bitsize, BITS_PER_UNIT);
+	  store_constructor (exp, to_rtx, 0, bytesize, reverse);
 	  return to_rtx;
 	}
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [025/nnn] poly_int: SUBREG_BYTE
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (22 preceding siblings ...)
  2017-10-23 17:09 ` [022/nnn] poly_int: C++ bitfield regions Richard Sandiford
@ 2017-10-23 17:10 ` Richard Sandiford
  2017-12-06 18:50   ` Jeff Law
  2017-10-23 17:10 ` [024/nnn] poly_int: ira subreg liveness tracking Richard Sandiford
                   ` (83 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:10 UTC (permalink / raw)
  To: gcc-patches

This patch changes SUBREG_BYTE from an int to a poly_int.
Since valid SUBREG_BYTEs must be contained within the mode of the
SUBREG_REG, the required range is the same as for GET_MODE_SIZE,
i.e. unsigned short.  The patch therefore uses poly_uint16(_pod)
for the SUBREG_BYTE.
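
The practical upshot is that SUBREG_BYTEs can no longer be compared
with the plain integer operators; callers have to say whether they want
a "may" or a "must" answer.  As a rough sketch of the idiom (using the
must_eq/may_ne predicates from poly-int.h; this is only meant to
illustrate the pattern, not to compile outside the GCC tree):

  /* Equal for all runtime values of the size indeterminates?  */
  if (must_eq (SUBREG_BYTE (x), SUBREG_BYTE (y)))
    ...

  /* Hashing/equality code instead asks whether the offsets might
     differ, and fails conservatively if so.  */
  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
    return 0;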

Using poly_uint16_pod rtx fields requires a new field code ('p').
Since there are no other uses of 'p' besides SUBREG_BYTE, the patch
doesn't add an XPOLY or whatever; all uses should go via SUBREG_BYTE
instead.
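
For reference, every read and write of the new slot goes through
SUBREG_BYTE, which now maps onto the new rt_subreg union member via
XCSUBREG (see the rtl.h hunk below).  A minimal sketch:

  /* SUBREG_BYTE now yields a poly_uint16 and is still an lvalue.  */
  poly_uint16 offset = SUBREG_BYTE (x);   /* read */
  SUBREG_BYTE (x) = offset;               /* write, as in read-rtl.c below */

The gengenrtl.c change below means that the generated constructors for
the "ep" format (gen_rtx_fmt_ep and friends) simply take a poly_uint16
argument and store it the same way.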

The patch doesn't bother implementing 'p' support for legacy
define_peepholes, since none of the remaining ones have subregs
in their patterns.

As it happened, the rtl documentation used SUBREG as an example of a
code with mixed field types, accessed via XEXP (x, 0) and XINT (x, 1).
Since there's no direct XINT-style replacement for the new field, and
since people should never access it that way even if there were, the
patch changes the example to use INT_LIST instead.

The patch also changes subreg-related helper functions so that they too
take and return polynomial offsets.  This makes the patch quite big, but
it's mostly mechanical.  The patch generally sticks to existing choices
wrt signedness.
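
To give a flavour of the mechanical changes, a typical caller now just
carries the offset around as a poly_uint64; e.g. the expand_call hunk
below ends up as:

  poly_uint64 offset = subreg_lowpart_offset (TYPE_MODE (type),
                                              GET_MODE (target));
  target = gen_rtx_SUBREG (TYPE_MODE (type), target, offset);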


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/rtl.texi: Update documentation of SUBREG_BYTE.  Document the
	'p' format code.  Use INT_LIST rather than SUBREG as the example of
	a code with an XINT and an XEXP.  Remove the implication that
	accessing an rtx field using XINT is expected to work.
	* rtl.def (SUBREG): Change format from "ei" to "ep".
	* rtl.h (rtunion::rt_subreg): New field.
	(XCSUBREG): New macro.
	(SUBREG_BYTE): Use it.
	(subreg_shape): Change offset from an unsigned int to a poly_uint16.
	Update constructor accordingly.
	(subreg_shape::operator ==): Update accordingly.
	(subreg_shape::unique_id): Return an unsigned HOST_WIDE_INT rather
	than an unsigned int.
	(subreg_lsb, subreg_lowpart_offset, subreg_highpart_offset): Return
	a poly_uint64 rather than an unsigned int.
	(subreg_lsb_1): Likewise.  Take the offset as a poly_uint64 rather
	than an unsigned int.
	(subreg_size_offset_from_lsb, subreg_size_lowpart_offset)
	(subreg_size_highpart_offset): Return a poly_uint64 rather than
	an unsigned int.  Take the sizes as poly_uint64s.
	(subreg_offset_from_lsb): Return a poly_uint64 rather than
	an unsigned int.  Take the shift as a poly_uint64 rather than
	an unsigned int.
	(subreg_regno_offset, subreg_offset_representable_p): Take the offset
	as a poly_uint64 rather than an unsigned int.
	(simplify_subreg_regno): Likewise.
	(byte_lowpart_offset): Return the memory offset as a poly_int64
	rather than an int.
	(subreg_memory_offset): Likewise.  Take the subreg offset as a
	poly_uint64 rather than an unsigned int.
	(simplify_subreg, simplify_gen_subreg, subreg_get_info)
	(gen_rtx_SUBREG, validate_subreg): Take the subreg offset as a
	poly_uint64 rather than an unsigned int.
	* rtl.c (rtx_format): Describe 'p' in comment.
	(copy_rtx, rtx_equal_p_cb, rtx_equal_p): Handle 'p'.
	* emit-rtl.c (validate_subreg, gen_rtx_SUBREG): Take the subreg
	offset as a poly_uint64 rather than an unsigned int.
	(byte_lowpart_offset): Return the memory offset as a poly_int64
	rather than an int.
	(subreg_memory_offset): Likewise.  Take the subreg offset as a
	poly_uint64 rather than an unsigned int.
	(subreg_size_lowpart_offset, subreg_size_highpart_offset): Take the
	mode sizes as poly_uint64s rather than unsigned ints.  Return a
	poly_uint64 rather than an unsigned int.
	(subreg_lowpart_p): Treat subreg offsets as poly_ints.
	(copy_insn_1): Handle 'p'.
	* rtlanal.c (set_noop_p): Treat subreg offsets as poly_uint64s.
	(subreg_lsb_1): Take the subreg offset as a poly_uint64 rather than
	an unsigned int.  Return the shift in the same way.
	(subreg_lsb): Return the shift as a poly_uint64 rather than an
	unsigned int.
	(subreg_size_offset_from_lsb): Take the sizes and shift as
	poly_uint64s rather than unsigned ints.  Return the offset as
	a poly_uint64.
	(subreg_get_info, subreg_regno_offset, subreg_offset_representable_p)
	(simplify_subreg_regno): Take the offset as a poly_uint64 rather than
	an unsigned int.
	* rtlhash.c (add_rtx): Handle 'p'.
	* genemit.c (gen_exp): Likewise.
	* gengenrtl.c (type_from_format, gendef): Likewise.
	* gensupport.c (subst_pattern_match, get_alternatives_number)
	(collect_insn_data, alter_predicate_for_insn, alter_constraints)
	(subst_dup): Likewise.
	* gengtype.c (adjust_field_rtx_def): Likewise.
	* genrecog.c (find_operand, find_matching_operand, validate_pattern)
	(match_pattern_2): Likewise.
	(rtx_test::SUBREG_FIELD): New rtx_test::kind_enum.
	(rtx_test::subreg_field): New function.
	(operator ==, safe_to_hoist_p, transition_parameter_type)
	(print_nonbool_test, print_test): Handle SUBREG_FIELD.
	* genattrtab.c (attr_rtx_1): Say that 'p' is deliberately not handled.
	* genpeep.c (match_rtx): Likewise.
	* print-rtl.c (print_poly_int): Include if GENERATOR_FILE too.
	(rtx_writer::print_rtx_operand): Handle 'p'.
	(print_value): Handle SUBREG.
	* read-rtl.c (apply_int_iterator): Likewise.
	(rtx_reader::read_rtx_operand): Handle 'p'.
	* alias.c (rtx_equal_for_memref_p): Likewise.
	* cselib.c (rtx_equal_for_cselib_1, cselib_hash_rtx): Likewise.
	* caller-save.c (replace_reg_with_saved_mem): Treat subreg offsets
	as poly_ints.
	* calls.c (expand_call): Likewise.
	* combine.c (combine_simplify_rtx, expand_field_assignment): Likewise.
	(make_extraction, gen_lowpart_for_combine): Likewise.
	* loop-invariant.c (hash_invariant_expr_1, invariant_expr_equal_p):
	Likewise.
	* cse.c (remove_invalid_subreg_refs): Take the offset as a poly_uint64
	rather than an unsigned int.  Treat subreg offsets as poly_ints.
	(exp_equiv_p): Handle 'p'.
	(hash_rtx_cb): Likewise.  Treat subreg offsets as poly_ints.
	(equiv_constant, cse_insn): Treat subreg offsets as poly_ints.
	* dse.c (find_shift_sequence): Likewise.
	* dwarf2out.c (rtl_for_decl_location): Likewise.
	* expmed.c (extract_low_bits): Likewise.
	* expr.c (emit_group_store, undefined_operand_subword_p): Likewise.
	(expand_expr_real_2): Likewise.
	* final.c (alter_subreg): Likewise.
	(leaf_renumber_regs_insn): Handle 'p'.
	* function.c (assign_parm_find_stack_rtl, assign_parm_setup_stack):
	Treat subreg offsets as poly_ints.
	* fwprop.c (forward_propagate_and_simplify): Likewise.
	* ifcvt.c (noce_emit_move_insn, noce_emit_cmove): Likewise.
	* ira.c (get_subreg_tracking_sizes): Likewise.
	* ira-conflicts.c (go_through_subreg): Likewise.
	* ira-lives.c (process_single_reg_class_operands): Likewise.
	* jump.c (rtx_renumbered_equal_p): Likewise.  Handle 'p'.
	* lower-subreg.c (simplify_subreg_concatn): Take the subreg offset
	as a poly_uint64 rather than an unsigned int.
	(simplify_gen_subreg_concatn, resolve_simple_move): Treat
	subreg offsets as poly_ints.
	* lra-constraints.c (operands_match_p): Handle 'p'.
	(match_reload, curr_insn_transform): Treat subreg offsets as poly_ints.
	* lra-spills.c (assign_mem_slot): Likewise.
	* postreload.c (move2add_valid_value_p): Likewise.
	* recog.c (general_operand, indirect_operand): Likewise.
	* regcprop.c (copy_value, maybe_mode_change): Likewise.
	(copyprop_hardreg_forward_1): Likewise.
	* reginfo.c (simplifiable_subregs_hasher::hash, simplifiable_subregs)
	(record_subregs_of_mode): Likewise.
	* rtlhooks.c (gen_lowpart_general, gen_lowpart_if_possible): Likewise.
	* reload.c (operands_match_p): Handle 'p'.
	(find_reloads_subreg_address): Treat subreg offsets as poly_ints.
	* reload1.c (alter_reg, choose_reload_regs): Likewise.
	(compute_reload_subreg_offset): Likewise, and return a poly_int64.
	* simplify-rtx.c (simplify_truncation, simplify_binary_operation_1)
	(test_vector_ops_duplicate): Treat subreg offsets as poly_ints.
	(simplify_const_poly_int_tests<N>::run): Likewise.
	(simplify_subreg, simplify_gen_subreg): Take the subreg offset as
	a poly_uint64 rather than an unsigned int.
	* valtrack.c (debug_lowpart_subreg): Likewise.
	* var-tracking.c (var_lowpart): Likewise.
	(loc_cmp): Handle 'p'.

Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	2017-10-23 17:16:35.057923923 +0100
+++ gcc/doc/rtl.texi	2017-10-23 17:16:50.360529627 +0100
@@ -109,10 +109,10 @@ and what kinds of objects they are.  In
 by looking at an operand what kind of object it is.  Instead, you must know
 from its context---from the expression code of the containing expression.
 For example, in an expression of code @code{subreg}, the first operand is
-to be regarded as an expression and the second operand as an integer.  In
-an expression of code @code{plus}, there are two operands, both of which
-are to be regarded as expressions.  In a @code{symbol_ref} expression,
-there is one operand, which is to be regarded as a string.
+to be regarded as an expression and the second operand as a polynomial
+integer.  In an expression of code @code{plus}, there are two operands,
+both of which are to be regarded as expressions.  In a @code{symbol_ref}
+expression, there is one operand, which is to be regarded as a string.
 
 Expressions are written as parentheses containing the name of the
 expression type, its flags and machine mode if any, and then the operands
@@ -209,7 +209,7 @@ chain, such as @code{NOTE}, @code{BARRIE
 For each expression code, @file{rtl.def} specifies the number of
 contained objects and their kinds using a sequence of characters
 called the @dfn{format} of the expression code.  For example,
-the format of @code{subreg} is @samp{ei}.
+the format of @code{subreg} is @samp{ep}.
 
 @cindex RTL format characters
 These are the most commonly used format characters:
@@ -258,6 +258,9 @@ An omitted vector is effectively the sam
 @item B
 @samp{B} indicates a pointer to basic block structure.
 
+@item p
+A polynomial integer.  At present this is used only for @code{SUBREG_BYTE}.
+
 @item 0
 @samp{0} means a slot whose contents do not fit any normal category.
 @samp{0} slots are not printed at all in dumps, and are often used in
@@ -340,16 +343,13 @@ stored in the operand.  You would do thi
 the containing expression.  That is also how you would know how many
 operands there are.
 
-For example, if @var{x} is a @code{subreg} expression, you know that it has
-two operands which can be correctly accessed as @code{XEXP (@var{x}, 0)}
-and @code{XINT (@var{x}, 1)}.  If you did @code{XINT (@var{x}, 0)}, you
-would get the address of the expression operand but cast as an integer;
-that might occasionally be useful, but it would be cleaner to write
-@code{(int) XEXP (@var{x}, 0)}.  @code{XEXP (@var{x}, 1)} would also
-compile without error, and would return the second, integer operand cast as
-an expression pointer, which would probably result in a crash when
-accessed.  Nothing stops you from writing @code{XEXP (@var{x}, 28)} either,
-but this will access memory past the end of the expression with
+For example, if @var{x} is an @code{int_list} expression, you know that it has
+two operands which can be correctly accessed as @code{XINT (@var{x}, 0)}
+and @code{XEXP (@var{x}, 1)}.  Incorrect accesses like
+@code{XEXP (@var{x}, 0)} and @code{XINT (@var{x}, 1)} would compile,
+but would trigger an internal compiler error when rtl checking is enabled.
+Nothing stops you from writing @code{XEXP (@var{x}, 28)} either, but
+this will access memory past the end of the expression with
 unpredictable results.
 
 Access to operands which are vectors is more complicated.  You can use the
@@ -2007,6 +2007,13 @@ on a @code{BYTES_BIG_ENDIAN}, @samp{UNIT
 on a little-endian, @samp{UNITS_PER_WORD == 4} target.  Both
 @code{subreg}s access the lower two bytes of register @var{x}.
 
+Note that the byte offset is a polynomial integer; it may not be a
+compile-time constant on targets with variable-sized modes.  However,
+the restrictions above mean that there is only a certain set of
+acceptable offsets for a given combination of @var{m1} and @var{m2}.
+The compiler can always tell which blocks a valid subreg occupies, and
+whether the subreg is a lowpart of a block.
+
 @end table
 
 A @code{MODE_PARTIAL_INT} mode behaves as if it were as wide as the
Index: gcc/rtl.def
===================================================================
--- gcc/rtl.def	2017-10-23 17:16:35.057923923 +0100
+++ gcc/rtl.def	2017-10-23 17:16:50.374527737 +0100
@@ -394,7 +394,7 @@ DEF_RTL_EXPR(SCRATCH, "scratch", "", RTX
 
 /* A reference to a part of another value.  The first operand is the
    complete value and the second is the byte offset of the selected part.   */
-DEF_RTL_EXPR(SUBREG, "subreg", "ei", RTX_EXTRA)
+DEF_RTL_EXPR(SUBREG, "subreg", "ep", RTX_EXTRA)
 
 /* This one-argument rtx is used for move instructions
    that are guaranteed to alter only the low part of a destination.
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 17:16:35.057923923 +0100
+++ gcc/rtl.h	2017-10-23 17:16:50.374527737 +0100
@@ -198,6 +198,7 @@ struct GTY((for_user)) reg_attrs {
 {
   int rt_int;
   unsigned int rt_uint;
+  poly_uint16_pod rt_subreg;
   const char *rt_str;
   rtx rt_rtx;
   rtvec rt_rtvec;
@@ -1330,6 +1331,7 @@ #define X0ANY(RTX, N)	   RTL_CHECK1 (RTX
 
 #define XCINT(RTX, N, C)      (RTL_CHECKC1 (RTX, N, C).rt_int)
 #define XCUINT(RTX, N, C)     (RTL_CHECKC1 (RTX, N, C).rt_uint)
+#define XCSUBREG(RTX, N, C)   (RTL_CHECKC1 (RTX, N, C).rt_subreg)
 #define XCSTR(RTX, N, C)      (RTL_CHECKC1 (RTX, N, C).rt_str)
 #define XCEXP(RTX, N, C)      (RTL_CHECKC1 (RTX, N, C).rt_rtx)
 #define XCVEC(RTX, N, C)      (RTL_CHECKC1 (RTX, N, C).rt_rtvec)
@@ -1920,7 +1922,7 @@ #define CONST_VECTOR_NUNITS(RTX) XCVECLE
    SUBREG_BYTE extracts the byte-number.  */
 
 #define SUBREG_REG(RTX) XCEXP (RTX, 0, SUBREG)
-#define SUBREG_BYTE(RTX) XCUINT (RTX, 1, SUBREG)
+#define SUBREG_BYTE(RTX) XCSUBREG (RTX, 1, SUBREG)
 
 /* in rtlanal.c */
 /* Return the right cost to give to an operation
@@ -1993,19 +1995,19 @@ costs_add_n_insns (struct full_rtx_costs
    offset     == the SUBREG_BYTE
    outer_mode == the mode of the SUBREG itself.  */
 struct subreg_shape {
-  subreg_shape (machine_mode, unsigned int, machine_mode);
+  subreg_shape (machine_mode, poly_uint16, machine_mode);
   bool operator == (const subreg_shape &) const;
   bool operator != (const subreg_shape &) const;
-  unsigned int unique_id () const;
+  unsigned HOST_WIDE_INT unique_id () const;
 
   machine_mode inner_mode;
-  unsigned int offset;
+  poly_uint16 offset;
   machine_mode outer_mode;
 };
 
 inline
 subreg_shape::subreg_shape (machine_mode inner_mode_in,
-			    unsigned int offset_in,
+			    poly_uint16 offset_in,
 			    machine_mode outer_mode_in)
   : inner_mode (inner_mode_in), offset (offset_in), outer_mode (outer_mode_in)
 {}
@@ -2014,7 +2016,7 @@ subreg_shape::subreg_shape (machine_mode
 subreg_shape::operator == (const subreg_shape &other) const
 {
   return (inner_mode == other.inner_mode
-	  && offset == other.offset
+	  && must_eq (offset, other.offset)
 	  && outer_mode == other.outer_mode);
 }
 
@@ -2029,11 +2031,16 @@ subreg_shape::operator != (const subreg_
    current mode is anywhere near being 65536 bytes in size, so the
    id comfortably fits in an int.  */
 
-inline unsigned int
+inline unsigned HOST_WIDE_INT
 subreg_shape::unique_id () const
 {
-  STATIC_ASSERT (MAX_MACHINE_MODE <= 256);
-  return (int) inner_mode + ((int) outer_mode << 8) + (offset << 16);
+  { STATIC_ASSERT (MAX_MACHINE_MODE <= 256); }
+  { STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 3); }
+  { STATIC_ASSERT (sizeof (offset.coeffs[0]) <= 2); }
+  int res = (int) inner_mode + ((int) outer_mode << 8);
+  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    res += (HOST_WIDE_INT) offset.coeffs[i] << ((1 + i) * 16);
+  return res;
 }
 
 /* Return the shape of a SUBREG rtx.  */
@@ -2287,11 +2294,10 @@ extern int rtx_cost (rtx, machine_mode,
 extern int address_cost (rtx, machine_mode, addr_space_t, bool);
 extern void get_full_rtx_cost (rtx, machine_mode, enum rtx_code, int,
 			       struct full_rtx_costs *);
-extern unsigned int subreg_lsb (const_rtx);
-extern unsigned int subreg_lsb_1 (machine_mode, machine_mode,
-				  unsigned int);
-extern unsigned int subreg_size_offset_from_lsb (unsigned int, unsigned int,
-						 unsigned int);
+extern poly_uint64 subreg_lsb (const_rtx);
+extern poly_uint64 subreg_lsb_1 (machine_mode, machine_mode, poly_uint64);
+extern poly_uint64 subreg_size_offset_from_lsb (poly_uint64, poly_uint64,
+						poly_uint64);
 extern bool read_modify_subreg_p (const_rtx);
 
 /* Return the subreg byte offset for a subreg whose outer mode is
@@ -2300,22 +2306,22 @@ extern bool read_modify_subreg_p (const_
    the inner value.  This is the inverse of subreg_lsb_1 (which converts
    byte offsets to bit shifts).  */
 
-inline unsigned int
+inline poly_uint64
 subreg_offset_from_lsb (machine_mode outer_mode,
 			machine_mode inner_mode,
-			unsigned int lsb_shift)
+			poly_uint64 lsb_shift)
 {
   return subreg_size_offset_from_lsb (GET_MODE_SIZE (outer_mode),
 				      GET_MODE_SIZE (inner_mode), lsb_shift);
 }
 
-extern unsigned int subreg_regno_offset	(unsigned int, machine_mode,
-					 unsigned int, machine_mode);
+extern unsigned int subreg_regno_offset (unsigned int, machine_mode,
+					 poly_uint64, machine_mode);
 extern bool subreg_offset_representable_p (unsigned int, machine_mode,
-					   unsigned int, machine_mode);
+					   poly_uint64, machine_mode);
 extern unsigned int subreg_regno (const_rtx);
 extern int simplify_subreg_regno (unsigned int, machine_mode,
-				  unsigned int, machine_mode);
+				  poly_uint64, machine_mode);
 extern unsigned int subreg_nregs (const_rtx);
 extern unsigned int subreg_nregs_with_regno (unsigned int, const_rtx);
 extern unsigned HOST_WIDE_INT nonzero_bits (const_rtx, machine_mode);
@@ -3016,7 +3022,7 @@ extern rtx operand_subword (rtx, unsigne
 /* In emit-rtl.c */
 extern rtx operand_subword_force (rtx, unsigned int, machine_mode);
 extern int subreg_lowpart_p (const_rtx);
-extern unsigned int subreg_size_lowpart_offset (unsigned int, unsigned int);
+extern poly_uint64 subreg_size_lowpart_offset (poly_uint64, poly_uint64);
 
 /* Return true if a subreg of mode OUTERMODE would only access part of
    an inner register with mode INNERMODE.  The other bits of the inner
@@ -3063,7 +3069,7 @@ paradoxical_subreg_p (const_rtx x)
 
 /* Return the SUBREG_BYTE for an OUTERMODE lowpart of an INNERMODE value.  */
 
-inline unsigned int
+inline poly_uint64
 subreg_lowpart_offset (machine_mode outermode, machine_mode innermode)
 {
   return subreg_size_lowpart_offset (GET_MODE_SIZE (outermode),
@@ -3098,20 +3104,21 @@ wider_subreg_mode (const_rtx x)
   return wider_subreg_mode (GET_MODE (x), GET_MODE (SUBREG_REG (x)));
 }
 
-extern unsigned int subreg_size_highpart_offset (unsigned int, unsigned int);
+extern poly_uint64 subreg_size_highpart_offset (poly_uint64, poly_uint64);
 
 /* Return the SUBREG_BYTE for an OUTERMODE highpart of an INNERMODE value.  */
 
-inline unsigned int
+inline poly_uint64
 subreg_highpart_offset (machine_mode outermode, machine_mode innermode)
 {
   return subreg_size_highpart_offset (GET_MODE_SIZE (outermode),
 				      GET_MODE_SIZE (innermode));
 }
 
-extern int byte_lowpart_offset (machine_mode, machine_mode);
-extern int subreg_memory_offset (machine_mode, machine_mode, unsigned int);
-extern int subreg_memory_offset (const_rtx);
+extern poly_int64 byte_lowpart_offset (machine_mode, machine_mode);
+extern poly_int64 subreg_memory_offset (machine_mode, machine_mode,
+					poly_uint64);
+extern poly_int64 subreg_memory_offset (const_rtx);
 extern rtx make_safe_from (rtx, rtx);
 extern rtx convert_memory_address_addr_space_1 (scalar_int_mode, rtx,
 						addr_space_t, bool, bool);
@@ -3263,16 +3270,8 @@ extern rtx simplify_gen_ternary (enum rt
 				 machine_mode, rtx, rtx, rtx);
 extern rtx simplify_gen_relational (enum rtx_code, machine_mode,
 				    machine_mode, rtx, rtx);
-extern rtx simplify_subreg (machine_mode, rtx, machine_mode,
-			    unsigned int);
-extern rtx simplify_gen_subreg (machine_mode, rtx, machine_mode,
-				unsigned int);
-inline rtx
-simplify_gen_subreg (machine_mode omode, rtx x, machine_mode imode,
-		     poly_uint64 offset)
-{
-  return simplify_gen_subreg (omode, x, imode, offset.to_constant ());
-}
+extern rtx simplify_subreg (machine_mode, rtx, machine_mode, poly_uint64);
+extern rtx simplify_gen_subreg (machine_mode, rtx, machine_mode, poly_uint64);
 extern rtx lowpart_subreg (machine_mode, rtx, machine_mode);
 extern rtx simplify_replace_fn_rtx (rtx, const_rtx,
 				    rtx (*fn) (rtx, const_rtx, void *), void *);
@@ -3458,7 +3457,7 @@ struct subreg_info
 };
 
 extern void subreg_get_info (unsigned int, machine_mode,
-			     unsigned int, machine_mode,
+			     poly_uint64, machine_mode,
 			     struct subreg_info *);
 
 /* lists.c */
@@ -3697,7 +3696,7 @@ extern rtx gen_rtx_CONST_VECTOR (machine
 extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
 extern rtx gen_raw_REG (machine_mode, unsigned int);
 extern rtx gen_rtx_REG (machine_mode, unsigned int);
-extern rtx gen_rtx_SUBREG (machine_mode, rtx, int);
+extern rtx gen_rtx_SUBREG (machine_mode, rtx, poly_uint64);
 extern rtx gen_rtx_MEM (machine_mode, rtx);
 extern rtx gen_rtx_VAR_LOCATION (machine_mode, tree, rtx,
 				 enum var_init_status);
@@ -3914,7 +3913,7 @@ extern rtx gen_const_mem (machine_mode,
 extern rtx gen_frame_mem (machine_mode, rtx);
 extern rtx gen_tmp_stack_mem (machine_mode, rtx);
 extern bool validate_subreg (machine_mode, machine_mode,
-			     const_rtx, unsigned int);
+			     const_rtx, poly_uint64);
 
 /* In combine.c  */
 extern unsigned int extended_count (const_rtx, machine_mode, int);
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/rtl.c	2017-10-23 17:16:50.374527737 +0100
@@ -89,7 +89,8 @@ const char * const rtx_format[NUM_RTX_CO
      "b" is a pointer to a bitmap header.
      "B" is a basic block pointer.
      "t" is a tree pointer.
-     "r" a register.  */
+     "r" a register.
+     "p" is a poly_uint16 offset.  */
 
 #define DEF_RTL_EXPR(ENUM, NAME, FORMAT, CLASS)   FORMAT ,
 #include "rtl.def"		/* rtl expressions are defined here */
@@ -349,6 +350,7 @@ copy_rtx (rtx orig)
       case 't':
       case 'w':
       case 'i':
+      case 'p':
       case 's':
       case 'S':
       case 'T':
@@ -503,6 +505,11 @@ rtx_equal_p_cb (const_rtx x, const_rtx y
 	    }
 	  break;
 
+	case 'p':
+	  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
+	    return 0;
+	  break;
+
 	case 'V':
 	case 'E':
 	  /* Two vectors must have the same length.  */
@@ -640,6 +647,11 @@ rtx_equal_p (const_rtx x, const_rtx y)
 	    }
 	  break;
 
+	case 'p':
+	  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
+	    return 0;
+	  break;
+
 	case 'V':
 	case 'E':
 	  /* Two vectors must have the same length.  */
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:16:50.363529222 +0100
@@ -922,17 +922,17 @@ gen_tmp_stack_mem (machine_mode mode, rt
 
 bool
 validate_subreg (machine_mode omode, machine_mode imode,
-		 const_rtx reg, unsigned int offset)
+		 const_rtx reg, poly_uint64 offset)
 {
   unsigned int isize = GET_MODE_SIZE (imode);
   unsigned int osize = GET_MODE_SIZE (omode);
 
   /* All subregs must be aligned.  */
-  if (offset % osize != 0)
+  if (!multiple_p (offset, osize))
     return false;
 
   /* The subreg offset cannot be outside the inner object.  */
-  if (offset >= isize)
+  if (may_ge (offset, isize))
     return false;
 
   unsigned int regsize = REGMODE_NATURAL_SIZE (imode);
@@ -977,7 +977,7 @@ validate_subreg (machine_mode omode, mac
 
   /* Paradoxical subregs must have offset zero.  */
   if (osize > isize)
-    return offset == 0;
+    return known_zero (offset);
 
   /* This is a normal subreg.  Verify that the offset is representable.  */
 
@@ -1009,18 +1009,20 @@ validate_subreg (machine_mode omode, mac
   if (osize < regsize
       && ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))))
     {
-      unsigned int block_size = MIN (isize, regsize);
-      unsigned int offset_within_block = offset % block_size;
-      if (BYTES_BIG_ENDIAN
-	  ? offset_within_block != block_size - osize
-	  : offset_within_block != 0)
+      poly_uint64 block_size = MIN (isize, regsize);
+      unsigned int start_reg;
+      poly_uint64 offset_within_reg;
+      if (!can_div_trunc_p (offset, block_size, &start_reg, &offset_within_reg)
+	  || (BYTES_BIG_ENDIAN
+	      ? may_ne (offset_within_reg, block_size - osize)
+	      : maybe_nonzero (offset_within_reg)))
 	return false;
     }
   return true;
 }
 
 rtx
-gen_rtx_SUBREG (machine_mode mode, rtx reg, int offset)
+gen_rtx_SUBREG (machine_mode mode, rtx reg, poly_uint64 offset)
 {
   gcc_assert (validate_subreg (mode, GET_MODE (reg), reg, offset));
   return gen_rtx_raw_SUBREG (mode, reg, offset);
@@ -1121,7 +1123,7 @@ gen_rtvec_v (int n, rtx_insn **argp)
    paradoxical lowpart, in which case the offset will be negative
    on big-endian targets.  */
 
-int
+poly_int64
 byte_lowpart_offset (machine_mode outer_mode,
 		     machine_mode inner_mode)
 {
@@ -1135,13 +1137,13 @@ byte_lowpart_offset (machine_mode outer_
    from address X.  For paradoxical big-endian subregs this is a
    negative value, otherwise it's the same as OFFSET.  */
 
-int
+poly_int64
 subreg_memory_offset (machine_mode outer_mode, machine_mode inner_mode,
-		      unsigned int offset)
+		      poly_uint64 offset)
 {
   if (paradoxical_subreg_p (outer_mode, inner_mode))
     {
-      gcc_assert (offset == 0);
+      gcc_assert (known_zero (offset));
       return -subreg_lowpart_offset (inner_mode, outer_mode);
     }
   return offset;
@@ -1151,7 +1153,7 @@ subreg_memory_offset (machine_mode outer
    if SUBREG_REG (X) were stored in memory.  The only significant thing
    about the current SUBREG_REG is its mode.  */
 
-int
+poly_int64
 subreg_memory_offset (const_rtx x)
 {
   return subreg_memory_offset (GET_MODE (x), GET_MODE (SUBREG_REG (x)),
@@ -1657,10 +1659,11 @@ gen_highpart_mode (machine_mode outermod
 /* Return the SUBREG_BYTE for a lowpart subreg whose outer mode has
    OUTER_BYTES bytes and whose inner mode has INNER_BYTES bytes.  */
 
-unsigned int
-subreg_size_lowpart_offset (unsigned int outer_bytes, unsigned int inner_bytes)
+poly_uint64
+subreg_size_lowpart_offset (poly_uint64 outer_bytes, poly_uint64 inner_bytes)
 {
-  if (outer_bytes > inner_bytes)
+  gcc_checking_assert (ordered_p (outer_bytes, inner_bytes));
+  if (may_gt (outer_bytes, inner_bytes))
     /* Paradoxical subregs always have a SUBREG_BYTE of 0.  */
     return 0;
 
@@ -1675,11 +1678,10 @@ subreg_size_lowpart_offset (unsigned int
 /* Return the SUBREG_BYTE for a highpart subreg whose outer mode has
    OUTER_BYTES bytes and whose inner mode has INNER_BYTES bytes.  */
 
-unsigned int
-subreg_size_highpart_offset (unsigned int outer_bytes,
-			     unsigned int inner_bytes)
+poly_uint64
+subreg_size_highpart_offset (poly_uint64 outer_bytes, poly_uint64 inner_bytes)
 {
-  gcc_assert (inner_bytes >= outer_bytes);
+  gcc_assert (must_ge (inner_bytes, outer_bytes));
 
   if (BYTES_BIG_ENDIAN && WORDS_BIG_ENDIAN)
     return 0;
@@ -1703,8 +1705,9 @@ subreg_lowpart_p (const_rtx x)
   else if (GET_MODE (SUBREG_REG (x)) == VOIDmode)
     return 0;
 
-  return (subreg_lowpart_offset (GET_MODE (x), GET_MODE (SUBREG_REG (x)))
-	  == SUBREG_BYTE (x));
+  return must_eq (subreg_lowpart_offset (GET_MODE (x),
+					 GET_MODE (SUBREG_REG (x))),
+		  SUBREG_BYTE (x));
 }
 \f
 /* Return subword OFFSET of operand OP.
@@ -5755,6 +5758,7 @@ copy_insn_1 (rtx orig)
       case 't':
       case 'w':
       case 'i':
+      case 'p':
       case 's':
       case 'S':
       case 'u':
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/rtlanal.c	2017-10-23 17:16:50.375527601 +0100
@@ -1586,7 +1586,7 @@ set_noop_p (const_rtx set)
 
   if (GET_CODE (src) == SUBREG && GET_CODE (dst) == SUBREG)
     {
-      if (SUBREG_BYTE (src) != SUBREG_BYTE (dst))
+      if (may_ne (SUBREG_BYTE (src), SUBREG_BYTE (dst)))
 	return 0;
       src = SUBREG_REG (src);
       dst = SUBREG_REG (dst);
@@ -3557,48 +3557,50 @@ loc_mentioned_in_p (rtx *loc, const_rtx
    and SUBREG_BYTE, return the bit offset where the subreg begins
    (counting from the least significant bit of the operand).  */
 
-unsigned int
+poly_uint64
 subreg_lsb_1 (machine_mode outer_mode,
 	      machine_mode inner_mode,
-	      unsigned int subreg_byte)
+	      poly_uint64 subreg_byte)
 {
-  unsigned int bitpos;
-  unsigned int byte;
-  unsigned int word;
+  poly_uint64 subreg_end, trailing_bytes, byte_pos;
 
   /* A paradoxical subreg begins at bit position 0.  */
   if (paradoxical_subreg_p (outer_mode, inner_mode))
     return 0;
 
-  if (WORDS_BIG_ENDIAN != BYTES_BIG_ENDIAN)
-    /* If the subreg crosses a word boundary ensure that
-       it also begins and ends on a word boundary.  */
-    gcc_assert (!((subreg_byte % UNITS_PER_WORD
-		  + GET_MODE_SIZE (outer_mode)) > UNITS_PER_WORD
-		  && (subreg_byte % UNITS_PER_WORD
-		      || GET_MODE_SIZE (outer_mode) % UNITS_PER_WORD)));
-
-  if (WORDS_BIG_ENDIAN)
-    word = (GET_MODE_SIZE (inner_mode)
-	    - (subreg_byte + GET_MODE_SIZE (outer_mode))) / UNITS_PER_WORD;
-  else
-    word = subreg_byte / UNITS_PER_WORD;
-  bitpos = word * BITS_PER_WORD;
-
-  if (BYTES_BIG_ENDIAN)
-    byte = (GET_MODE_SIZE (inner_mode)
-	    - (subreg_byte + GET_MODE_SIZE (outer_mode))) % UNITS_PER_WORD;
+  subreg_end = subreg_byte + GET_MODE_SIZE (outer_mode);
+  trailing_bytes = GET_MODE_SIZE (inner_mode) - subreg_end;
+  if (WORDS_BIG_ENDIAN && BYTES_BIG_ENDIAN)
+    byte_pos = trailing_bytes;
+  else if (!WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN)
+    byte_pos = subreg_byte;
   else
-    byte = subreg_byte % UNITS_PER_WORD;
-  bitpos += byte * BITS_PER_UNIT;
+    {
+      /* When bytes and words have opposite endianness, we must be able
+	 to split offsets into words and bytes at compile time.  */
+      poly_uint64 leading_word_part
+	= force_align_down (subreg_byte, UNITS_PER_WORD);
+      poly_uint64 trailing_word_part
+	= force_align_down (trailing_bytes, UNITS_PER_WORD);
+      /* If the subreg crosses a word boundary ensure that
+	 it also begins and ends on a word boundary.  */
+      gcc_assert (must_le (subreg_end - leading_word_part,
+			   (unsigned int) UNITS_PER_WORD)
+		  || (must_eq (leading_word_part, subreg_byte)
+		      && must_eq (trailing_word_part, trailing_bytes)));
+      if (WORDS_BIG_ENDIAN)
+	byte_pos = trailing_word_part + (subreg_byte - leading_word_part);
+      else
+	byte_pos = leading_word_part + (trailing_bytes - trailing_word_part);
+    }
 
-  return bitpos;
+  return byte_pos * BITS_PER_UNIT;
 }
 
 /* Given a subreg X, return the bit offset where the subreg begins
    (counting from the least significant bit of the reg).  */
 
-unsigned int
+poly_uint64
 subreg_lsb (const_rtx x)
 {
   return subreg_lsb_1 (GET_MODE (x), GET_MODE (SUBREG_REG (x)),
@@ -3611,29 +3613,32 @@ subreg_lsb (const_rtx x)
    lsb of the inner value.  This is the inverse of the calculation
    performed by subreg_lsb_1 (which converts byte offsets to bit shifts).  */
 
-unsigned int
-subreg_size_offset_from_lsb (unsigned int outer_bytes,
-			     unsigned int inner_bytes,
-			     unsigned int lsb_shift)
+poly_uint64
+subreg_size_offset_from_lsb (poly_uint64 outer_bytes, poly_uint64 inner_bytes,
+			     poly_uint64 lsb_shift)
 {
   /* A paradoxical subreg begins at bit position 0.  */
-  if (outer_bytes > inner_bytes)
+  gcc_checking_assert (ordered_p (outer_bytes, inner_bytes));
+  if (may_gt (outer_bytes, inner_bytes))
     {
-      gcc_checking_assert (lsb_shift == 0);
+      gcc_checking_assert (known_zero (lsb_shift));
       return 0;
     }
 
-  gcc_assert (lsb_shift % BITS_PER_UNIT == 0);
-  unsigned int lower_bytes = lsb_shift / BITS_PER_UNIT;
-  unsigned int upper_bytes = inner_bytes - (lower_bytes + outer_bytes);
+  poly_uint64 lower_bytes = exact_div (lsb_shift, BITS_PER_UNIT);
+  poly_uint64 upper_bytes = inner_bytes - (lower_bytes + outer_bytes);
   if (WORDS_BIG_ENDIAN && BYTES_BIG_ENDIAN)
     return upper_bytes;
   else if (!WORDS_BIG_ENDIAN && !BYTES_BIG_ENDIAN)
     return lower_bytes;
   else
     {
-      unsigned int lower_word_part = lower_bytes & -UNITS_PER_WORD;
-      unsigned int upper_word_part = upper_bytes & -UNITS_PER_WORD;
+      /* When bytes and words have opposite endianness, we must be able
+	 to split offsets into words and bytes at compile time.  */
+      poly_uint64 lower_word_part = force_align_down (lower_bytes,
+						      UNITS_PER_WORD);
+      poly_uint64 upper_word_part = force_align_down (upper_bytes,
+						      UNITS_PER_WORD);
       if (WORDS_BIG_ENDIAN)
 	return upper_word_part + (lower_bytes - lower_word_part);
       else
@@ -3662,7 +3667,7 @@ subreg_size_offset_from_lsb (unsigned in
 
 void
 subreg_get_info (unsigned int xregno, machine_mode xmode,
-		 unsigned int offset, machine_mode ymode,
+		 poly_uint64 offset, machine_mode ymode,
 		 struct subreg_info *info)
 {
   unsigned int nregs_xmode, nregs_ymode;
@@ -3679,6 +3684,9 @@ subreg_get_info (unsigned int xregno, ma
      at least one register.  */
   if (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode))
     {
+      /* As a consequence, we must be dealing with a constant number of
+	 scalars, and thus a constant offset.  */
+      HOST_WIDE_INT coffset = offset.to_constant ();
       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
       unsigned int nunits = GET_MODE_NUNITS (xmode);
       scalar_mode xmode_unit = GET_MODE_INNER (xmode);
@@ -3697,9 +3705,9 @@ subreg_get_info (unsigned int xregno, ma
 	 3 for each part, but in memory it's two 128-bit parts.
 	 Padding is assumed to be at the end (not necessarily the 'high part')
 	 of each unit.  */
-      if ((offset / GET_MODE_SIZE (xmode_unit) + 1 < nunits)
-	  && (offset / GET_MODE_SIZE (xmode_unit)
-	      != ((offset + ysize - 1) / GET_MODE_SIZE (xmode_unit))))
+      if ((coffset / GET_MODE_SIZE (xmode_unit) + 1 < nunits)
+	  && (coffset / GET_MODE_SIZE (xmode_unit)
+	      != ((coffset + ysize - 1) / GET_MODE_SIZE (xmode_unit))))
 	{
 	  info->representable_p = false;
 	  rknown = true;
@@ -3711,7 +3719,7 @@ subreg_get_info (unsigned int xregno, ma
   nregs_ymode = hard_regno_nregs (xregno, ymode);
 
   /* Paradoxical subregs are otherwise valid.  */
-  if (!rknown && offset == 0 && ysize > xsize)
+  if (!rknown && known_zero (offset) && ysize > xsize)
     {
       info->representable_p = true;
       /* If this is a big endian paradoxical subreg, which uses more
@@ -3746,16 +3754,22 @@ subreg_get_info (unsigned int xregno, ma
 	{
 	  info->representable_p = false;
 	  info->nregs = CEIL (ysize, regsize_xmode);
-	  info->offset = offset / regsize_xmode;
+	  if (!can_div_trunc_p (offset, regsize_xmode, &info->offset))
+	    /* Checked by validate_subreg.  We must know at compile time
+	       which inner registers are being accessed.  */
+	    gcc_unreachable ();
 	  return;
 	}
       /* It's not valid to extract a subreg of mode YMODE at OFFSET that
 	 would go outside of XMODE.  */
-      if (!rknown && ysize + offset > xsize)
+      if (!rknown && may_gt (ysize + offset, xsize))
 	{
 	  info->representable_p = false;
 	  info->nregs = nregs_ymode;
-	  info->offset = offset / regsize_xmode;
+	  if (!can_div_trunc_p (offset, regsize_xmode, &info->offset))
+	    /* Checked by validate_subreg.  We must know at compile time
+	       which inner registers are being accessed.  */
+	    gcc_unreachable ();
 	  return;
 	}
       /* Quick exit for the simple and common case of extracting whole
@@ -3763,26 +3777,27 @@ subreg_get_info (unsigned int xregno, ma
       /* ??? It would be better to integrate this into the code below,
 	 if we can generalize the concept enough and figure out how
 	 odd-sized modes can coexist with the other weird cases we support.  */
+      HOST_WIDE_INT count;
       if (!rknown
 	  && WORDS_BIG_ENDIAN == REG_WORDS_BIG_ENDIAN
 	  && regsize_xmode == regsize_ymode
-	  && (offset % regsize_ymode) == 0)
+	  && constant_multiple_p (offset, regsize_ymode, &count))
 	{
 	  info->representable_p = true;
 	  info->nregs = nregs_ymode;
-	  info->offset = offset / regsize_ymode;
+	  info->offset = count;
 	  gcc_assert (info->offset + info->nregs <= (int) nregs_xmode);
 	  return;
 	}
     }
 
   /* Lowpart subregs are otherwise valid.  */
-  if (!rknown && offset == subreg_lowpart_offset (ymode, xmode))
+  if (!rknown && must_eq (offset, subreg_lowpart_offset (ymode, xmode)))
     {
       info->representable_p = true;
       rknown = true;
 
-      if (offset == 0 || nregs_xmode == nregs_ymode)
+      if (known_zero (offset) || nregs_xmode == nregs_ymode)
 	{
 	  info->offset = 0;
 	  info->nregs = nregs_ymode;
@@ -3803,19 +3818,24 @@ subreg_get_info (unsigned int xregno, ma
      These conditions may be relaxed but subreg_regno_offset would
      need to be redesigned.  */
   gcc_assert ((xsize % num_blocks) == 0);
-  unsigned int bytes_per_block = xsize / num_blocks;
+  poly_uint64 bytes_per_block = xsize / num_blocks;
 
   /* Get the number of the first block that contains the subreg and the byte
      offset of the subreg from the start of that block.  */
-  unsigned int block_number = offset / bytes_per_block;
-  unsigned int subblock_offset = offset % bytes_per_block;
+  unsigned int block_number;
+  poly_uint64 subblock_offset;
+  if (!can_div_trunc_p (offset, bytes_per_block, &block_number,
+			&subblock_offset))
+    /* Checked by validate_subreg.  We must know at compile time which
+       inner registers are being accessed.  */
+    gcc_unreachable ();
 
   if (!rknown)
     {
       /* Only the lowpart of each block is representable.  */
       info->representable_p
-	= (subblock_offset
-	   == subreg_size_lowpart_offset (ysize, bytes_per_block));
+	= must_eq (subblock_offset,
+		   subreg_size_lowpart_offset (ysize, bytes_per_block));
       rknown = true;
     }
 
@@ -3842,7 +3862,7 @@ subreg_get_info (unsigned int xregno, ma
    RETURN - The regno offset which would be used.  */
 unsigned int
 subreg_regno_offset (unsigned int xregno, machine_mode xmode,
-		     unsigned int offset, machine_mode ymode)
+		     poly_uint64 offset, machine_mode ymode)
 {
   struct subreg_info info;
   subreg_get_info (xregno, xmode, offset, ymode, &info);
@@ -3858,7 +3878,7 @@ subreg_regno_offset (unsigned int xregno
    RETURN - Whether the offset is representable.  */
 bool
 subreg_offset_representable_p (unsigned int xregno, machine_mode xmode,
-			       unsigned int offset, machine_mode ymode)
+			       poly_uint64 offset, machine_mode ymode)
 {
   struct subreg_info info;
   subreg_get_info (xregno, xmode, offset, ymode, &info);
@@ -3875,7 +3895,7 @@ subreg_offset_representable_p (unsigned
 
 int
 simplify_subreg_regno (unsigned int xregno, machine_mode xmode,
-		       unsigned int offset, machine_mode ymode)
+		       poly_uint64 offset, machine_mode ymode)
 {
   struct subreg_info info;
   unsigned int yregno;
Index: gcc/rtlhash.c
===================================================================
--- gcc/rtlhash.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/rtlhash.c	2017-10-23 17:16:50.375527601 +0100
@@ -87,6 +87,9 @@ add_rtx (const_rtx x, hash &hstate)
       case 'i':
 	hstate.add_int (XINT (x, i));
 	break;
+      case 'p':
+	hstate.add_poly_int (SUBREG_BYTE (x));
+	break;
       case 'V':
       case 'E':
 	j = XVECLEN (x, i);
Index: gcc/genemit.c
===================================================================
--- gcc/genemit.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/genemit.c	2017-10-23 17:16:50.366528817 +0100
@@ -235,6 +235,12 @@ gen_exp (rtx x, enum rtx_code subroutine
 	  printf ("%u", REGNO (x));
 	  break;
 
+	case 'p':
+	  /* We don't have a way of parsing polynomial offsets yet,
+	     and hopefully never will.  */
+	  printf ("%d", SUBREG_BYTE (x).to_constant ());
+	  break;
+
 	case 's':
 	  printf ("\"%s\"", XSTR (x, i));
 	  break;
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/gengenrtl.c	2017-10-23 17:16:50.366528817 +0100
@@ -54,6 +54,9 @@ type_from_format (int c)
     case 'w':
       return "HOST_WIDE_INT ";
 
+    case 'p':
+      return "poly_uint16 ";
+
     case 's':
       return "const char *";
 
@@ -257,10 +260,12 @@ gendef (const char *format)
   puts ("  PUT_MODE_RAW (rt, mode);");
 
   for (p = format, i = j = 0; *p ; ++p, ++i)
-    if (*p != '0')
-      printf ("  %s (rt, %d) = arg%d;\n", accessor_from_format (*p), i, j++);
-    else
+    if (*p == '0')
       printf ("  X0EXP (rt, %d) = NULL_RTX;\n", i);
+    else if (*p == 'p')
+      printf ("  SUBREG_BYTE (rt) = arg%d;\n", j++);
+    else
+      printf ("  %s (rt, %d) = arg%d;\n", accessor_from_format (*p), i, j++);
 
   puts ("\n  return rt;\n}\n");
   printf ("#define gen_rtx_fmt_%s(c, m", format);
Index: gcc/gensupport.c
===================================================================
--- gcc/gensupport.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/gensupport.c	2017-10-23 17:16:50.368528547 +0100
@@ -883,7 +883,7 @@ subst_pattern_match (rtx x, rtx pt, file
 
       switch (fmt[i])
 	{
-	case 'i': case 'r': case 'w': case 's':
+	case 'r': case 'p': case 'i': case 'w': case 's':
 	  continue;
 
 	case 'e': case 'u':
@@ -1047,7 +1047,8 @@ get_alternatives_number (rtx pattern, in
 	      return 0;
 	  break;
 
-	case 'i': case 'r': case 'w': case '0': case 's': case 'S': case 'T':
+	case 'r': case 'p': case 'i': case 'w':
+	case '0': case 's': case 'S': case 'T':
 	  break;
 
 	default:
@@ -1106,7 +1107,8 @@ collect_insn_data (rtx pattern, int *pal
 	    collect_insn_data (XVECEXP (pattern, i, j), palt, pmax);
 	  break;
 
-	case 'i': case 'r': case 'w': case '0': case 's': case 'S': case 'T':
+	case 'r': case 'p': case 'i': case 'w':
+	case '0': case 's': case 'S': case 'T':
 	  break;
 
 	default:
@@ -1190,7 +1192,7 @@ alter_predicate_for_insn (rtx pattern, i
 	    }
 	  break;
 
-	case 'i': case 'r': case 'w': case '0': case 's':
+	case 'r': case 'p': case 'i': case 'w': case '0': case 's':
 	  break;
 
 	default:
@@ -1248,7 +1250,7 @@ alter_constraints (rtx pattern, int n_du
 	    }
 	  break;
 
-	case 'i': case 'r': case 'w': case '0': case 's':
+	case 'r': case 'p': case 'i': case 'w': case '0': case 's':
 	  break;
 
 	default:
@@ -2164,7 +2166,8 @@ subst_dup (rtx pattern, int n_alt, int n
 						   n_alt, n_subst_alt);
 	  break;
 
-	case 'i': case 'r': case 'w': case '0': case 's': case 'S': case 'T':
+	case 'r': case 'p': case 'i': case 'w':
+	case '0': case 's': case 'S': case 'T':
 	  break;
 
 	default:
Index: gcc/gengtype.c
===================================================================
--- gcc/gengtype.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/gengtype.c	2017-10-23 17:16:50.367528682 +0100
@@ -1241,6 +1241,11 @@ adjust_field_rtx_def (type_p t, options_
 	      subname = "rt_int";
 	      break;
 
+	    case 'p':
+	      t = scalar_tp;
+	      subname = "rt_subreg";
+	      break;
+
 	    case '0':
 	      if (i == MEM && aindex == 1)
 		t = mem_attrs_tp, subname = "rt_mem";
Index: gcc/genrecog.c
===================================================================
--- gcc/genrecog.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/genrecog.c	2017-10-23 17:16:50.367528682 +0100
@@ -388,7 +388,7 @@ find_operand (rtx pattern, int n, rtx st
 	      return r;
 	  break;
 
-	case 'i': case 'r': case 'w': case '0': case 's':
+	case 'r': case 'p': case 'i': case 'w': case '0': case 's':
 	  break;
 
 	default:
@@ -439,7 +439,7 @@ find_matching_operand (rtx pattern, int
 	      return r;
 	  break;
 
-	case 'i': case 'r': case 'w': case '0': case 's':
+	case 'r': case 'p': case 'i': case 'w': case '0': case 's':
 	  break;
 
 	default:
@@ -797,7 +797,7 @@ validate_pattern (rtx pattern, md_rtx_in
 	    validate_pattern (XVECEXP (pattern, i, j), info, NULL_RTX, 0);
 	  break;
 
-	case 'i': case 'r': case 'w': case '0': case 's':
+	case 'r': case 'p': case 'i': case 'w': case '0': case 's':
 	  break;
 
 	default:
@@ -1119,6 +1119,9 @@ struct rtx_test
     /* Check REGNO (X) == LABEL.  */
     REGNO_FIELD,
 
+    /* Check must_eq (SUBREG_BYTE (X), LABEL).  */
+    SUBREG_FIELD,
+
     /* Check XINT (X, u.opno) == LABEL.  */
     INT_FIELD,
 
@@ -1199,6 +1202,7 @@ struct rtx_test
   static rtx_test code (position *);
   static rtx_test mode (position *);
   static rtx_test regno_field (position *);
+  static rtx_test subreg_field (position *);
   static rtx_test int_field (position *, int);
   static rtx_test wide_int_field (position *, int);
   static rtx_test veclen (position *);
@@ -1244,6 +1248,13 @@ rtx_test::regno_field (position *pos)
 }
 
 rtx_test
+rtx_test::subreg_field (position *pos)
+{
+  rtx_test res (pos, rtx_test::SUBREG_FIELD);
+  return res;
+}
+
+rtx_test
 rtx_test::int_field (position *pos, int opno)
 {
   rtx_test res (pos, rtx_test::INT_FIELD);
@@ -1364,6 +1375,7 @@ operator == (const rtx_test &a, const rt
     case rtx_test::CODE:
     case rtx_test::MODE:
     case rtx_test::REGNO_FIELD:
+    case rtx_test::SUBREG_FIELD:
     case rtx_test::VECLEN:
     case rtx_test::HAVE_NUM_CLOBBERS:
       return true;
@@ -1821,6 +1833,7 @@ safe_to_hoist_p (decision *d, const rtx_
       gcc_unreachable ();
 
     case rtx_test::REGNO_FIELD:
+    case rtx_test::SUBREG_FIELD:
     case rtx_test::INT_FIELD:
     case rtx_test::WIDE_INT_FIELD:
     case rtx_test::VECLEN:
@@ -2028,6 +2041,7 @@ transition_parameter_type (rtx_test::kin
       return parameter::MODE;
 
     case rtx_test::REGNO_FIELD:
+    case rtx_test::SUBREG_FIELD:
       return parameter::UINT;
 
     case rtx_test::INT_FIELD:
@@ -4039,6 +4053,14 @@ match_pattern_2 (state *s, md_rtx_info *
 				      XWINT (pattern, 0), false);
 		    break;
 
+		  case 'p':
+		    /* We don't have a way of parsing polynomial offsets yet,
+		       and hopefully never will.  */
+		    s = add_decision (s, rtx_test::subreg_field (pos),
+				      SUBREG_BYTE (pattern).to_constant (),
+				      false);
+		    break;
+
 		  case '0':
 		    break;
 
@@ -4571,6 +4593,12 @@ print_nonbool_test (output_state *os, co
       printf (")");
       break;
 
+    case rtx_test::SUBREG_FIELD:
+      printf ("SUBREG_BYTE (");
+      print_test_rtx (os, test);
+      printf (")");
+      break;
+
     case rtx_test::WIDE_INT_FIELD:
       printf ("XWINT (");
       print_test_rtx (os, test);
@@ -4653,6 +4681,14 @@ print_test (output_state *os, const rtx_
       print_label_value (test, is_param, value);
       break;
 
+    case rtx_test::SUBREG_FIELD:
+      printf ("%s (", invert_p ? "may_ne" : "must_eq");
+      print_nonbool_test (os, test);
+      printf (", ");
+      print_label_value (test, is_param, value);
+      printf (")");
+      break;
+
     case rtx_test::SAVED_CONST_INT:
       gcc_assert (!is_param && value == 1);
       print_test_rtx (os, test);
Index: gcc/genattrtab.c
===================================================================
--- gcc/genattrtab.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/genattrtab.c	2017-10-23 17:16:50.366528817 +0100
@@ -563,6 +563,7 @@ attr_rtx_1 (enum rtx_code code, va_list
 	      break;
 
 	    default:
+	      /* Don't need to handle 'p' for attributes.  */
 	      gcc_unreachable ();
 	    }
 	}
Index: gcc/genpeep.c
===================================================================
--- gcc/genpeep.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/genpeep.c	2017-10-23 17:16:50.367528682 +0100
@@ -306,6 +306,9 @@ match_rtx (rtx x, struct link *path, int
 	  printf ("  if (strcmp (XSTR (x, %d), \"%s\")) goto L%d;\n",
 		  i, XSTR (x, i), fail_label);
 	}
+      else if (fmt[i] == 'p')
+	/* Not going to support subregs for legacy define_peepholes.  */
+	gcc_unreachable ();
     }
 }
 
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/print-rtl.c	2017-10-23 17:16:50.371528142 +0100
@@ -178,6 +178,7 @@ print_mem_expr (FILE *outfile, const_tre
   fputc (' ', outfile);
   print_generic_expr (outfile, CONST_CAST_TREE (expr), dump_flags);
 }
+#endif
 
 /* Print X to FILE.  */
 
@@ -195,7 +196,6 @@ print_poly_int (FILE *file, poly_int64 x
       fprintf (file, "]");
     }
 }
-#endif
 
 /* Subroutine of print_rtx_operand for handling code '0'.
    0 indicates a field for internal use that should not be printed.
@@ -628,6 +628,11 @@ rtx_writer::print_rtx_operand (const_rtx
       print_rtx_operand_code_i (in_rtx, idx);
       break;
 
+    case 'p':
+      fprintf (m_outfile, " ");
+      print_poly_int (m_outfile, SUBREG_BYTE (in_rtx));
+      break;
+
     case 'r':
       print_rtx_operand_code_r (in_rtx);
       break;
@@ -1661,7 +1666,8 @@ print_value (pretty_printer *pp, const_r
       break;
     case SUBREG:
       print_value (pp, SUBREG_REG (x), verbose);
-      pp_printf (pp, "#%d", SUBREG_BYTE (x));
+      pp_printf (pp, "#");
+      pp_wide_integer (pp, SUBREG_BYTE (x));
       break;
     case SCRATCH:
     case CC0:
Index: gcc/read-rtl.c
===================================================================
--- gcc/read-rtl.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/read-rtl.c	2017-10-23 17:16:50.371528142 +0100
@@ -222,7 +222,10 @@ find_int (const char *name)
 static void
 apply_int_iterator (rtx x, unsigned int index, int value)
 {
-  XINT (x, index) = value;
+  if (GET_CODE (x) == SUBREG)
+    SUBREG_BYTE (x) = value;
+  else
+    XINT (x, index) = value;
 }
 
 #ifdef GENERATOR_FILE
@@ -1608,6 +1611,7 @@ rtx_reader::read_rtx_operand (rtx return
 
     case 'i':
     case 'n':
+    case 'p':
       /* Can be an iterator or an integer constant.  */
       read_name (&name);
       record_potential_iterator_use (&ints, return_rtx, idx, name.string);
Index: gcc/alias.c
===================================================================
--- gcc/alias.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/alias.c	2017-10-23 17:16:50.356530167 +0100
@@ -1833,6 +1833,11 @@ rtx_equal_for_memref_p (const_rtx x, con
 	    return 0;
 	  break;
 
+	case 'p':
+	  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
+	    return 0;
+	  break;
+
 	case 'E':
 	  /* Two vectors must have the same length.  */
 	  if (XVECLEN (x, i) != XVECLEN (y, i))
Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/cselib.c	2017-10-23 17:16:50.359529762 +0100
@@ -987,6 +987,11 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
 	    return 0;
 	  break;
 
+	case 'p':
+	  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
+	    return 0;
+	  break;
+
 	case 'V':
 	case 'E':
 	  /* Two vectors must have the same length.  */
@@ -1278,6 +1283,10 @@ cselib_hash_rtx (rtx x, int create, mach
 	  hash += XINT (x, i);
 	  break;
 
+	case 'p':
+	  hash += constant_lower_bound (SUBREG_BYTE (x));
+	  break;
+
 	case '0':
 	case 't':
 	  /* unused */
Index: gcc/caller-save.c
===================================================================
--- gcc/caller-save.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/caller-save.c	2017-10-23 17:16:50.356530167 +0100
@@ -1129,7 +1129,7 @@ replace_reg_with_saved_mem (rtx *loc,
 	{
 	  /* This is gen_lowpart_if_possible(), but without validating
 	     the newly-formed address.  */
-	  HOST_WIDE_INT offset = byte_lowpart_offset (mode, GET_MODE (mem));
+	  poly_int64 offset = byte_lowpart_offset (mode, GET_MODE (mem));
 	  mem = adjust_address_nv (mem, mode, offset);
 	}
     }
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/calls.c	2017-10-23 17:16:50.357530032 +0100
@@ -4126,8 +4126,8 @@ expand_call (tree exp, rtx target, int i
 					 funtype, 1);
 	  gcc_assert (GET_MODE (target) == pmode);
 
-	  unsigned int offset = subreg_lowpart_offset (TYPE_MODE (type),
-						       GET_MODE (target));
+	  poly_uint64 offset = subreg_lowpart_offset (TYPE_MODE (type),
+						      GET_MODE (target));
 	  target = gen_rtx_SUBREG (TYPE_MODE (type), target, offset);
 	  SUBREG_PROMOTED_VAR_P (target) = 1;
 	  SUBREG_PROMOTED_SET (target, unsignedp);
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/combine.c	2017-10-23 17:16:50.358529897 +0100
@@ -5826,7 +5826,7 @@ combine_simplify_rtx (rtx x, machine_mod
 
       /* See if this can be moved to simplify_subreg.  */
       if (CONSTANT_P (SUBREG_REG (x))
-	  && subreg_lowpart_offset (mode, op0_mode) == SUBREG_BYTE (x)
+	  && must_eq (subreg_lowpart_offset (mode, op0_mode), SUBREG_BYTE (x))
 	     /* Don't call gen_lowpart if the inner mode
 		is VOIDmode and we cannot simplify it, as SUBREG without
 		inner mode is invalid.  */
@@ -5850,8 +5850,8 @@ combine_simplify_rtx (rtx x, machine_mod
 	    && is_a <scalar_int_mode> (op0_mode, &int_op0_mode)
 	    && (GET_MODE_PRECISION (int_mode)
 		< GET_MODE_PRECISION (int_op0_mode))
-	    && (subreg_lowpart_offset (int_mode, int_op0_mode)
-		== SUBREG_BYTE (x))
+	    && must_eq (subreg_lowpart_offset (int_mode, int_op0_mode),
+			SUBREG_BYTE (x))
 	    && HWI_COMPUTABLE_MODE_P (int_op0_mode)
 	    && (nonzero_bits (SUBREG_REG (x), int_op0_mode)
 		& GET_MODE_MASK (int_mode)) == 0)
@@ -7320,7 +7320,8 @@ expand_field_assignment (const_rtx x)
 	{
 	  inner = SUBREG_REG (XEXP (SET_DEST (x), 0));
 	  len = GET_MODE_PRECISION (GET_MODE (XEXP (SET_DEST (x), 0)));
-	  pos = GEN_INT (subreg_lsb (XEXP (SET_DEST (x), 0)));
+	  pos = gen_int_mode (subreg_lsb (XEXP (SET_DEST (x), 0)),
+			      MAX_MODE_INT);
 	}
       else if (GET_CODE (SET_DEST (x)) == ZERO_EXTRACT
 	       && CONST_INT_P (XEXP (SET_DEST (x), 1)))
@@ -7569,7 +7570,7 @@ make_extraction (machine_mode mode, rtx
 		 return a new hard register.  */
 	      if (pos || in_dest)
 		{
-		  unsigned int offset
+		  poly_uint64 offset
 		    = subreg_offset_from_lsb (tmode, inner_mode, pos);
 
 		  /* Avoid creating invalid subregs, for example when
@@ -11626,7 +11627,7 @@ gen_lowpart_for_combine (machine_mode om
       if (paradoxical_subreg_p (omode, imode))
 	return gen_rtx_SUBREG (omode, x, 0);
 
-      HOST_WIDE_INT offset = byte_lowpart_offset (omode, imode);
+      poly_int64 offset = byte_lowpart_offset (omode, imode);
       return adjust_address_nv (x, omode, offset);
     }
 
Index: gcc/loop-invariant.c
===================================================================
--- gcc/loop-invariant.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/loop-invariant.c	2017-10-23 17:16:50.370528277 +0100
@@ -335,6 +335,8 @@ hash_invariant_expr_1 (rtx_insn *insn, r
 	}
       else if (fmt[i] == 'i' || fmt[i] == 'n')
 	val ^= XINT (x, i);
+      else if (fmt[i] == 'p')
+	val ^= constant_lower_bound (SUBREG_BYTE (x));
     }
 
   return val;
@@ -420,6 +422,11 @@ invariant_expr_equal_p (rtx_insn *insn1,
 	  if (XINT (e1, i) != XINT (e2, i))
 	    return false;
 	}
+      else if (fmt[i] == 'p')
+	{
+	  if (may_ne (SUBREG_BYTE (e1), SUBREG_BYTE (e2)))
+	    return false;
+	}
       /* Unhandled type of subexpression, we fail conservatively.  */
       else
 	return false;
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/cse.c	2017-10-23 17:16:50.359529762 +0100
@@ -561,7 +561,7 @@ static struct table_elt *insert (rtx, st
 static void merge_equiv_classes (struct table_elt *, struct table_elt *);
 static void invalidate (rtx, machine_mode);
 static void remove_invalid_refs (unsigned int);
-static void remove_invalid_subreg_refs (unsigned int, unsigned int,
+static void remove_invalid_subreg_refs (unsigned int, poly_uint64,
 					machine_mode);
 static void rehash_using_reg (rtx);
 static void invalidate_memory (void);
@@ -1994,12 +1994,11 @@ remove_invalid_refs (unsigned int regno)
 /* Likewise for a subreg with subreg_reg REGNO, subreg_byte OFFSET,
    and mode MODE.  */
 static void
-remove_invalid_subreg_refs (unsigned int regno, unsigned int offset,
+remove_invalid_subreg_refs (unsigned int regno, poly_uint64 offset,
 			    machine_mode mode)
 {
   unsigned int i;
   struct table_elt *p, *next;
-  unsigned int end = offset + (GET_MODE_SIZE (mode) - 1);
 
   for (i = 0; i < HASH_SIZE; i++)
     for (p = table[i]; p; p = next)
@@ -2011,9 +2010,9 @@ remove_invalid_subreg_refs (unsigned int
 	    && (GET_CODE (exp) != SUBREG
 		|| !REG_P (SUBREG_REG (exp))
 		|| REGNO (SUBREG_REG (exp)) != regno
-		|| (((SUBREG_BYTE (exp)
-		      + (GET_MODE_SIZE (GET_MODE (exp)) - 1)) >= offset)
-		    && SUBREG_BYTE (exp) <= end))
+		|| ranges_may_overlap_p (SUBREG_BYTE (exp),
+					 GET_MODE_SIZE (GET_MODE (exp)),
+					 offset, GET_MODE_SIZE (mode)))
 	    && refers_to_regno_p (regno, p->exp))
 	  remove_from_table (p, i);
       }
@@ -2307,7 +2306,8 @@ hash_rtx_cb (const_rtx x, machine_mode m
 	  {
 	    hash += (((unsigned int) SUBREG << 7)
 		     + REGNO (SUBREG_REG (x))
-		     + (SUBREG_BYTE (x) / UNITS_PER_WORD));
+		     + (constant_lower_bound (SUBREG_BYTE (x))
+			/ UNITS_PER_WORD));
 	    return hash;
 	  }
 	break;
@@ -2526,6 +2526,10 @@ hash_rtx_cb (const_rtx x, machine_mode m
 	  hash += (unsigned int) XINT (x, i);
 	  break;
 
+	case 'p':
+	  hash += constant_lower_bound (SUBREG_BYTE (x));
+	  break;
+
 	case '0': case 't':
 	  /* Unused.  */
 	  break;
@@ -2776,6 +2780,11 @@ exp_equiv_p (const_rtx x, const_rtx y, i
 	    return 0;
 	  break;
 
+	case 'p':
+	  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
+	    return 0;
+	  break;
+
 	case '0':
 	case 't':
 	  break;
@@ -3801,8 +3810,9 @@ equiv_constant (rtx x)
       if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (word_mode)
 	  && GET_MODE_SIZE (word_mode) < GET_MODE_SIZE (imode))
 	{
-	  int byte = SUBREG_BYTE (x) - subreg_lowpart_offset (mode, word_mode);
-	  if (byte >= 0 && (byte % UNITS_PER_WORD) == 0)
+	  poly_int64 byte = (SUBREG_BYTE (x)
+			     - subreg_lowpart_offset (mode, word_mode));
+	  if (must_ge (byte, 0) && multiple_p (byte, UNITS_PER_WORD))
 	    {
 	      rtx y = gen_rtx_SUBREG (word_mode, SUBREG_REG (x), byte);
 	      new_rtx = lookup_as_function (y, CONST_INT);
@@ -6002,7 +6012,7 @@ cse_insn (rtx_insn *insn)
 		  new_src = elt->exp;
 		else
 		  {
-		    unsigned int byte
+		    poly_uint64 byte
 		      = subreg_lowpart_offset (new_mode, GET_MODE (dest));
 		    new_src = simplify_gen_subreg (new_mode, elt->exp,
 					           GET_MODE (dest), byte);
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/dse.c	2017-10-23 17:16:50.360529627 +0100
@@ -1703,7 +1703,7 @@ find_shift_sequence (poly_int64 access_s
 	 e.g. at -Os, even when no actual shift will be needed.  */
       if (store_info->const_rhs)
 	{
-	  unsigned int byte = subreg_lowpart_offset (new_mode, store_mode);
+	  poly_uint64 byte = subreg_lowpart_offset (new_mode, store_mode);
 	  rtx ret = simplify_subreg (new_mode, store_info->const_rhs,
 				     store_mode, byte);
 	  if (ret && CONSTANT_P (ret))
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:16:50.362529357 +0100
@@ -19152,8 +19152,8 @@ rtl_for_decl_location (tree decl)
 	   && GET_MODE (rtl) != TYPE_MODE (TREE_TYPE (decl)))
     {
       machine_mode addr_mode = get_address_mode (rtl);
-      HOST_WIDE_INT offset = byte_lowpart_offset (TYPE_MODE (TREE_TYPE (decl)),
-						  GET_MODE (rtl));
+      poly_int64 offset = byte_lowpart_offset (TYPE_MODE (TREE_TYPE (decl)),
+					       GET_MODE (rtl));
 
       /* If a variable is declared "register" yet is smaller than
 	 a register, then if we store the variable to memory, it
@@ -19161,7 +19161,7 @@ rtl_for_decl_location (tree decl)
 	 fact we are not.  We need to adjust the offset of the
 	 storage location to reflect the actual value's bytes,
 	 else gdb will not be able to display it.  */
-      if (offset != 0)
+      if (maybe_nonzero (offset))
 	rtl = gen_rtx_MEM (TYPE_MODE (TREE_TYPE (decl)),
 			   plus_constant (addr_mode, XEXP (rtl, 0), offset));
     }
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/expmed.c	2017-10-23 17:16:50.363529222 +0100
@@ -2344,7 +2344,7 @@ extract_low_bits (machine_mode mode, mac
       /* simplify_gen_subreg can't be used here, as if simplify_subreg
 	 fails, it will happily create (subreg (symbol_ref)) or similar
 	 invalid SUBREGs.  */
-      unsigned int byte = subreg_lowpart_offset (mode, src_mode);
+      poly_uint64 byte = subreg_lowpart_offset (mode, src_mode);
       rtx ret = simplify_subreg (mode, src, src_mode, byte);
       if (ret)
 	return ret;
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/expr.c	2017-10-23 17:16:50.364529087 +0100
@@ -2446,7 +2446,7 @@ emit_group_store (rtx orig_dst, rtx src,
     {
       machine_mode outer = GET_MODE (dst);
       machine_mode inner;
-      HOST_WIDE_INT bytepos;
+      poly_int64 bytepos;
       bool done = false;
       rtx temp;
 
@@ -2461,7 +2461,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	{
 	  inner = GET_MODE (tmps[start]);
 	  bytepos = subreg_lowpart_offset (inner, outer);
-	  if (INTVAL (XEXP (XVECEXP (src, 0, start), 1)) == bytepos)
+	  if (must_eq (INTVAL (XEXP (XVECEXP (src, 0, start), 1)), bytepos))
 	    {
 	      temp = simplify_gen_subreg (outer, tmps[start],
 					  inner, 0);
@@ -2480,7 +2480,8 @@ emit_group_store (rtx orig_dst, rtx src,
 	{
 	  inner = GET_MODE (tmps[finish - 1]);
 	  bytepos = subreg_lowpart_offset (inner, outer);
-	  if (INTVAL (XEXP (XVECEXP (src, 0, finish - 1), 1)) == bytepos)
+	  if (must_eq (INTVAL (XEXP (XVECEXP (src, 0, finish - 1), 1)),
+		       bytepos))
 	    {
 	      temp = simplify_gen_subreg (outer, tmps[finish - 1],
 					  inner, 0);
@@ -3543,9 +3544,9 @@ undefined_operand_subword_p (const_rtx o
   if (GET_CODE (op) != SUBREG)
     return false;
   machine_mode innermostmode = GET_MODE (SUBREG_REG (op));
-  HOST_WIDE_INT offset = i * UNITS_PER_WORD + subreg_memory_offset (op);
-  return (offset >= GET_MODE_SIZE (innermostmode)
-	  || offset <= -UNITS_PER_WORD);
+  poly_int64 offset = i * UNITS_PER_WORD + subreg_memory_offset (op);
+  return (must_ge (offset, GET_MODE_SIZE (innermostmode))
+	  || must_le (offset, -UNITS_PER_WORD));
 }
 
 /* A subroutine of emit_move_insn_1.  Generate a move from Y into X.
@@ -9229,8 +9230,8 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
 			>= GET_MODE_BITSIZE (word_mode)))
 		  {
 		    rtx_insn *seq, *seq_old;
-		    unsigned int high_off = subreg_highpart_offset (word_mode,
-								    int_mode);
+		    poly_uint64 high_off = subreg_highpart_offset (word_mode,
+								   int_mode);
 		    bool extend_unsigned
 		      = TYPE_UNSIGNED (TREE_TYPE (gimple_assign_rhs1 (def)));
 		    rtx low = lowpart_subreg (word_mode, op0, int_mode);
Index: gcc/final.c
===================================================================
--- gcc/final.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/final.c	2017-10-23 17:16:50.365528952 +0100
@@ -3194,7 +3194,7 @@ alter_subreg (rtx *xp, bool final_p)
      We are required to.  */
   if (MEM_P (y))
     {
-      int offset = SUBREG_BYTE (x);
+      poly_int64 offset = SUBREG_BYTE (x);
 
       /* For paradoxical subregs on big-endian machines, SUBREG_BYTE
 	 contains 0 instead of the proper offset.  See simplify_subreg.  */
@@ -3217,7 +3217,7 @@ alter_subreg (rtx *xp, bool final_p)
 	{
 	  /* Simplify_subreg can't handle some REG cases, but we have to.  */
 	  unsigned int regno;
-	  HOST_WIDE_INT offset;
+	  poly_int64 offset;
 
 	  regno = subreg_regno (x);
 	  if (subreg_lowpart_p (x))
@@ -4460,6 +4460,7 @@ leaf_renumber_regs_insn (rtx in_rtx)
       case '0':
       case 'i':
       case 'w':
+      case 'p':
       case 'n':
       case 'u':
 	break;
Index: gcc/function.c
===================================================================
--- gcc/function.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/function.c	2017-10-23 17:16:50.365528952 +0100
@@ -2698,9 +2698,9 @@ assign_parm_find_stack_rtl (tree parm, s
 	  set_mem_size (stack_parm, GET_MODE_SIZE (data->promoted_mode));
 	  if (MEM_EXPR (stack_parm) && MEM_OFFSET_KNOWN_P (stack_parm))
 	    {
-	      int offset = subreg_lowpart_offset (DECL_MODE (parm),
-						  data->promoted_mode);
-	      if (offset)
+	      poly_int64 offset = subreg_lowpart_offset (DECL_MODE (parm),
+							 data->promoted_mode);
+	      if (maybe_nonzero (offset))
 		set_mem_offset (stack_parm, MEM_OFFSET (stack_parm) - offset);
 	    }
 	}
@@ -3424,12 +3424,13 @@ assign_parm_setup_stack (struct assign_p
 
       if (data->stack_parm)
 	{
-	  int offset = subreg_lowpart_offset (data->nominal_mode,
-					      GET_MODE (data->stack_parm));
+	  poly_int64 offset
+	    = subreg_lowpart_offset (data->nominal_mode,
+				     GET_MODE (data->stack_parm));
 	  /* ??? This may need a big-endian conversion on sparc64.  */
 	  data->stack_parm
 	    = adjust_address (data->stack_parm, data->nominal_mode, 0);
-	  if (offset && MEM_OFFSET_KNOWN_P (data->stack_parm))
+	  if (maybe_nonzero (offset) && MEM_OFFSET_KNOWN_P (data->stack_parm))
 	    set_mem_offset (data->stack_parm,
 			    MEM_OFFSET (data->stack_parm) + offset);
 	}
Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/fwprop.c	2017-10-23 17:16:50.366528817 +0100
@@ -1263,7 +1263,7 @@ forward_propagate_and_simplify (df_ref u
   reg = DF_REF_REG (use);
   if (GET_CODE (reg) == SUBREG && GET_CODE (SET_DEST (def_set)) == SUBREG)
     {
-      if (SUBREG_BYTE (SET_DEST (def_set)) != SUBREG_BYTE (reg))
+      if (may_ne (SUBREG_BYTE (SET_DEST (def_set)), SUBREG_BYTE (reg)))
 	return false;
     }
   /* Check if the def had a subreg, but the use has the whole reg.  */
Index: gcc/ifcvt.c
===================================================================
--- gcc/ifcvt.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/ifcvt.c	2017-10-23 17:16:50.368528547 +0100
@@ -894,7 +894,7 @@ noce_emit_move_insn (rtx x, rtx y)
 {
   machine_mode outmode;
   rtx outer, inner;
-  int bitpos;
+  poly_int64 bitpos;
 
   if (GET_CODE (x) != STRICT_LOW_PART)
     {
@@ -1724,12 +1724,12 @@ noce_emit_cmove (struct noce_if_info *if
     {
       rtx reg_vtrue = SUBREG_REG (vtrue);
       rtx reg_vfalse = SUBREG_REG (vfalse);
-      unsigned int byte_vtrue = SUBREG_BYTE (vtrue);
-      unsigned int byte_vfalse = SUBREG_BYTE (vfalse);
+      poly_uint64 byte_vtrue = SUBREG_BYTE (vtrue);
+      poly_uint64 byte_vfalse = SUBREG_BYTE (vfalse);
       rtx promoted_target;
 
       if (GET_MODE (reg_vtrue) != GET_MODE (reg_vfalse)
-	  || byte_vtrue != byte_vfalse
+	  || may_ne (byte_vtrue, byte_vfalse)
 	  || (SUBREG_PROMOTED_VAR_P (vtrue)
 	      != SUBREG_PROMOTED_VAR_P (vfalse))
 	  || (SUBREG_PROMOTED_GET (vtrue)
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/ira.c	2017-10-23 17:16:50.369528412 +0100
@@ -4051,8 +4051,7 @@ get_subreg_tracking_sizes (rtx x, HOST_W
   rtx reg = regno_reg_rtx[REGNO (SUBREG_REG (x))];
   *outer_size = GET_MODE_SIZE (GET_MODE (x));
   *inner_size = GET_MODE_SIZE (GET_MODE (reg));
-  *start = SUBREG_BYTE (x);
-  return true;
+  return SUBREG_BYTE (x).is_constant (start);
 }
 
 /* Init LIVE_SUBREGS[ALLOCNUM] and LIVE_SUBREGS_USED[ALLOCNUM] for
Index: gcc/ira-conflicts.c
===================================================================
--- gcc/ira-conflicts.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/ira-conflicts.c	2017-10-23 17:16:50.368528547 +0100
@@ -226,8 +226,11 @@ go_through_subreg (rtx x, int *offset)
   if (REGNO (reg) < FIRST_PSEUDO_REGISTER)
     *offset = subreg_regno_offset (REGNO (reg), GET_MODE (reg),
 				   SUBREG_BYTE (x), GET_MODE (x));
-  else
-    *offset = (SUBREG_BYTE (x) / REGMODE_NATURAL_SIZE (GET_MODE (x)));
+  else if (!can_div_trunc_p (SUBREG_BYTE (x),
+			     REGMODE_NATURAL_SIZE (GET_MODE (x)), offset))
+    /* Checked by validate_subreg.  We must know at compile time which
+       inner hard registers are being accessed.  */
+    gcc_unreachable ();
   return reg;
 }
 
Index: gcc/ira-lives.c
===================================================================
--- gcc/ira-lives.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/ira-lives.c	2017-10-23 17:16:50.369528412 +0100
@@ -919,7 +919,7 @@ process_single_reg_class_operands (bool
 		    (subreg:YMODE (reg:XMODE XREGNO) OFFSET).  */
 	      machine_mode ymode, xmode;
 	      int xregno, yregno;
-	      HOST_WIDE_INT offset;
+	      poly_int64 offset;
 
 	      xmode = recog_data.operand_mode[i];
 	      xregno = ira_class_singleton[cl][xmode];
Index: gcc/jump.c
===================================================================
--- gcc/jump.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/jump.c	2017-10-23 17:16:50.369528412 +0100
@@ -1724,7 +1724,7 @@ rtx_renumbered_equal_p (const_rtx x, con
 				  && REG_P (SUBREG_REG (y)))))
     {
       int reg_x = -1, reg_y = -1;
-      int byte_x = 0, byte_y = 0;
+      poly_int64 byte_x = 0, byte_y = 0;
       struct subreg_info info;
 
       if (GET_MODE (x) != GET_MODE (y))
@@ -1781,7 +1781,7 @@ rtx_renumbered_equal_p (const_rtx x, con
 	    reg_y = reg_renumber[reg_y];
 	}
 
-      return reg_x >= 0 && reg_x == reg_y && byte_x == byte_y;
+      return reg_x >= 0 && reg_x == reg_y && must_eq (byte_x, byte_y);
     }
 
   /* Now we have disposed of all the cases
@@ -1873,6 +1873,11 @@ rtx_renumbered_equal_p (const_rtx x, con
 	    }
 	  break;
 
+	case 'p':
+	  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
+	    return 0;
+	  break;
+
 	case 't':
 	  if (XTREE (x, i) != XTREE (y, i))
 	    return 0;
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/lower-subreg.c	2017-10-23 17:16:50.370528277 +0100
@@ -609,19 +609,21 @@ decompose_register (unsigned int regno)
 /* Get a SUBREG of a CONCATN.  */
 
 static rtx
-simplify_subreg_concatn (machine_mode outermode, rtx op,
-			 unsigned int byte)
+simplify_subreg_concatn (machine_mode outermode, rtx op, poly_uint64 orig_byte)
 {
   unsigned int outer_size, outer_words, inner_size, inner_words;
   machine_mode innermode, partmode;
   rtx part;
   unsigned int final_offset;
+  unsigned int byte;
 
   innermode = GET_MODE (op);
   if (!interesting_mode_p (outermode, &outer_size, &outer_words)
       || !interesting_mode_p (innermode, &inner_size, &inner_words))
     gcc_unreachable ();
 
+  /* Must be constant if interesting_mode_p passes.  */
+  byte = orig_byte.to_constant ();
   gcc_assert (GET_CODE (op) == CONCATN);
   gcc_assert (byte % outer_size == 0);
 
@@ -667,7 +669,7 @@ simplify_gen_subreg_concatn (machine_mod
 
       if ((GET_MODE_SIZE (GET_MODE (op))
 	   == GET_MODE_SIZE (GET_MODE (SUBREG_REG (op))))
-	  && SUBREG_BYTE (op) == 0)
+	  && known_zero (SUBREG_BYTE (op)))
 	return simplify_gen_subreg_concatn (outermode, SUBREG_REG (op),
 					    GET_MODE (SUBREG_REG (op)), byte);
 
@@ -866,7 +868,7 @@ resolve_simple_move (rtx set, rtx_insn *
 
   if (GET_CODE (src) == SUBREG
       && resolve_reg_p (SUBREG_REG (src))
-      && (SUBREG_BYTE (src) != 0
+      && (maybe_nonzero (SUBREG_BYTE (src))
 	  || (GET_MODE_SIZE (orig_mode)
 	      != GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))))
     {
@@ -881,7 +883,7 @@ resolve_simple_move (rtx set, rtx_insn *
 
   if (GET_CODE (dest) == SUBREG
       && resolve_reg_p (SUBREG_REG (dest))
-      && (SUBREG_BYTE (dest) != 0
+      && (maybe_nonzero (SUBREG_BYTE (dest))
 	  || (GET_MODE_SIZE (orig_mode)
 	      != GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest))))))
     {
Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/lra-constraints.c	2017-10-23 17:16:50.370528277 +0100
@@ -786,6 +786,11 @@ operands_match_p (rtx x, rtx y, int y_ha
 	    return false;
 	  break;
 
+	case 'p':
+	  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
+	    return false;
+	  break;
+
 	case 'e':
 	  val = operands_match_p (XEXP (x, i), XEXP (y, i), -1);
 	  if (val == 0)
@@ -974,7 +979,7 @@ match_reload (signed char out, signed ch
 	      if (REG_P (subreg_reg)
 		  && (int) REGNO (subreg_reg) < lra_new_regno_start
 		  && GET_MODE (subreg_reg) == outmode
-		  && SUBREG_BYTE (in_rtx) == SUBREG_BYTE (new_in_reg)
+		  && must_eq (SUBREG_BYTE (in_rtx), SUBREG_BYTE (new_in_reg))
 		  && find_regno_note (curr_insn, REG_DEAD, REGNO (subreg_reg))
 		  && (! early_clobber_p
 		      || check_conflict_input_operands (REGNO (subreg_reg),
@@ -4204,7 +4209,7 @@ curr_insn_transform (bool check_only_p)
 	{
 	  machine_mode mode;
 	  rtx reg, *loc;
-	  int hard_regno, byte;
+	  int hard_regno;
 	  enum op_type type = curr_static_id->operand[i].type;
 
 	  loc = curr_id->operand_loc[i];
@@ -4212,7 +4217,7 @@ curr_insn_transform (bool check_only_p)
 	  if (GET_CODE (*loc) == SUBREG)
 	    {
 	      reg = SUBREG_REG (*loc);
-	      byte = SUBREG_BYTE (*loc);
+	      poly_int64 byte = SUBREG_BYTE (*loc);
 	      if (REG_P (reg)
 		  /* Strict_low_part requires reload the register not
 		     the sub-register.	*/
Index: gcc/lra-spills.c
===================================================================
--- gcc/lra-spills.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/lra-spills.c	2017-10-23 17:16:50.371528142 +0100
@@ -136,7 +136,7 @@ assign_mem_slot (int i)
   machine_mode wider_mode
     = wider_subreg_mode (mode, lra_reg_info[i].biggest_mode);
   HOST_WIDE_INT total_size = GET_MODE_SIZE (wider_mode);
-  HOST_WIDE_INT adjust = 0;
+  poly_int64 adjust = 0;
 
   lra_assert (regno_reg_rtx[i] != NULL_RTX && REG_P (regno_reg_rtx[i])
 	      && lra_reg_info[i].nrefs != 0 && reg_renumber[i] < 0);
Index: gcc/postreload.c
===================================================================
--- gcc/postreload.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/postreload.c	2017-10-23 17:16:50.371528142 +0100
@@ -1704,9 +1704,9 @@ move2add_valid_value_p (int regno, scala
 	 mode after truncation only if (REG:mode regno) is the lowpart of
 	 (REG:reg_mode[regno] regno).  Now, for big endian, the starting
 	 regno of the lowpart might be different.  */
-      int s_off = subreg_lowpart_offset (mode, old_mode);
+      poly_int64 s_off = subreg_lowpart_offset (mode, old_mode);
       s_off = subreg_regno_offset (regno, old_mode, s_off, mode);
-      if (s_off != 0)
+      if (maybe_nonzero (s_off))
 	/* We could in principle adjust regno, check reg_mode[regno] to be
 	   BLKmode, and return s_off to the caller (vs. -1 for failure),
 	   but we currently have no callers that could make use of this
Index: gcc/recog.c
===================================================================
--- gcc/recog.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/recog.c	2017-10-23 17:16:50.372528007 +0100
@@ -1006,7 +1006,8 @@ general_operand (rtx op, machine_mode mo
 	 might be called from cleanup_subreg_operands.
 
 	 ??? This is a kludge.  */
-      if (!reload_completed && SUBREG_BYTE (op) != 0
+      if (!reload_completed
+	  && maybe_nonzero (SUBREG_BYTE (op))
 	  && MEM_P (sub))
 	return 0;
 
@@ -1368,9 +1369,6 @@ indirect_operand (rtx op, machine_mode m
   if (! reload_completed
       && GET_CODE (op) == SUBREG && MEM_P (SUBREG_REG (op)))
     {
-      int offset = SUBREG_BYTE (op);
-      rtx inner = SUBREG_REG (op);
-
       if (mode != VOIDmode && GET_MODE (op) != mode)
 	return 0;
 
@@ -1378,12 +1376,10 @@ indirect_operand (rtx op, machine_mode m
 	 address is if OFFSET is zero and the address already is an operand
 	 or if the address is (plus Y (const_int -OFFSET)) and Y is an
 	 operand.  */
-
-      return ((offset == 0 && general_operand (XEXP (inner, 0), Pmode))
-	      || (GET_CODE (XEXP (inner, 0)) == PLUS
-		  && CONST_INT_P (XEXP (XEXP (inner, 0), 1))
-		  && INTVAL (XEXP (XEXP (inner, 0), 1)) == -offset
-		  && general_operand (XEXP (XEXP (inner, 0), 0), Pmode)));
+      poly_int64 offset;
+      rtx addr = strip_offset (XEXP (SUBREG_REG (op), 0), &offset);
+      return (known_zero (offset + SUBREG_BYTE (op))
+	      && general_operand (addr, Pmode));
     }
 
   return (MEM_P (op)
Index: gcc/regcprop.c
===================================================================
--- gcc/regcprop.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/regcprop.c	2017-10-23 17:16:50.372528007 +0100
@@ -345,7 +345,8 @@ copy_value (rtx dest, rtx src, struct va
      We can't properly represent the latter case in our tables, so don't
      record anything then.  */
   else if (sn < hard_regno_nregs (sr, vd->e[sr].mode)
-	   && subreg_lowpart_offset (GET_MODE (dest), vd->e[sr].mode) != 0)
+	   && maybe_nonzero (subreg_lowpart_offset (GET_MODE (dest),
+						    vd->e[sr].mode)))
     return;
 
   /* If SRC had been assigned a mode narrower than the copy, we can't
@@ -407,7 +408,7 @@ maybe_mode_change (machine_mode orig_mod
       int use_nregs = hard_regno_nregs (copy_regno, new_mode);
       int copy_offset
 	= GET_MODE_SIZE (copy_mode) / copy_nregs * (copy_nregs - use_nregs);
-      unsigned int offset
+      poly_uint64 offset
 	= subreg_size_lowpart_offset (GET_MODE_SIZE (new_mode) + copy_offset,
 				      GET_MODE_SIZE (orig_mode));
       regno += subreg_regno_offset (regno, orig_mode, offset, new_mode);
@@ -866,7 +867,8 @@ copyprop_hardreg_forward_1 (basic_block
 	      /* And likewise, if we are narrowing on big endian the transformation
 		 is also invalid.  */
 	      if (REG_NREGS (src) < hard_regno_nregs (regno, vd->e[regno].mode)
-		  && subreg_lowpart_offset (mode, vd->e[regno].mode) != 0)
+		  && maybe_nonzero (subreg_lowpart_offset (mode,
+							   vd->e[regno].mode)))
 		goto no_move_special_case;
 	    }
 
Index: gcc/reginfo.c
===================================================================
--- gcc/reginfo.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/reginfo.c	2017-10-23 17:16:50.372528007 +0100
@@ -1206,7 +1206,9 @@ reg_classes_intersect_p (reg_class_t c1,
 inline hashval_t
 simplifiable_subregs_hasher::hash (const simplifiable_subreg *value)
 {
-  return value->shape.unique_id ();
+  inchash::hash h;
+  h.add_hwi (value->shape.unique_id ());
+  return h.end ();
 }
 
 inline bool
@@ -1231,9 +1233,11 @@ simplifiable_subregs (const subreg_shape
   if (!this_target_hard_regs->x_simplifiable_subregs)
     this_target_hard_regs->x_simplifiable_subregs
       = new hash_table <simplifiable_subregs_hasher> (30);
+  inchash::hash h;
+  h.add_hwi (shape.unique_id ());
   simplifiable_subreg **slot
     = (this_target_hard_regs->x_simplifiable_subregs
-       ->find_slot_with_hash (&shape, shape.unique_id (), INSERT));
+       ->find_slot_with_hash (&shape, h.end (), INSERT));
 
   if (!*slot)
     {
@@ -1294,7 +1298,7 @@ record_subregs_of_mode (rtx subreg, bool
       unsigned int size = MAX (REGMODE_NATURAL_SIZE (shape.inner_mode),
 			       GET_MODE_SIZE (shape.outer_mode));
       gcc_checking_assert (size < GET_MODE_SIZE (shape.inner_mode));
-      if (shape.offset >= size)
+      if (must_ge (shape.offset, size))
 	shape.offset -= size;
       else
 	shape.offset += size;
Index: gcc/rtlhooks.c
===================================================================
--- gcc/rtlhooks.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/rtlhooks.c	2017-10-23 17:16:50.375527601 +0100
@@ -70,7 +70,7 @@ gen_lowpart_general (machine_mode mode,
 	  && !reload_completed)
 	return gen_lowpart_general (mode, force_reg (xmode, x));
 
-      HOST_WIDE_INT offset = byte_lowpart_offset (mode, GET_MODE (x));
+      poly_int64 offset = byte_lowpart_offset (mode, GET_MODE (x));
       return adjust_address (x, mode, offset);
     }
 }
@@ -115,7 +115,7 @@ gen_lowpart_if_possible (machine_mode mo
   else if (MEM_P (x))
     {
       /* This is the only other case we handle.  */
-      HOST_WIDE_INT offset = byte_lowpart_offset (mode, GET_MODE (x));
+      poly_int64 offset = byte_lowpart_offset (mode, GET_MODE (x));
       rtx new_rtx = adjust_address_nv (x, mode, offset);
       if (! memory_address_addr_space_p (mode, XEXP (new_rtx, 0),
 					 MEM_ADDR_SPACE (x)))
Index: gcc/reload.c
===================================================================
--- gcc/reload.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/reload.c	2017-10-23 17:16:50.373527872 +0100
@@ -2307,6 +2307,11 @@ operands_match_p (rtx x, rtx y)
 	    return 0;
 	  break;
 
+	case 'p':
+	  if (may_ne (SUBREG_BYTE (x), SUBREG_BYTE (y)))
+	    return 0;
+	  break;
+
 	case 'e':
 	  val = operands_match_p (XEXP (x, i), XEXP (y, i));
 	  if (val == 0)
@@ -6095,7 +6100,7 @@ find_reloads_subreg_address (rtx x, int
   int regno = REGNO (SUBREG_REG (x));
   int reloaded = 0;
   rtx tem, orig;
-  int offset;
+  poly_int64 offset;
 
   gcc_assert (reg_equiv_memory_loc (regno) != 0);
 
@@ -6142,7 +6147,7 @@ find_reloads_subreg_address (rtx x, int
 				   XEXP (tem, 0), &XEXP (tem, 0),
 				   opnum, type, ind_levels, insn);
   /* ??? Do we need to handle nonzero offsets somehow?  */
-  if (!offset && !rtx_equal_p (tem, orig))
+  if (known_zero (offset) && !rtx_equal_p (tem, orig))
     push_reg_equiv_alt_mem (regno, tem);
 
   /* For some processors an address may be valid in the original mode but
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/reload1.c	2017-10-23 17:16:50.373527872 +0100
@@ -2145,7 +2145,7 @@ alter_reg (int i, int from_reg, bool don
       machine_mode wider_mode = wider_subreg_mode (mode, reg_max_ref_mode[i]);
       unsigned int total_size = GET_MODE_SIZE (wider_mode);
       unsigned int min_align = GET_MODE_BITSIZE (reg_max_ref_mode[i]);
-      int adjust = 0;
+      poly_int64 adjust = 0;
 
       something_was_spilled = true;
 
@@ -2185,7 +2185,7 @@ alter_reg (int i, int from_reg, bool don
 	  if (BYTES_BIG_ENDIAN)
 	    {
 	      adjust = inherent_size - total_size;
-	      if (adjust)
+	      if (maybe_nonzero (adjust))
 		{
 		  unsigned int total_bits = total_size * BITS_PER_UNIT;
 		  machine_mode mem_mode
@@ -2237,7 +2237,7 @@ alter_reg (int i, int from_reg, bool don
 	  if (BYTES_BIG_ENDIAN)
 	    {
 	      adjust = GET_MODE_SIZE (mode) - total_size;
-	      if (adjust)
+	      if (maybe_nonzero (adjust))
 		{
 		  unsigned int total_bits = total_size * BITS_PER_UNIT;
 		  machine_mode mem_mode
@@ -6347,12 +6347,12 @@ replaced_subreg (rtx x)
    SUBREG is non-NULL if the pseudo is a subreg whose reg is a pseudo,
    otherwise it is NULL.  */
 
-static int
+static poly_int64
 compute_reload_subreg_offset (machine_mode outermode,
 			      rtx subreg,
 			      machine_mode innermode)
 {
-  int outer_offset;
+  poly_int64 outer_offset;
   machine_mode middlemode;
 
   if (!subreg)
@@ -6506,7 +6506,7 @@ choose_reload_regs (struct insn_chain *c
 
 	  if (inheritance)
 	    {
-	      int byte = 0;
+	      poly_int64 byte = 0;
 	      int regno = -1;
 	      machine_mode mode = VOIDmode;
 	      rtx subreg = NULL_RTX;
@@ -6556,8 +6556,9 @@ choose_reload_regs (struct insn_chain *c
 
 	      if (regno >= 0
 		  && reg_last_reload_reg[regno] != 0
-		  && (GET_MODE_SIZE (GET_MODE (reg_last_reload_reg[regno]))
-		      >= GET_MODE_SIZE (mode) + byte)
+		  && (must_ge
+		      (GET_MODE_SIZE (GET_MODE (reg_last_reload_reg[regno])),
+		       GET_MODE_SIZE (mode) + byte))
 		  /* Verify that the register it's in can be used in
 		     mode MODE.  */
 		  && (REG_CAN_CHANGE_MODE_P
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/simplify-rtx.c	2017-10-23 17:16:50.376527466 +0100
@@ -789,7 +789,7 @@ simplify_truncation (machine_mode mode,
       && (INTVAL (XEXP (op, 1)) & (precision - 1)) == 0
       && UINTVAL (XEXP (op, 1)) < op_precision)
     {
-      int byte = subreg_lowpart_offset (mode, op_mode);
+      poly_int64 byte = subreg_lowpart_offset (mode, op_mode);
       int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
       return simplify_gen_subreg (mode, XEXP (op, 0), op_mode,
 				  (WORDS_BIG_ENDIAN
@@ -815,7 +815,7 @@ simplify_truncation (machine_mode mode,
       && (GET_MODE_SIZE (int_mode) >= UNITS_PER_WORD
 	  || WORDS_BIG_ENDIAN == BYTES_BIG_ENDIAN))
     {
-      int byte = subreg_lowpart_offset (int_mode, int_op_mode);
+      poly_int64 byte = subreg_lowpart_offset (int_mode, int_op_mode);
       int shifted_bytes = INTVAL (XEXP (op, 1)) / BITS_PER_UNIT;
       return adjust_address_nv (XEXP (op, 0), int_mode,
 				(WORDS_BIG_ENDIAN
@@ -2826,7 +2826,7 @@ simplify_binary_operation_1 (enum rtx_co
           && GET_CODE (SUBREG_REG (opleft)) == ASHIFT
           && GET_CODE (opright) == LSHIFTRT
           && GET_CODE (XEXP (opright, 0)) == SUBREG
-          && SUBREG_BYTE (opleft) == SUBREG_BYTE (XEXP (opright, 0))
+	  && must_eq (SUBREG_BYTE (opleft), SUBREG_BYTE (XEXP (opright, 0)))
 	  && GET_MODE_SIZE (int_mode) < GET_MODE_SIZE (inner_mode)
           && rtx_equal_p (XEXP (SUBREG_REG (opleft), 0),
                           SUBREG_REG (XEXP (opright, 0)))
@@ -6183,7 +6183,7 @@ simplify_immed_subreg (fixed_size_mode o
    Return 0 if no simplifications are possible.  */
 rtx
 simplify_subreg (machine_mode outermode, rtx op,
-		 machine_mode innermode, unsigned int byte)
+		 machine_mode innermode, poly_uint64 byte)
 {
   /* Little bit of sanity checking.  */
   gcc_assert (innermode != VOIDmode);
@@ -6194,16 +6194,16 @@ simplify_subreg (machine_mode outermode,
   gcc_assert (GET_MODE (op) == innermode
 	      || GET_MODE (op) == VOIDmode);
 
-  if ((byte % GET_MODE_SIZE (outermode)) != 0)
+  if (!multiple_p (byte, GET_MODE_SIZE (outermode)))
     return NULL_RTX;
 
-  if (byte >= GET_MODE_SIZE (innermode))
+  if (may_ge (byte, GET_MODE_SIZE (innermode)))
     return NULL_RTX;
 
-  if (outermode == innermode && !byte)
+  if (outermode == innermode && known_zero (byte))
     return op;
 
-  if (byte % GET_MODE_UNIT_SIZE (innermode) == 0)
+  if (multiple_p (byte, GET_MODE_UNIT_SIZE (innermode)))
     {
       rtx elt;
 
@@ -6224,12 +6224,15 @@ simplify_subreg (machine_mode outermode,
     {
       /* simplify_immed_subreg deconstructs OP into bytes and constructs
 	 the result from bytes, so it only works if the sizes of the modes
-	 are known at compile time.  Cases that apply to general modes
-	 should be handled here before calling simplify_immed_subreg.  */
+	 and the value of the offset are known at compile time.  Cases
+	 that apply to general modes and offsets should be handled here
+	 before calling simplify_immed_subreg.  */
       fixed_size_mode fs_outermode, fs_innermode;
+      unsigned HOST_WIDE_INT cbyte;
       if (is_a <fixed_size_mode> (outermode, &fs_outermode)
-	  && is_a <fixed_size_mode> (innermode, &fs_innermode))
-	return simplify_immed_subreg (fs_outermode, op, fs_innermode, byte);
+	  && is_a <fixed_size_mode> (innermode, &fs_innermode)
+	  && byte.is_constant (&cbyte))
+	return simplify_immed_subreg (fs_outermode, op, fs_innermode, cbyte);
 
       return NULL_RTX;
     }
@@ -6242,32 +6245,33 @@ simplify_subreg (machine_mode outermode,
       rtx newx;
 
       if (outermode == innermostmode
-	  && byte == 0 && SUBREG_BYTE (op) == 0)
+	  && known_zero (byte)
+	  && known_zero (SUBREG_BYTE (op)))
 	return SUBREG_REG (op);
 
       /* Work out the memory offset of the final OUTERMODE value relative
 	 to the inner value of OP.  */
-      HOST_WIDE_INT mem_offset = subreg_memory_offset (outermode,
-						       innermode, byte);
-      HOST_WIDE_INT op_mem_offset = subreg_memory_offset (op);
-      HOST_WIDE_INT final_offset = mem_offset + op_mem_offset;
+      poly_int64 mem_offset = subreg_memory_offset (outermode,
+						    innermode, byte);
+      poly_int64 op_mem_offset = subreg_memory_offset (op);
+      poly_int64 final_offset = mem_offset + op_mem_offset;
 
       /* See whether resulting subreg will be paradoxical.  */
       if (!paradoxical_subreg_p (outermode, innermostmode))
 	{
 	  /* In nonparadoxical subregs we can't handle negative offsets.  */
-	  if (final_offset < 0)
+	  if (may_lt (final_offset, 0))
 	    return NULL_RTX;
 	  /* Bail out in case resulting subreg would be incorrect.  */
-	  if (final_offset % GET_MODE_SIZE (outermode)
-	      || (unsigned) final_offset >= GET_MODE_SIZE (innermostmode))
+	  if (!multiple_p (final_offset, GET_MODE_SIZE (outermode))
+	      || may_ge (final_offset, GET_MODE_SIZE (innermostmode)))
 	    return NULL_RTX;
 	}
       else
 	{
-	  HOST_WIDE_INT required_offset
-	    = subreg_memory_offset (outermode, innermostmode, 0);
-	  if (final_offset != required_offset)
+	  poly_int64 required_offset = subreg_memory_offset (outermode,
+							     innermostmode, 0);
+	  if (may_ne (final_offset, required_offset))
 	    return NULL_RTX;
 	  /* Paradoxical subregs always have byte offset 0.  */
 	  final_offset = 0;
@@ -6320,7 +6324,7 @@ simplify_subreg (machine_mode outermode,
 	     The information is used only by alias analysis that can not
 	     grog partial register anyway.  */
 
-	  if (subreg_lowpart_offset (outermode, innermode) == byte)
+	  if (must_eq (subreg_lowpart_offset (outermode, innermode), byte))
 	    ORIGINAL_REGNO (x) = ORIGINAL_REGNO (op);
 	  return x;
 	}
@@ -6345,25 +6349,28 @@ simplify_subreg (machine_mode outermode,
   if (GET_CODE (op) == CONCAT
       || GET_CODE (op) == VEC_CONCAT)
     {
-      unsigned int part_size, final_offset;
+      unsigned int part_size;
+      poly_uint64 final_offset;
       rtx part, res;
 
       machine_mode part_mode = GET_MODE (XEXP (op, 0));
       if (part_mode == VOIDmode)
 	part_mode = GET_MODE_INNER (GET_MODE (op));
       part_size = GET_MODE_SIZE (part_mode);
-      if (byte < part_size)
+      if (must_lt (byte, part_size))
 	{
 	  part = XEXP (op, 0);
 	  final_offset = byte;
 	}
-      else
+      else if (must_ge (byte, part_size))
 	{
 	  part = XEXP (op, 1);
 	  final_offset = byte - part_size;
 	}
+      else
+	return NULL_RTX;
 
-      if (final_offset + GET_MODE_SIZE (outermode) > part_size)
+      if (may_gt (final_offset + GET_MODE_SIZE (outermode), part_size))
 	return NULL_RTX;
 
       part_mode = GET_MODE (part);
@@ -6381,15 +6388,15 @@ simplify_subreg (machine_mode outermode,
      it extracts higher bits that the ZERO_EXTEND's source bits.  */
   if (GET_CODE (op) == ZERO_EXTEND && SCALAR_INT_MODE_P (innermode))
     {
-      unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
-      if (bitpos >= GET_MODE_PRECISION (GET_MODE (XEXP (op, 0))))
+      poly_uint64 bitpos = subreg_lsb_1 (outermode, innermode, byte);
+      if (must_ge (bitpos, GET_MODE_PRECISION (GET_MODE (XEXP (op, 0)))))
 	return CONST0_RTX (outermode);
     }
 
   scalar_int_mode int_outermode, int_innermode;
   if (is_a <scalar_int_mode> (outermode, &int_outermode)
       && is_a <scalar_int_mode> (innermode, &int_innermode)
-      && byte == subreg_lowpart_offset (int_outermode, int_innermode))
+      && must_eq (byte, subreg_lowpart_offset (int_outermode, int_innermode)))
     {
       /* Handle polynomial integers.  The upper bits of a paradoxical
 	 subreg are undefined, so this is safe regardless of whether
@@ -6419,7 +6426,7 @@ simplify_subreg (machine_mode outermode,
 
 rtx
 simplify_gen_subreg (machine_mode outermode, rtx op,
-		     machine_mode innermode, unsigned int byte)
+		     machine_mode innermode, poly_uint64 byte)
 {
   rtx newx;
 
@@ -6615,7 +6622,7 @@ test_vector_ops_duplicate (machine_mode
 						duplicate, last_par));
 
   /* Test a scalar subreg of a VEC_DUPLICATE.  */
-  unsigned int offset = subreg_lowpart_offset (inner_mode, mode);
+  poly_uint64 offset = subreg_lowpart_offset (inner_mode, mode);
   ASSERT_RTX_EQ (scalar_reg,
 		 simplify_gen_subreg (inner_mode, duplicate,
 				      mode, offset));
@@ -6635,7 +6642,7 @@ test_vector_ops_duplicate (machine_mode
 						duplicate, vec_par));
 
       /* Test a vector subreg of a VEC_DUPLICATE.  */
-      unsigned int offset = subreg_lowpart_offset (narrower_mode, mode);
+      poly_uint64 offset = subreg_lowpart_offset (narrower_mode, mode);
       ASSERT_RTX_EQ (narrower_duplicate,
 		     simplify_gen_subreg (narrower_mode, duplicate,
 					  mode, offset));
@@ -6745,7 +6752,7 @@ simplify_const_poly_int_tests<N>::run ()
   rtx x10 = gen_int_mode (poly_int64 (-31, -24), HImode);
   rtx two = GEN_INT (2);
   rtx six = GEN_INT (6);
-  HOST_WIDE_INT offset = subreg_lowpart_offset (QImode, HImode);
+  poly_uint64 offset = subreg_lowpart_offset (QImode, HImode);
 
   /* These tests only try limited operation combinations.  Fuller arithmetic
      testing is done directly on poly_ints.  */
Index: gcc/valtrack.c
===================================================================
--- gcc/valtrack.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/valtrack.c	2017-10-23 17:16:50.376527466 +0100
@@ -550,7 +550,7 @@ debug_lowpart_subreg (machine_mode outer
 {
   if (inner_mode == VOIDmode)
     inner_mode = GET_MODE (expr);
-  int offset = subreg_lowpart_offset (outer_mode, inner_mode);
+  poly_int64 offset = subreg_lowpart_offset (outer_mode, inner_mode);
   rtx ret = simplify_gen_subreg (outer_mode, expr, inner_mode, offset);
   if (ret)
     return ret;
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c	2017-10-23 17:16:35.057923923 +0100
+++ gcc/var-tracking.c	2017-10-23 17:16:50.377527331 +0100
@@ -3522,6 +3522,12 @@ loc_cmp (rtx x, rtx y)
 	else
 	  return 1;
 
+      case 'p':
+	r = compare_sizes_for_sort (SUBREG_BYTE (x), SUBREG_BYTE (y));
+	if (r != 0)
+	  return r;
+	break;
+
       case 'V':
       case 'E':
 	/* Compare the vector length first.  */
@@ -5369,7 +5375,7 @@ track_loc_p (rtx loc, tree expr, poly_in
 static rtx
 var_lowpart (machine_mode mode, rtx loc)
 {
-  unsigned int offset, reg_offset, regno;
+  unsigned int regno;
 
   if (GET_MODE (loc) == mode)
     return loc;
@@ -5377,12 +5383,12 @@ var_lowpart (machine_mode mode, rtx loc)
   if (!REG_P (loc) && !MEM_P (loc))
     return NULL;
 
-  offset = byte_lowpart_offset (mode, GET_MODE (loc));
+  poly_uint64 offset = byte_lowpart_offset (mode, GET_MODE (loc));
 
   if (MEM_P (loc))
     return adjust_address_nv (loc, mode, offset);
 
-  reg_offset = subreg_lowpart_offset (mode, GET_MODE (loc));
+  poly_uint64 reg_offset = subreg_lowpart_offset (mode, GET_MODE (loc));
   regno = REGNO (loc) + subreg_regno_offset (REGNO (loc), GET_MODE (loc),
 					     reg_offset, mode);
   return gen_rtx_REG_offset (loc, mode, regno, offset);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [024/nnn] poly_int: ira subreg liveness tracking
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (23 preceding siblings ...)
  2017-10-23 17:10 ` [025/nnn] poly_int: SUBREG_BYTE Richard Sandiford
@ 2017-10-23 17:10 ` Richard Sandiford
  2017-11-28 21:10   ` Jeff Law
  2017-10-23 17:11 ` [026/nnn] poly_int: operand_subword Richard Sandiford
                   ` (82 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:10 UTC (permalink / raw)
  To: gcc-patches

Normally the IRA-reload interface tries to track the liveness of
individual bytes of an allocno if the allocno is sometimes written
to as a SUBREG.  This isn't possible for variable-sized allocnos,
but it doesn't matter because targets with variable-sized registers
should use LRA instead.

This patch adds a get_subreg_tracking_sizes function for deciding
whether it is possible to model a partial read or write.  Later
patches make it return false if anything is variable.
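
As a rough sketch of what that later change might look like (assuming
GET_MODE_SIZE and SUBREG_BYTE both return poly_ints by that point, as
arranged elsewhere in the series), get_subreg_tracking_sizes would
simply check that everything is a compile-time constant:

  /* Sketch only: fail whenever a size or the offset is not a
     compile-time constant, so callers fall back to treating the
     access as a partial definition of the whole register.  */
  static bool
  get_subreg_tracking_sizes (rtx x, HOST_WIDE_INT *outer_size,
			     HOST_WIDE_INT *inner_size, HOST_WIDE_INT *start)
  {
    rtx reg = regno_reg_rtx[REGNO (SUBREG_REG (x))];
    return (GET_MODE_SIZE (GET_MODE (x)).is_constant (outer_size)
	    && GET_MODE_SIZE (GET_MODE (reg)).is_constant (inner_size)
	    && SUBREG_BYTE (x).is_constant (start));
  }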


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* ira.c (get_subreg_tracking_sizes): New function.
	(init_live_subregs): Take an integer size rather than a register.
	(build_insn_chain): Use get_subreg_tracking_sizes.  Update calls
	to init_live_subregs.

Index: gcc/ira.c
===================================================================
--- gcc/ira.c	2017-10-23 17:11:43.647074065 +0100
+++ gcc/ira.c	2017-10-23 17:11:59.074107016 +0100
@@ -4040,16 +4040,27 @@ pseudo_for_reload_consideration_p (int r
   return (reg_renumber[regno] >= 0 || ira_conflicts_p);
 }
 
-/* Init LIVE_SUBREGS[ALLOCNUM] and LIVE_SUBREGS_USED[ALLOCNUM] using
-   REG to the number of nregs, and INIT_VALUE to get the
-   initialization.  ALLOCNUM need not be the regno of REG.  */
+/* Return true if we can track the individual bytes of subreg X.
+   When returning true, set *OUTER_SIZE to the number of bytes in
+   X itself, *INNER_SIZE to the number of bytes in the inner register
+   and *START to the offset of the first byte.  */
+static bool
+get_subreg_tracking_sizes (rtx x, HOST_WIDE_INT *outer_size,
+			   HOST_WIDE_INT *inner_size, HOST_WIDE_INT *start)
+{
+  rtx reg = regno_reg_rtx[REGNO (SUBREG_REG (x))];
+  *outer_size = GET_MODE_SIZE (GET_MODE (x));
+  *inner_size = GET_MODE_SIZE (GET_MODE (reg));
+  *start = SUBREG_BYTE (x);
+  return true;
+}
+
+/* Init LIVE_SUBREGS[ALLOCNUM] and LIVE_SUBREGS_USED[ALLOCNUM] for
+   a register with SIZE bytes, making the register live if INIT_VALUE.  */
 static void
 init_live_subregs (bool init_value, sbitmap *live_subregs,
-		   bitmap live_subregs_used, int allocnum, rtx reg)
+		   bitmap live_subregs_used, int allocnum, int size)
 {
-  unsigned int regno = REGNO (SUBREG_REG (reg));
-  int size = GET_MODE_SIZE (GET_MODE (regno_reg_rtx[regno]));
-
   gcc_assert (size > 0);
 
   /* Been there, done that.  */
@@ -4158,19 +4169,26 @@ build_insn_chain (void)
 			&& (!DF_REF_FLAGS_IS_SET (def, DF_REF_CONDITIONAL)))
 		      {
 			rtx reg = DF_REF_REG (def);
+			HOST_WIDE_INT outer_size, inner_size, start;
 
-			/* We can model subregs, but not if they are
-			   wrapped in ZERO_EXTRACTS.  */
+			/* We can usually track the liveness of individual
+			   bytes within a subreg.  The only exceptions are
+			   subregs wrapped in ZERO_EXTRACTs and subregs whose
+			   size is not known; in those cases we need to be
+			   conservative and treat the definition as a partial
+			   definition of the full register rather than a full
+			   definition of a specific part of the register.  */
 			if (GET_CODE (reg) == SUBREG
-			    && !DF_REF_FLAGS_IS_SET (def, DF_REF_ZERO_EXTRACT))
+			    && !DF_REF_FLAGS_IS_SET (def, DF_REF_ZERO_EXTRACT)
+			    && get_subreg_tracking_sizes (reg, &outer_size,
+							  &inner_size, &start))
 			  {
-			    unsigned int start = SUBREG_BYTE (reg);
-			    unsigned int last = start
-			      + GET_MODE_SIZE (GET_MODE (reg));
+			    HOST_WIDE_INT last = start + outer_size;
 
 			    init_live_subregs
 			      (bitmap_bit_p (live_relevant_regs, regno),
-			       live_subregs, live_subregs_used, regno, reg);
+			       live_subregs, live_subregs_used, regno,
+			       inner_size);
 
 			    if (!DF_REF_FLAGS_IS_SET
 				(def, DF_REF_STRICT_LOW_PART))
@@ -4255,18 +4273,20 @@ build_insn_chain (void)
 		    if (regno < FIRST_PSEUDO_REGISTER
 			|| pseudo_for_reload_consideration_p (regno))
 		      {
+			HOST_WIDE_INT outer_size, inner_size, start;
 			if (GET_CODE (reg) == SUBREG
 			    && !DF_REF_FLAGS_IS_SET (use,
 						     DF_REF_SIGN_EXTRACT
-						     | DF_REF_ZERO_EXTRACT))
+						     | DF_REF_ZERO_EXTRACT)
+			    && get_subreg_tracking_sizes (reg, &outer_size,
+							  &inner_size, &start))
 			  {
-			    unsigned int start = SUBREG_BYTE (reg);
-			    unsigned int last = start
-			      + GET_MODE_SIZE (GET_MODE (reg));
+			    HOST_WIDE_INT last = start + outer_size;
 
 			    init_live_subregs
 			      (bitmap_bit_p (live_relevant_regs, regno),
-			       live_subregs, live_subregs_used, regno, reg);
+			       live_subregs, live_subregs_used, regno,
+			       inner_size);
 
 			    /* Ignore the paradoxical bits.  */
 			    if (last > SBITMAP_SIZE (live_subregs[regno]))

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [027/nnn] poly_int: DWARF CFA offsets
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (25 preceding siblings ...)
  2017-10-23 17:11 ` [026/nnn] poly_int: operand_subword Richard Sandiford
@ 2017-10-23 17:11 ` Richard Sandiford
  2017-12-06  0:40   ` Jeff Law
  2017-10-23 17:12 ` [030/nnn] poly_int: get_addr_unit_base_and_extent Richard Sandiford
                   ` (80 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:11 UTC (permalink / raw)
  To: gcc-patches

This patch makes the DWARF code use poly_int64 rather than
HOST_WIDE_INT for CFA offsets.  The main changes are:

- to make reg_save use a DW_CFA_expression representation when
  the offset isn't constant and

- to record the CFA information alongside a def_cfa_expression
  if either offset is polynomial, since it's quite difficult
  to reconstruct the CFA information otherwise.
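
As an illustration of the first point, the heart of the new reg_save
logic is the following (a condensed sketch of the code in the patch
below, with the stack-realignment case elided):

  /* Constant offsets keep the compact DW_CFA_offset* encodings;
     anything polynomial falls back to a DW_CFA_expression.  */
  HOST_WIDE_INT const_offset;
  if (offset.is_constant (&const_offset))
    {
      if (need_data_align_sf_opcode (const_offset))
	cfi->dw_cfi_opc = DW_CFA_offset_extended_sf;
      else if (reg & ~0x3f)
	cfi->dw_cfi_opc = DW_CFA_offset_extended;
      else
	cfi->dw_cfi_opc = DW_CFA_offset;
      cfi->dw_cfi_oprnd2.dw_cfi_offset = const_offset;
    }
  else
    {
      cfi->dw_cfi_opc = DW_CFA_expression;
      cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
      cfi->dw_cfi_oprnd2.dw_cfi_loc = build_cfa_loc (&cur_row->cfa, offset);
    }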


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* gengtype.c (main): Handle poly_int64_pod.
	* dwarf2out.h (dw_cfi_oprnd_cfa_loc): New dw_cfi_oprnd_type.
	(dw_cfi_oprnd::dw_cfi_cfa_loc): New field.
	(dw_cfa_location::offset, dw_cfa_location::base_offset): Change
	from HOST_WIDE_INT to poly_int64_pod.
	* dwarf2cfi.c (queued_reg_save::cfa_offset): Likewise.
	(copy_cfa): New function.
	(lookup_cfa_1): Use the cached dw_cfi_cfa_loc, if it exists.
	(cfi_oprnd_equal_p): Handle dw_cfi_oprnd_cfa_loc.
	(cfa_equal_p, dwarf2out_frame_debug_adjust_cfa)
	(dwarf2out_frame_debug_cfa_offset, dwarf2out_frame_debug_expr)
	(initial_return_save): Treat offsets as poly_ints.
	(def_cfa_0): Likewise.  Cache the CFA in dw_cfi_cfa_loc if either
	offset is nonconstant.
	(reg_save): Take the offset as a poly_int64.  Fall back to
	DW_CFA_expression for nonconstant offsets.
	(queue_reg_save): Take the offset as a poly_int64.
	* dwarf2out.c (dw_cfi_oprnd2_desc): Handle DW_CFA_def_cfa_expression.

Index: gcc/gengtype.c
===================================================================
--- gcc/gengtype.c	2017-10-23 17:16:50.367528682 +0100
+++ gcc/gengtype.c	2017-10-23 17:16:57.211604434 +0100
@@ -5192,6 +5192,7 @@ #define POS_HERE(Call) do { pos.file = t
       POS_HERE (do_scalar_typedef ("REAL_VALUE_TYPE", &pos));
       POS_HERE (do_scalar_typedef ("FIXED_VALUE_TYPE", &pos));
       POS_HERE (do_scalar_typedef ("double_int", &pos));
+      POS_HERE (do_scalar_typedef ("poly_int64_pod", &pos));
       POS_HERE (do_scalar_typedef ("offset_int", &pos));
       POS_HERE (do_scalar_typedef ("widest_int", &pos));
       POS_HERE (do_scalar_typedef ("int64_t", &pos));
Index: gcc/dwarf2out.h
===================================================================
--- gcc/dwarf2out.h	2017-10-23 17:11:40.311071579 +0100
+++ gcc/dwarf2out.h	2017-10-23 17:16:57.210604569 +0100
@@ -43,7 +43,8 @@ enum dw_cfi_oprnd_type {
   dw_cfi_oprnd_reg_num,
   dw_cfi_oprnd_offset,
   dw_cfi_oprnd_addr,
-  dw_cfi_oprnd_loc
+  dw_cfi_oprnd_loc,
+  dw_cfi_oprnd_cfa_loc
 };
 
 typedef union GTY(()) {
@@ -51,6 +52,8 @@ typedef union GTY(()) {
   HOST_WIDE_INT GTY ((tag ("dw_cfi_oprnd_offset"))) dw_cfi_offset;
   const char * GTY ((tag ("dw_cfi_oprnd_addr"))) dw_cfi_addr;
   struct dw_loc_descr_node * GTY ((tag ("dw_cfi_oprnd_loc"))) dw_cfi_loc;
+  struct dw_cfa_location * GTY ((tag ("dw_cfi_oprnd_cfa_loc")))
+    dw_cfi_cfa_loc;
 } dw_cfi_oprnd;
 
 struct GTY(()) dw_cfi_node {
@@ -114,8 +117,8 @@ struct GTY(()) dw_fde_node {
    Instead of passing around REG and OFFSET, we pass a copy
    of this structure.  */
 struct GTY(()) dw_cfa_location {
-  HOST_WIDE_INT offset;
-  HOST_WIDE_INT base_offset;
+  poly_int64_pod offset;
+  poly_int64_pod base_offset;
   /* REG is in DWARF_FRAME_REGNUM space, *not* normal REGNO space.  */
   unsigned int reg;
   BOOL_BITFIELD indirect : 1;  /* 1 if CFA is accessed via a dereference.  */
Index: gcc/dwarf2cfi.c
===================================================================
--- gcc/dwarf2cfi.c	2017-10-23 17:07:41.013611927 +0100
+++ gcc/dwarf2cfi.c	2017-10-23 17:16:57.208604839 +0100
@@ -206,7 +206,7 @@ static GTY(()) unsigned long dwarf2out_c
 struct queued_reg_save {
   rtx reg;
   rtx saved_reg;
-  HOST_WIDE_INT cfa_offset;
+  poly_int64_pod cfa_offset;
 };
 
 
@@ -434,6 +434,16 @@ copy_cfi_row (dw_cfi_row *src)
   return dst;
 }
 
+/* Return a copy of an existing CFA location.  */
+
+static dw_cfa_location *
+copy_cfa (dw_cfa_location *src)
+{
+  dw_cfa_location *dst = ggc_alloc<dw_cfa_location> ();
+  *dst = *src;
+  return dst;
+}
+
 /* Generate a new label for the CFI info to refer to.  */
 
 static char *
@@ -629,7 +639,10 @@ lookup_cfa_1 (dw_cfi_ref cfi, dw_cfa_loc
       loc->offset = cfi->dw_cfi_oprnd2.dw_cfi_offset;
       break;
     case DW_CFA_def_cfa_expression:
-      get_cfa_from_loc_descr (loc, cfi->dw_cfi_oprnd1.dw_cfi_loc);
+      if (cfi->dw_cfi_oprnd2.dw_cfi_cfa_loc)
+	*loc = *cfi->dw_cfi_oprnd2.dw_cfi_cfa_loc;
+      else
+	get_cfa_from_loc_descr (loc, cfi->dw_cfi_oprnd1.dw_cfi_loc);
       break;
 
     case DW_CFA_remember_state:
@@ -654,10 +667,10 @@ lookup_cfa_1 (dw_cfi_ref cfi, dw_cfa_loc
 cfa_equal_p (const dw_cfa_location *loc1, const dw_cfa_location *loc2)
 {
   return (loc1->reg == loc2->reg
-	  && loc1->offset == loc2->offset
+	  && must_eq (loc1->offset, loc2->offset)
 	  && loc1->indirect == loc2->indirect
 	  && (loc1->indirect == 0
-	      || loc1->base_offset == loc2->base_offset));
+	      || must_eq (loc1->base_offset, loc2->base_offset)));
 }
 
 /* Determine if two CFI operands are identical.  */
@@ -678,6 +691,8 @@ cfi_oprnd_equal_p (enum dw_cfi_oprnd_typ
 	      || strcmp (a->dw_cfi_addr, b->dw_cfi_addr) == 0);
     case dw_cfi_oprnd_loc:
       return loc_descr_equal_p (a->dw_cfi_loc, b->dw_cfi_loc);
+    case dw_cfi_oprnd_cfa_loc:
+      return cfa_equal_p (a->dw_cfi_cfa_loc, b->dw_cfi_cfa_loc);
     }
   gcc_unreachable ();
 }
@@ -758,19 +773,23 @@ def_cfa_0 (dw_cfa_location *old_cfa, dw_
 
   cfi = new_cfi ();
 
-  if (new_cfa->reg == old_cfa->reg && !new_cfa->indirect && !old_cfa->indirect)
+  HOST_WIDE_INT const_offset;
+  if (new_cfa->reg == old_cfa->reg
+      && !new_cfa->indirect
+      && !old_cfa->indirect
+      && new_cfa->offset.is_constant (&const_offset))
     {
       /* Construct a "DW_CFA_def_cfa_offset <offset>" instruction, indicating
 	 the CFA register did not change but the offset did.  The data
 	 factoring for DW_CFA_def_cfa_offset_sf happens in output_cfi, or
 	 in the assembler via the .cfi_def_cfa_offset directive.  */
-      if (new_cfa->offset < 0)
+      if (const_offset < 0)
 	cfi->dw_cfi_opc = DW_CFA_def_cfa_offset_sf;
       else
 	cfi->dw_cfi_opc = DW_CFA_def_cfa_offset;
-      cfi->dw_cfi_oprnd1.dw_cfi_offset = new_cfa->offset;
+      cfi->dw_cfi_oprnd1.dw_cfi_offset = const_offset;
     }
-  else if (new_cfa->offset == old_cfa->offset
+  else if (must_eq (new_cfa->offset, old_cfa->offset)
 	   && old_cfa->reg != INVALID_REGNUM
 	   && !new_cfa->indirect
 	   && !old_cfa->indirect)
@@ -781,19 +800,20 @@ def_cfa_0 (dw_cfa_location *old_cfa, dw_
       cfi->dw_cfi_opc = DW_CFA_def_cfa_register;
       cfi->dw_cfi_oprnd1.dw_cfi_reg_num = new_cfa->reg;
     }
-  else if (new_cfa->indirect == 0)
+  else if (new_cfa->indirect == 0
+	   && new_cfa->offset.is_constant (&const_offset))
     {
       /* Construct a "DW_CFA_def_cfa <register> <offset>" instruction,
 	 indicating the CFA register has changed to <register> with
 	 the specified offset.  The data factoring for DW_CFA_def_cfa_sf
 	 happens in output_cfi, or in the assembler via the .cfi_def_cfa
 	 directive.  */
-      if (new_cfa->offset < 0)
+      if (const_offset < 0)
 	cfi->dw_cfi_opc = DW_CFA_def_cfa_sf;
       else
 	cfi->dw_cfi_opc = DW_CFA_def_cfa;
       cfi->dw_cfi_oprnd1.dw_cfi_reg_num = new_cfa->reg;
-      cfi->dw_cfi_oprnd2.dw_cfi_offset = new_cfa->offset;
+      cfi->dw_cfi_oprnd2.dw_cfi_offset = const_offset;
     }
   else
     {
@@ -805,6 +825,13 @@ def_cfa_0 (dw_cfa_location *old_cfa, dw_
       cfi->dw_cfi_opc = DW_CFA_def_cfa_expression;
       loc_list = build_cfa_loc (new_cfa, 0);
       cfi->dw_cfi_oprnd1.dw_cfi_loc = loc_list;
+      if (!new_cfa->offset.is_constant ()
+	  || !new_cfa->base_offset.is_constant ())
+	/* It's hard to reconstruct the CFA location for a polynomial
+	   expression, so just cache it instead.  */
+	cfi->dw_cfi_oprnd2.dw_cfi_cfa_loc = copy_cfa (new_cfa);
+      else
+	cfi->dw_cfi_oprnd2.dw_cfi_cfa_loc = NULL;
     }
 
   return cfi;
@@ -836,33 +863,42 @@ def_cfa_1 (dw_cfa_location *new_cfa)
    otherwise it is saved in SREG.  */
 
 static void
-reg_save (unsigned int reg, unsigned int sreg, HOST_WIDE_INT offset)
+reg_save (unsigned int reg, unsigned int sreg, poly_int64 offset)
 {
   dw_fde_ref fde = cfun ? cfun->fde : NULL;
   dw_cfi_ref cfi = new_cfi ();
 
   cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
 
-  /* When stack is aligned, store REG using DW_CFA_expression with FP.  */
-  if (fde
-      && fde->stack_realign
-      && sreg == INVALID_REGNUM)
-    {
-      cfi->dw_cfi_opc = DW_CFA_expression;
-      cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
-      cfi->dw_cfi_oprnd2.dw_cfi_loc
-	= build_cfa_aligned_loc (&cur_row->cfa, offset,
-				 fde->stack_realignment);
-    }
-  else if (sreg == INVALID_REGNUM)
-    {
-      if (need_data_align_sf_opcode (offset))
-	cfi->dw_cfi_opc = DW_CFA_offset_extended_sf;
-      else if (reg & ~0x3f)
-	cfi->dw_cfi_opc = DW_CFA_offset_extended;
+  if (sreg == INVALID_REGNUM)
+    {
+      HOST_WIDE_INT const_offset;
+      /* When stack is aligned, store REG using DW_CFA_expression with FP.  */
+      if (fde && fde->stack_realign)
+	{
+	  cfi->dw_cfi_opc = DW_CFA_expression;
+	  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
+	  cfi->dw_cfi_oprnd2.dw_cfi_loc
+	    = build_cfa_aligned_loc (&cur_row->cfa, offset,
+				     fde->stack_realignment);
+	}
+      else if (offset.is_constant (&const_offset))
+	{
+	  if (need_data_align_sf_opcode (const_offset))
+	    cfi->dw_cfi_opc = DW_CFA_offset_extended_sf;
+	  else if (reg & ~0x3f)
+	    cfi->dw_cfi_opc = DW_CFA_offset_extended;
+	  else
+	    cfi->dw_cfi_opc = DW_CFA_offset;
+	  cfi->dw_cfi_oprnd2.dw_cfi_offset = const_offset;
+	}
       else
-	cfi->dw_cfi_opc = DW_CFA_offset;
-      cfi->dw_cfi_oprnd2.dw_cfi_offset = offset;
+	{
+	  cfi->dw_cfi_opc = DW_CFA_expression;
+	  cfi->dw_cfi_oprnd1.dw_cfi_reg_num = reg;
+	  cfi->dw_cfi_oprnd2.dw_cfi_loc
+	    = build_cfa_loc (&cur_row->cfa, offset);
+	}
     }
   else if (sreg == reg)
     {
@@ -995,7 +1031,7 @@ record_reg_saved_in_reg (rtx dest, rtx s
    SREG, or if SREG is NULL then it is saved at OFFSET to the CFA.  */
 
 static void
-queue_reg_save (rtx reg, rtx sreg, HOST_WIDE_INT offset)
+queue_reg_save (rtx reg, rtx sreg, poly_int64 offset)
 {
   queued_reg_save *q;
   queued_reg_save e = {reg, sreg, offset};
@@ -1097,20 +1133,11 @@ dwarf2out_frame_debug_def_cfa (rtx pat)
 {
   memset (cur_cfa, 0, sizeof (*cur_cfa));
 
-  if (GET_CODE (pat) == PLUS)
-    {
-      cur_cfa->offset = INTVAL (XEXP (pat, 1));
-      pat = XEXP (pat, 0);
-    }
+  pat = strip_offset (pat, &cur_cfa->offset);
   if (MEM_P (pat))
     {
       cur_cfa->indirect = 1;
-      pat = XEXP (pat, 0);
-      if (GET_CODE (pat) == PLUS)
-	{
-	  cur_cfa->base_offset = INTVAL (XEXP (pat, 1));
-	  pat = XEXP (pat, 0);
-	}
+      pat = strip_offset (XEXP (pat, 0), &cur_cfa->base_offset);
     }
   /* ??? If this fails, we could be calling into the _loc functions to
      define a full expression.  So far no port does that.  */
@@ -1133,7 +1160,7 @@ dwarf2out_frame_debug_adjust_cfa (rtx pa
     {
     case PLUS:
       gcc_assert (dwf_regno (XEXP (src, 0)) == cur_cfa->reg);
-      cur_cfa->offset -= INTVAL (XEXP (src, 1));
+      cur_cfa->offset -= rtx_to_poly_int64 (XEXP (src, 1));
       break;
 
     case REG:
@@ -1152,7 +1179,7 @@ dwarf2out_frame_debug_adjust_cfa (rtx pa
 static void
 dwarf2out_frame_debug_cfa_offset (rtx set)
 {
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
   rtx src, addr, span;
   unsigned int sregno;
 
@@ -1170,7 +1197,7 @@ dwarf2out_frame_debug_cfa_offset (rtx se
       break;
     case PLUS:
       gcc_assert (dwf_regno (XEXP (addr, 0)) == cur_cfa->reg);
-      offset = INTVAL (XEXP (addr, 1)) - cur_cfa->offset;
+      offset = rtx_to_poly_int64 (XEXP (addr, 1)) - cur_cfa->offset;
       break;
     default:
       gcc_unreachable ();
@@ -1195,7 +1222,7 @@ dwarf2out_frame_debug_cfa_offset (rtx se
     {
       /* We have a PARALLEL describing where the contents of SRC live.
    	 Adjust the offset for each piece of the PARALLEL.  */
-      HOST_WIDE_INT span_offset = offset;
+      poly_int64 span_offset = offset;
 
       gcc_assert (GET_CODE (span) == PARALLEL);
 
@@ -1535,7 +1562,7 @@ dwarf2out_frame_debug_cfa_window_save (v
 dwarf2out_frame_debug_expr (rtx expr)
 {
   rtx src, dest, span;
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
   dw_fde_ref fde;
 
   /* If RTX_FRAME_RELATED_P is set on a PARALLEL, process each member of
@@ -1639,19 +1666,14 @@ dwarf2out_frame_debug_expr (rtx expr)
 	    {
 	      /* Rule 2 */
 	      /* Adjusting SP.  */
-	      switch (GET_CODE (XEXP (src, 1)))
+	      if (REG_P (XEXP (src, 1)))
 		{
-		case CONST_INT:
-		  offset = INTVAL (XEXP (src, 1));
-		  break;
-		case REG:
 		  gcc_assert (dwf_regno (XEXP (src, 1))
 			      == cur_trace->cfa_temp.reg);
 		  offset = cur_trace->cfa_temp.offset;
-		  break;
-		default:
-		  gcc_unreachable ();
 		}
+	      else if (!poly_int_rtx_p (XEXP (src, 1), &offset))
+		gcc_unreachable ();
 
 	      if (XEXP (src, 0) == hard_frame_pointer_rtx)
 		{
@@ -1680,9 +1702,8 @@ dwarf2out_frame_debug_expr (rtx expr)
 	      gcc_assert (frame_pointer_needed);
 
 	      gcc_assert (REG_P (XEXP (src, 0))
-			  && dwf_regno (XEXP (src, 0)) == cur_cfa->reg
-			  && CONST_INT_P (XEXP (src, 1)));
-	      offset = INTVAL (XEXP (src, 1));
+			  && dwf_regno (XEXP (src, 0)) == cur_cfa->reg);
+	      offset = rtx_to_poly_int64 (XEXP (src, 1));
 	      if (GET_CODE (src) != MINUS)
 		offset = -offset;
 	      cur_cfa->offset += offset;
@@ -1695,11 +1716,11 @@ dwarf2out_frame_debug_expr (rtx expr)
 	      /* Rule 4 */
 	      if (REG_P (XEXP (src, 0))
 		  && dwf_regno (XEXP (src, 0)) == cur_cfa->reg
-		  && CONST_INT_P (XEXP (src, 1)))
+		  && poly_int_rtx_p (XEXP (src, 1), &offset))
 		{
 		  /* Setting a temporary CFA register that will be copied
 		     into the FP later on.  */
-		  offset = - INTVAL (XEXP (src, 1));
+		  offset = -offset;
 		  cur_cfa->offset += offset;
 		  cur_cfa->reg = dwf_regno (dest);
 		  /* Or used to save regs to the stack.  */
@@ -1722,11 +1743,9 @@ dwarf2out_frame_debug_expr (rtx expr)
 
 	      /* Rule 9 */
 	      else if (GET_CODE (src) == LO_SUM
-		       && CONST_INT_P (XEXP (src, 1)))
-		{
-		  cur_trace->cfa_temp.reg = dwf_regno (dest);
-		  cur_trace->cfa_temp.offset = INTVAL (XEXP (src, 1));
-		}
+		       && poly_int_rtx_p (XEXP (src, 1),
+					  &cur_trace->cfa_temp.offset))
+		cur_trace->cfa_temp.reg = dwf_regno (dest);
 	      else
 		gcc_unreachable ();
 	    }
@@ -1734,8 +1753,9 @@ dwarf2out_frame_debug_expr (rtx expr)
 
 	  /* Rule 6 */
 	case CONST_INT:
+	case POLY_INT_CST:
 	  cur_trace->cfa_temp.reg = dwf_regno (dest);
-	  cur_trace->cfa_temp.offset = INTVAL (src);
+	  cur_trace->cfa_temp.offset = rtx_to_poly_int64 (src);
 	  break;
 
 	  /* Rule 7 */
@@ -1745,7 +1765,11 @@ dwarf2out_frame_debug_expr (rtx expr)
 		      && CONST_INT_P (XEXP (src, 1)));
 
 	  cur_trace->cfa_temp.reg = dwf_regno (dest);
-	  cur_trace->cfa_temp.offset |= INTVAL (XEXP (src, 1));
+	  if (!can_ior_p (cur_trace->cfa_temp.offset, INTVAL (XEXP (src, 1)),
+			  &cur_trace->cfa_temp.offset))
+	    /* The target shouldn't generate this kind of CFI note if we
+	       can't represent it.  */
+	    gcc_unreachable ();
 	  break;
 
 	  /* Skip over HIGH, assuming it will be followed by a LO_SUM,
@@ -1800,9 +1824,7 @@ dwarf2out_frame_debug_expr (rtx expr)
 	case PRE_MODIFY:
 	case POST_MODIFY:
 	  /* We can't handle variable size modifications.  */
-	  gcc_assert (GET_CODE (XEXP (XEXP (XEXP (dest, 0), 1), 1))
-		      == CONST_INT);
-	  offset = -INTVAL (XEXP (XEXP (XEXP (dest, 0), 1), 1));
+	  offset = -rtx_to_poly_int64 (XEXP (XEXP (XEXP (dest, 0), 1), 1));
 
 	  gcc_assert (REGNO (XEXP (XEXP (dest, 0), 0)) == STACK_POINTER_REGNUM
 		      && cur_trace->cfa_store.reg == dw_stack_pointer_regnum);
@@ -1860,9 +1882,8 @@ dwarf2out_frame_debug_expr (rtx expr)
 	  {
 	    unsigned int regno;
 
-	    gcc_assert (CONST_INT_P (XEXP (XEXP (dest, 0), 1))
-			&& REG_P (XEXP (XEXP (dest, 0), 0)));
-	    offset = INTVAL (XEXP (XEXP (dest, 0), 1));
+	    gcc_assert (REG_P (XEXP (XEXP (dest, 0), 0)));
+	    offset = rtx_to_poly_int64 (XEXP (XEXP (dest, 0), 1));
 	    if (GET_CODE (XEXP (dest, 0)) == MINUS)
 	      offset = -offset;
 
@@ -1923,7 +1944,7 @@ dwarf2out_frame_debug_expr (rtx expr)
 	{
 	  /* We're storing the current CFA reg into the stack.  */
 
-	  if (cur_cfa->offset == 0)
+	  if (known_zero (cur_cfa->offset))
 	    {
               /* Rule 19 */
               /* If stack is aligned, putting CFA reg into stack means
@@ -1981,7 +2002,7 @@ dwarf2out_frame_debug_expr (rtx expr)
 	{
 	  /* We have a PARALLEL describing where the contents of SRC live.
 	     Queue register saves for each piece of the PARALLEL.  */
-	  HOST_WIDE_INT span_offset = offset;
+	  poly_int64 span_offset = offset;
 
 	  gcc_assert (GET_CODE (span) == PARALLEL);
 
@@ -2884,7 +2905,7 @@ create_pseudo_cfg (void)
 initial_return_save (rtx rtl)
 {
   unsigned int reg = INVALID_REGNUM;
-  HOST_WIDE_INT offset = 0;
+  poly_int64 offset = 0;
 
   switch (GET_CODE (rtl))
     {
@@ -2905,12 +2926,12 @@ initial_return_save (rtx rtl)
 
 	case PLUS:
 	  gcc_assert (REGNO (XEXP (rtl, 0)) == STACK_POINTER_REGNUM);
-	  offset = INTVAL (XEXP (rtl, 1));
+	  offset = rtx_to_poly_int64 (XEXP (rtl, 1));
 	  break;
 
 	case MINUS:
 	  gcc_assert (REGNO (XEXP (rtl, 0)) == STACK_POINTER_REGNUM);
-	  offset = -INTVAL (XEXP (rtl, 1));
+	  offset = -rtx_to_poly_int64 (XEXP (rtl, 1));
 	  break;
 
 	default:
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 17:16:50.362529357 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:16:57.210604569 +0100
@@ -570,6 +570,9 @@ dw_cfi_oprnd2_desc (enum dwarf_call_fram
     case DW_CFA_val_expression:
       return dw_cfi_oprnd_loc;
 
+    case DW_CFA_def_cfa_expression:
+      return dw_cfi_oprnd_cfa_loc;
+
     default:
       return dw_cfi_oprnd_unused;
     }

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [026/nnn] poly_int: operand_subword
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (24 preceding siblings ...)
  2017-10-23 17:10 ` [024/nnn] poly_int: ira subreg liveness tracking Richard Sandiford
@ 2017-10-23 17:11 ` Richard Sandiford
  2017-11-28 17:51   ` Jeff Law
  2017-10-23 17:11 ` [027/nnn] poly_int: DWARF CFA offsets Richard Sandiford
                   ` (81 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:11 UTC (permalink / raw)
  To: gcc-patches

This patch makes operand_subword and operand_subword_force take
polynomial offsets.  This is a fairly old-school interface that these
days should only be used when splitting multiword operations into word
operations.  Even so, supporting polynomial offsets doesn't hurt, and
it makes callers easier to write.
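
As a rough illustration of what this allows (the helper below is
hypothetical and not taken from the patch), a caller can now compute
and forward a polynomial word index without first proving that it is a
compile-time constant:

  /* Illustrative only -- not part of the patch.  Fetch the word of OP
     that contains the word-aligned byte offset OFFSET, which may involve
     the runtime indeterminates.  exact_div encodes the requirement that
     OFFSET is an exact multiple of UNITS_PER_WORD.  */
  static rtx
  word_at_byte_offset (rtx op, poly_uint64 offset, machine_mode mode)
  {
    poly_uint64 word = exact_div (offset, UNITS_PER_WORD);
    return operand_subword_force (op, word, mode);
  }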


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (operand_subword, operand_subword_force): Take the offset
	as a poly_uint64 rather than an unsigned int.
	* emit-rtl.c (operand_subword, operand_subword_force): Likewise.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 17:16:50.374527737 +0100
+++ gcc/rtl.h	2017-10-23 17:16:55.754801166 +0100
@@ -3017,10 +3017,10 @@ extern rtx gen_lowpart_if_possible (mach
 /* In emit-rtl.c */
 extern rtx gen_highpart (machine_mode, rtx);
 extern rtx gen_highpart_mode (machine_mode, machine_mode, rtx);
-extern rtx operand_subword (rtx, unsigned int, int, machine_mode);
+extern rtx operand_subword (rtx, poly_uint64, int, machine_mode);
 
 /* In emit-rtl.c */
-extern rtx operand_subword_force (rtx, unsigned int, machine_mode);
+extern rtx operand_subword_force (rtx, poly_uint64, machine_mode);
 extern int subreg_lowpart_p (const_rtx);
 extern poly_uint64 subreg_size_lowpart_offset (poly_uint64, poly_uint64);
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 17:16:50.363529222 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:16:55.754801166 +0100
@@ -1736,7 +1736,8 @@ subreg_lowpart_p (const_rtx x)
  */
 
 rtx
-operand_subword (rtx op, unsigned int offset, int validate_address, machine_mode mode)
+operand_subword (rtx op, poly_uint64 offset, int validate_address,
+		 machine_mode mode)
 {
   if (mode == VOIDmode)
     mode = GET_MODE (op);
@@ -1745,12 +1746,12 @@ operand_subword (rtx op, unsigned int of
 
   /* If OP is narrower than a word, fail.  */
   if (mode != BLKmode
-      && (GET_MODE_SIZE (mode) < UNITS_PER_WORD))
+      && may_lt (GET_MODE_SIZE (mode), UNITS_PER_WORD))
     return 0;
 
   /* If we want a word outside OP, return zero.  */
   if (mode != BLKmode
-      && (offset + 1) * UNITS_PER_WORD > GET_MODE_SIZE (mode))
+      && may_gt ((offset + 1) * UNITS_PER_WORD, GET_MODE_SIZE (mode)))
     return const0_rtx;
 
   /* Form a new MEM at the requested address.  */
@@ -1784,7 +1785,7 @@ operand_subword (rtx op, unsigned int of
    MODE is the mode of OP, in case it is CONST_INT.  */
 
 rtx
-operand_subword_force (rtx op, unsigned int offset, machine_mode mode)
+operand_subword_force (rtx op, poly_uint64 offset, machine_mode mode)
 {
   rtx result = operand_subword (op, offset, 1, mode);
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [029/nnn] poly_int: get_ref_base_and_extent
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (27 preceding siblings ...)
  2017-10-23 17:12 ` [030/nnn] poly_int: get_addr_unit_base_and_extent Richard Sandiford
@ 2017-10-23 17:12 ` Richard Sandiford
  2017-12-06 20:03   ` Jeff Law
  2017-10-23 17:12 ` [028/nnn] poly_int: ipa_parm_adjustment Richard Sandiford
                   ` (78 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:12 UTC (permalink / raw)
  To: gcc-patches

This patch changes the types of the bit offsets and sizes returned
by get_ref_base_and_extent to poly_int64.

There are some callers that can't sensibly operate on polynomial
offsets or handle cases where the offset and size aren't known
exactly.  This includes the IPA devirtualisation code (since
there's no defined way of having vtables at variable offsets)
and some parts of the DWARF code.  The patch therefore adds
a helper function get_ref_base_and_extent_hwi that either returns
exact HOST_WIDE_INT bit positions and sizes or returns a null
base to indicate failure.
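
The calling convention for the new helper is deliberately simple;
a minimal sketch of the pattern used by the converted callers
(illustrative, not code added by the patch):

  /* Callers that genuinely need constant positions use the _hwi wrapper
     and treat a null return as "couldn't represent the access".  */
  HOST_WIDE_INT offset, size;
  bool reverse;
  tree base = get_ref_base_and_extent_hwi (expr, &offset, &size, &reverse);
  if (!base)
    return;  /* Variable offset/size, negative offset, or size != max_size.  */
  /* ... use BASE, OFFSET and SIZE exactly as the old code did ...  */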


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-dfa.h (get_ref_base_and_extent): Return the offset, size and
	max_size as poly_int64_pods rather than HOST_WIDE_INTs.
	(get_ref_base_and_extent_hwi): Declare.
	* tree-dfa.c (get_ref_base_and_extent): Return the offset, size and
	max_size as poly_int64_pods rather than HOST_WIDE_INTs.
	(get_ref_base_and_extent_hwi): New function.
	* cfgexpand.c (expand_debug_expr): Update call to
	get_ref_base_and_extent.
	* dwarf2out.c (add_var_loc_to_decl): Likewise.
	* gimple-fold.c (get_base_constructor): Return the offset as a
	poly_int64_pod rather than a HOST_WIDE_INT.
	(fold_const_aggregate_ref_1): Track polynomial sizes and offsets.
	* ipa-polymorphic-call.c
	(ipa_polymorphic_call_context::set_by_invariant)
	(extr_type_from_vtbl_ptr_store): Track polynomial offsets.
	(ipa_polymorphic_call_context::ipa_polymorphic_call_context)
	(check_stmt_for_type_change): Use get_ref_base_and_extent_hwi
	rather than get_ref_base_and_extent.
	(ipa_polymorphic_call_context::get_dynamic_type): Likewise.
	* ipa-prop.c (ipa_load_from_parm_agg, compute_complex_assign_jump_func)
	(get_ancestor_addr_info, determine_locally_known_aggregate_parts):
	Likewise.
	(ipa_get_adjustment_candidate): Update call to get_ref_base_and_extent.
	* tree-sra.c (create_access, get_access_for_expr): Likewise.
	* tree-ssa-alias.c (ao_ref_base, aliasing_component_refs_p)
	(stmt_kills_ref_p): Likewise.
	* tree-ssa-dce.c (mark_aliased_reaching_defs_necessary_1): Likewise.
	* tree-ssa-scopedtables.c (avail_expr_hash, equal_mem_array_ref_p):
	Likewise.
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise.
	Use get_ref_base_and_extent_hwi rather than get_ref_base_and_extent
	when calling native_encode_expr.
	* tree-ssa-structalias.c (get_constraint_for_component_ref): Update
	call to get_ref_base_and_extent.
	(do_structure_copy): Use get_ref_base_and_extent_hwi rather than
	get_ref_base_and_extent.
	* var-tracking.c (track_expr_p): Likewise.

Index: gcc/tree-dfa.h
===================================================================
--- gcc/tree-dfa.h	2017-10-23 17:07:40.909726192 +0100
+++ gcc/tree-dfa.h	2017-10-23 17:16:59.705267681 +0100
@@ -29,8 +29,10 @@ extern void debug_dfa_stats (void);
 extern tree ssa_default_def (struct function *, tree);
 extern void set_ssa_default_def (struct function *, tree, tree);
 extern tree get_or_create_ssa_default_def (struct function *, tree);
-extern tree get_ref_base_and_extent (tree, HOST_WIDE_INT *,
-				     HOST_WIDE_INT *, HOST_WIDE_INT *, bool *);
+extern tree get_ref_base_and_extent (tree, poly_int64_pod *, poly_int64_pod *,
+				     poly_int64_pod *, bool *);
+extern tree get_ref_base_and_extent_hwi (tree, HOST_WIDE_INT *,
+					 HOST_WIDE_INT *, bool *);
 extern tree get_addr_base_and_unit_offset_1 (tree, HOST_WIDE_INT *,
 					     tree (*) (tree));
 extern tree get_addr_base_and_unit_offset (tree, HOST_WIDE_INT *);
Index: gcc/tree-dfa.c
===================================================================
--- gcc/tree-dfa.c	2017-10-23 17:07:40.909726192 +0100
+++ gcc/tree-dfa.c	2017-10-23 17:16:59.705267681 +0100
@@ -377,15 +377,15 @@ get_or_create_ssa_default_def (struct fu
    true, the storage order of the reference is reversed.  */
 
 tree
-get_ref_base_and_extent (tree exp, HOST_WIDE_INT *poffset,
-			 HOST_WIDE_INT *psize,
-			 HOST_WIDE_INT *pmax_size,
+get_ref_base_and_extent (tree exp, poly_int64_pod *poffset,
+			 poly_int64_pod *psize,
+			 poly_int64_pod *pmax_size,
 			 bool *preverse)
 {
-  offset_int bitsize = -1;
-  offset_int maxsize;
+  poly_offset_int bitsize = -1;
+  poly_offset_int maxsize;
   tree size_tree = NULL_TREE;
-  offset_int bit_offset = 0;
+  poly_offset_int bit_offset = 0;
   bool seen_variable_array_ref = false;
 
   /* First get the final access size and the storage order from just the
@@ -400,11 +400,11 @@ get_ref_base_and_extent (tree exp, HOST_
       if (mode == BLKmode)
 	size_tree = TYPE_SIZE (TREE_TYPE (exp));
       else
-	bitsize = int (GET_MODE_BITSIZE (mode));
+	bitsize = GET_MODE_BITSIZE (mode);
     }
   if (size_tree != NULL_TREE
-      && TREE_CODE (size_tree) == INTEGER_CST)
-    bitsize = wi::to_offset (size_tree);
+      && poly_int_tree_p (size_tree))
+    bitsize = wi::to_poly_offset (size_tree);
 
   *preverse = reverse_storage_order_for_component_p (exp);
 
@@ -419,7 +419,7 @@ get_ref_base_and_extent (tree exp, HOST_
       switch (TREE_CODE (exp))
 	{
 	case BIT_FIELD_REF:
-	  bit_offset += wi::to_offset (TREE_OPERAND (exp, 2));
+	  bit_offset += wi::to_poly_offset (TREE_OPERAND (exp, 2));
 	  break;
 
 	case COMPONENT_REF:
@@ -427,10 +427,10 @@ get_ref_base_and_extent (tree exp, HOST_
 	    tree field = TREE_OPERAND (exp, 1);
 	    tree this_offset = component_ref_field_offset (exp);
 
-	    if (this_offset && TREE_CODE (this_offset) == INTEGER_CST)
+	    if (this_offset && poly_int_tree_p (this_offset))
 	      {
-		offset_int woffset = (wi::to_offset (this_offset)
-				      << LOG2_BITS_PER_UNIT);
+		poly_offset_int woffset = (wi::to_poly_offset (this_offset)
+					   << LOG2_BITS_PER_UNIT);
 		woffset += wi::to_offset (DECL_FIELD_BIT_OFFSET (field));
 		bit_offset += woffset;
 
@@ -438,7 +438,7 @@ get_ref_base_and_extent (tree exp, HOST_
 		   referenced the last field of a struct or a union member
 		   then we have to adjust maxsize by the padding at the end
 		   of our field.  */
-		if (seen_variable_array_ref && maxsize != -1)
+		if (seen_variable_array_ref && known_size_p (maxsize))
 		  {
 		    tree stype = TREE_TYPE (TREE_OPERAND (exp, 0));
 		    tree next = DECL_CHAIN (field);
@@ -450,14 +450,15 @@ get_ref_base_and_extent (tree exp, HOST_
 			tree fsize = DECL_SIZE_UNIT (field);
 			tree ssize = TYPE_SIZE_UNIT (stype);
 			if (fsize == NULL
-			    || TREE_CODE (fsize) != INTEGER_CST
+			    || !poly_int_tree_p (fsize)
 			    || ssize == NULL
-			    || TREE_CODE (ssize) != INTEGER_CST)
+			    || !poly_int_tree_p (ssize))
 			  maxsize = -1;
 			else
 			  {
-			    offset_int tem = (wi::to_offset (ssize)
-					      - wi::to_offset (fsize));
+			    poly_offset_int tem
+			      = (wi::to_poly_offset (ssize)
+				 - wi::to_poly_offset (fsize));
 			    tem <<= LOG2_BITS_PER_UNIT;
 			    tem -= woffset;
 			    maxsize += tem;
@@ -471,10 +472,10 @@ get_ref_base_and_extent (tree exp, HOST_
 		/* We need to adjust maxsize to the whole structure bitsize.
 		   But we can subtract any constant offset seen so far,
 		   because that would get us out of the structure otherwise.  */
-		if (maxsize != -1
+		if (known_size_p (maxsize)
 		    && csize
-		    && TREE_CODE (csize) == INTEGER_CST)
-		  maxsize = wi::to_offset (csize) - bit_offset;
+		    && poly_int_tree_p (csize))
+		  maxsize = wi::to_poly_offset (csize) - bit_offset;
 		else
 		  maxsize = -1;
 	      }
@@ -488,14 +489,15 @@ get_ref_base_and_extent (tree exp, HOST_
 	    tree low_bound, unit_size;
 
 	    /* If the resulting bit-offset is constant, track it.  */
-	    if (TREE_CODE (index) == INTEGER_CST
+	    if (poly_int_tree_p (index)
 		&& (low_bound = array_ref_low_bound (exp),
- 		    TREE_CODE (low_bound) == INTEGER_CST)
+		    poly_int_tree_p (low_bound))
 		&& (unit_size = array_ref_element_size (exp),
 		    TREE_CODE (unit_size) == INTEGER_CST))
 	      {
-		offset_int woffset
-		  = wi::sext (wi::to_offset (index) - wi::to_offset (low_bound),
+		poly_offset_int woffset
+		  = wi::sext (wi::to_poly_offset (index)
+			      - wi::to_poly_offset (low_bound),
 			      TYPE_PRECISION (TREE_TYPE (index)));
 		woffset *= wi::to_offset (unit_size);
 		woffset <<= LOG2_BITS_PER_UNIT;
@@ -512,10 +514,10 @@ get_ref_base_and_extent (tree exp, HOST_
 		/* We need to adjust maxsize to the whole array bitsize.
 		   But we can subtract any constant offset seen so far,
 		   because that would get us outside of the array otherwise.  */
-		if (maxsize != -1
+		if (known_size_p (maxsize)
 		    && asize
-		    && TREE_CODE (asize) == INTEGER_CST)
-		  maxsize = wi::to_offset (asize) - bit_offset;
+		    && poly_int_tree_p (asize))
+		  maxsize = wi::to_poly_offset (asize) - bit_offset;
 		else
 		  maxsize = -1;
 
@@ -560,11 +562,11 @@ get_ref_base_and_extent (tree exp, HOST_
 	     base type boundary.  This needs to include possible trailing
 	     padding that is there for alignment purposes.  */
 	  if (seen_variable_array_ref
-	      && maxsize != -1
+	      && known_size_p (maxsize)
 	      && (TYPE_SIZE (TREE_TYPE (exp)) == NULL_TREE
-		  || TREE_CODE (TYPE_SIZE (TREE_TYPE (exp))) != INTEGER_CST
-		  || (bit_offset + maxsize
-		      == wi::to_offset (TYPE_SIZE (TREE_TYPE (exp))))))
+		  || !poly_int_tree_p (TYPE_SIZE (TREE_TYPE (exp)))
+		  || may_eq (bit_offset + maxsize,
+			     wi::to_poly_offset (TYPE_SIZE (TREE_TYPE (exp))))))
 	    maxsize = -1;
 
 	  /* Hand back the decl for MEM[&decl, off].  */
@@ -574,12 +576,13 @@ get_ref_base_and_extent (tree exp, HOST_
 		exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
 	      else
 		{
-		  offset_int off = mem_ref_offset (exp);
+		  poly_offset_int off = mem_ref_offset (exp);
 		  off <<= LOG2_BITS_PER_UNIT;
 		  off += bit_offset;
-		  if (wi::fits_shwi_p (off))
+		  poly_int64 off_hwi;
+		  if (off.to_shwi (&off_hwi))
 		    {
-		      bit_offset = off;
+		      bit_offset = off_hwi;
 		      exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
 		    }
 		}
@@ -594,7 +597,7 @@ get_ref_base_and_extent (tree exp, HOST_
     }
 
  done:
-  if (!wi::fits_shwi_p (bitsize) || wi::neg_p (bitsize))
+  if (!bitsize.to_shwi (psize) || may_lt (*psize, 0))
     {
       *poffset = 0;
       *psize = -1;
@@ -603,9 +606,10 @@ get_ref_base_and_extent (tree exp, HOST_
       return exp;
     }
 
-  *psize = bitsize.to_shwi ();
-
-  if (!wi::fits_shwi_p (bit_offset))
+  /* ???  Due to negative offsets in ARRAY_REF we can end up with
+     negative bit_offset here.  We might want to store a zero offset
+     in this case.  */
+  if (!bit_offset.to_shwi (poffset))
     {
       *poffset = 0;
       *pmax_size = -1;
@@ -625,44 +629,37 @@ get_ref_base_and_extent (tree exp, HOST_
 	  if (TREE_CODE (TREE_TYPE (exp)) == ARRAY_TYPE
 	      || (seen_variable_array_ref
 		  && (sz_tree == NULL_TREE
-		      || TREE_CODE (sz_tree) != INTEGER_CST
-		      || (bit_offset + maxsize == wi::to_offset (sz_tree)))))
+		      || !poly_int_tree_p (sz_tree)
+		      || may_eq (bit_offset + maxsize,
+				 wi::to_poly_offset (sz_tree)))))
 	    maxsize = -1;
 	}
       /* If maxsize is unknown adjust it according to the size of the
          base decl.  */
-      else if (maxsize == -1
-	  && DECL_SIZE (exp)
-	  && TREE_CODE (DECL_SIZE (exp)) == INTEGER_CST)
-	maxsize = wi::to_offset (DECL_SIZE (exp)) - bit_offset;
+      else if (!known_size_p (maxsize)
+	       && DECL_SIZE (exp)
+	       && poly_int_tree_p (DECL_SIZE (exp)))
+	maxsize = wi::to_poly_offset (DECL_SIZE (exp)) - bit_offset;
     }
   else if (CONSTANT_CLASS_P (exp))
     {
       /* If maxsize is unknown adjust it according to the size of the
          base type constant.  */
-      if (maxsize == -1
+      if (!known_size_p (maxsize)
 	  && TYPE_SIZE (TREE_TYPE (exp))
-	  && TREE_CODE (TYPE_SIZE (TREE_TYPE (exp))) == INTEGER_CST)
-	maxsize = (wi::to_offset (TYPE_SIZE (TREE_TYPE (exp)))
+	  && poly_int_tree_p (TYPE_SIZE (TREE_TYPE (exp))))
+	maxsize = (wi::to_poly_offset (TYPE_SIZE (TREE_TYPE (exp)))
 		   - bit_offset);
     }
 
-  /* ???  Due to negative offsets in ARRAY_REF we can end up with
-     negative bit_offset here.  We might want to store a zero offset
-     in this case.  */
-  *poffset = bit_offset.to_shwi ();
-  if (!wi::fits_shwi_p (maxsize) || wi::neg_p (maxsize))
+  if (!maxsize.to_shwi (pmax_size)
+      || may_lt (*pmax_size, 0)
+      || !endpoint_representable_p (*poffset, *pmax_size))
     *pmax_size = -1;
-  else
-    {
-      *pmax_size = maxsize.to_shwi ();
-      if (*poffset > HOST_WIDE_INT_MAX - *pmax_size)
-	*pmax_size = -1;
-    }
 
   /* Punt if *POFFSET + *PSIZE overflows in HOST_WIDE_INT, the callers don't
      check for such overflows individually and assume it works.  */
-  if (*psize != -1 && *poffset > HOST_WIDE_INT_MAX - *psize)
+  if (!endpoint_representable_p (*poffset, *psize))
     {
       *poffset = 0;
       *psize = -1;
@@ -674,6 +671,32 @@ get_ref_base_and_extent (tree exp, HOST_
   return exp;
 }
 
+/* Like get_ref_base_and_extent, but for cases in which we only care
+   about constant-width accesses at constant offsets.  Return null
+   if the access is anything else.  */
+
+tree
+get_ref_base_and_extent_hwi (tree exp, HOST_WIDE_INT *poffset,
+			     HOST_WIDE_INT *psize, bool *preverse)
+{
+  poly_int64 offset, size, max_size;
+  HOST_WIDE_INT const_offset, const_size;
+  bool reverse;
+  tree decl = get_ref_base_and_extent (exp, &offset, &size, &max_size,
+				       &reverse);
+  if (!offset.is_constant (&const_offset)
+      || !size.is_constant (&const_size)
+      || const_offset < 0
+      || !known_size_p (max_size)
+      || may_ne (max_size, const_size))
+    return NULL_TREE;
+
+  *poffset = const_offset;
+  *psize = const_size;
+  *preverse = reverse;
+  return decl;
+}
+
 /* Returns the base object and a constant BITS_PER_UNIT offset in *POFFSET that
    denotes the starting address of the memory access EXP.
    Returns NULL_TREE if the offset is not constant or any component
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-10-23 17:11:40.240937549 +0100
+++ gcc/cfgexpand.c	2017-10-23 17:16:59.700268356 +0100
@@ -4878,7 +4878,7 @@ expand_debug_expr (tree exp)
 
 	  if (handled_component_p (TREE_OPERAND (exp, 0)))
 	    {
-	      HOST_WIDE_INT bitoffset, bitsize, maxsize;
+	      poly_int64 bitoffset, bitsize, maxsize, byteoffset;
 	      bool reverse;
 	      tree decl
 		= get_ref_base_and_extent (TREE_OPERAND (exp, 0), &bitoffset,
@@ -4888,12 +4888,12 @@ expand_debug_expr (tree exp)
 		   || TREE_CODE (decl) == RESULT_DECL)
 		  && (!TREE_ADDRESSABLE (decl)
 		      || target_for_debug_bind (decl))
-		  && (bitoffset % BITS_PER_UNIT) == 0
-		  && bitsize > 0
-		  && bitsize == maxsize)
+		  && multiple_p (bitoffset, BITS_PER_UNIT, &byteoffset)
+		  && must_gt (bitsize, 0)
+		  && must_eq (bitsize, maxsize))
 		{
 		  rtx base = gen_rtx_DEBUG_IMPLICIT_PTR (mode, decl);
-		  return plus_constant (mode, base, bitoffset / BITS_PER_UNIT);
+		  return plus_constant (mode, base, byteoffset);
 		}
 	    }
 
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 17:16:57.210604569 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:16:59.703267951 +0100
@@ -5914,17 +5914,15 @@ add_var_loc_to_decl (tree decl, rtx loc_
 	  || (TREE_CODE (realdecl) == MEM_REF
 	      && TREE_CODE (TREE_OPERAND (realdecl, 0)) == ADDR_EXPR))
 	{
-	  HOST_WIDE_INT maxsize;
 	  bool reverse;
-	  tree innerdecl
-	    = get_ref_base_and_extent (realdecl, &bitpos, &bitsize, &maxsize,
-				       &reverse);
-	  if (!DECL_P (innerdecl)
+	  tree innerdecl = get_ref_base_and_extent_hwi (realdecl, &bitpos,
+							&bitsize, &reverse);
+	  if (!innerdecl
+	      || !DECL_P (innerdecl)
 	      || DECL_IGNORED_P (innerdecl)
 	      || TREE_STATIC (innerdecl)
-	      || bitsize <= 0
-	      || bitpos + bitsize > 256
-	      || bitsize != maxsize)
+	      || bitsize == 0
+	      || bitpos + bitsize > 256)
 	    return NULL;
 	  decl = innerdecl;
 	}
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	2017-10-23 17:11:40.321090726 +0100
+++ gcc/gimple-fold.c	2017-10-23 17:16:59.703267951 +0100
@@ -6171,10 +6171,10 @@ gimple_fold_stmt_to_constant (gimple *st
    is not explicitly available, but it is known to be zero
    such as 'static const int a;'.  */
 static tree
-get_base_constructor (tree base, HOST_WIDE_INT *bit_offset,
+get_base_constructor (tree base, poly_int64_pod *bit_offset,
 		      tree (*valueize)(tree))
 {
-  HOST_WIDE_INT bit_offset2, size, max_size;
+  poly_int64 bit_offset2, size, max_size;
   bool reverse;
 
   if (TREE_CODE (base) == MEM_REF)
@@ -6226,7 +6226,7 @@ get_base_constructor (tree base, HOST_WI
     case COMPONENT_REF:
       base = get_ref_base_and_extent (base, &bit_offset2, &size, &max_size,
 				      &reverse);
-      if (max_size == -1 || size != max_size)
+      if (!known_size_p (max_size) || may_ne (size, max_size))
 	return NULL_TREE;
       *bit_offset +=  bit_offset2;
       return get_base_constructor (base, bit_offset, valueize);
@@ -6437,7 +6437,7 @@ fold_ctor_reference (tree type, tree cto
 fold_const_aggregate_ref_1 (tree t, tree (*valueize) (tree))
 {
   tree ctor, idx, base;
-  HOST_WIDE_INT offset, size, max_size;
+  poly_int64 offset, size, max_size;
   tree tem;
   bool reverse;
 
@@ -6463,23 +6463,23 @@ fold_const_aggregate_ref_1 (tree t, tree
       if (TREE_CODE (TREE_OPERAND (t, 1)) == SSA_NAME
 	  && valueize
 	  && (idx = (*valueize) (TREE_OPERAND (t, 1)))
-	  && TREE_CODE (idx) == INTEGER_CST)
+	  && poly_int_tree_p (idx))
 	{
 	  tree low_bound, unit_size;
 
 	  /* If the resulting bit-offset is constant, track it.  */
 	  if ((low_bound = array_ref_low_bound (t),
-	       TREE_CODE (low_bound) == INTEGER_CST)
+	       poly_int_tree_p (low_bound))
 	      && (unit_size = array_ref_element_size (t),
 		  tree_fits_uhwi_p (unit_size)))
 	    {
-	      offset_int woffset
-		= wi::sext (wi::to_offset (idx) - wi::to_offset (low_bound),
+	      poly_offset_int woffset
+		= wi::sext (wi::to_poly_offset (idx)
+			    - wi::to_poly_offset (low_bound),
 			    TYPE_PRECISION (TREE_TYPE (idx)));
 
-	      if (wi::fits_shwi_p (woffset))
+	      if (woffset.to_shwi (&offset))
 		{
-		  offset = woffset.to_shwi ();
 		  /* TODO: This code seems wrong, multiply then check
 		     to see if it fits.  */
 		  offset *= tree_to_uhwi (unit_size);
@@ -6492,7 +6492,7 @@ fold_const_aggregate_ref_1 (tree t, tree
 		    return build_zero_cst (TREE_TYPE (t));
 		  /* Out of bound array access.  Value is undefined,
 		     but don't fold.  */
-		  if (offset < 0)
+		  if (may_lt (offset, 0))
 		    return NULL_TREE;
 		  /* We can not determine ctor.  */
 		  if (!ctor)
@@ -6517,14 +6517,14 @@ fold_const_aggregate_ref_1 (tree t, tree
       if (ctor == error_mark_node)
 	return build_zero_cst (TREE_TYPE (t));
       /* We do not know precise address.  */
-      if (max_size == -1 || max_size != size)
+      if (!known_size_p (max_size) || may_ne (max_size, size))
 	return NULL_TREE;
       /* We can not determine ctor.  */
       if (!ctor)
 	return NULL_TREE;
 
       /* Out of bound array access.  Value is undefined, but don't fold.  */
-      if (offset < 0)
+      if (may_lt (offset, 0))
 	return NULL_TREE;
 
       return fold_ctor_reference (TREE_TYPE (t), ctor, offset, size,
Index: gcc/ipa-polymorphic-call.c
===================================================================
--- gcc/ipa-polymorphic-call.c	2017-10-23 17:07:40.909726192 +0100
+++ gcc/ipa-polymorphic-call.c	2017-10-23 17:16:59.704267816 +0100
@@ -759,7 +759,7 @@ ipa_polymorphic_call_context::set_by_inv
 						tree otr_type,
 						HOST_WIDE_INT off)
 {
-  HOST_WIDE_INT offset2, size, max_size;
+  poly_int64 offset2, size, max_size;
   bool reverse;
   tree base;
 
@@ -772,7 +772,7 @@ ipa_polymorphic_call_context::set_by_inv
 
   cst = TREE_OPERAND (cst, 0);
   base = get_ref_base_and_extent (cst, &offset2, &size, &max_size, &reverse);
-  if (!DECL_P (base) || max_size == -1 || max_size != size)
+  if (!DECL_P (base) || !known_size_p (max_size) || may_ne (max_size, size))
     return false;
 
   /* Only type inconsistent programs can have otr_type that is
@@ -899,23 +899,21 @@ ipa_polymorphic_call_context::ipa_polymo
       base_pointer = walk_ssa_copies (base_pointer, &visited);
       if (TREE_CODE (base_pointer) == ADDR_EXPR)
 	{
-	  HOST_WIDE_INT size, max_size;
-	  HOST_WIDE_INT offset2;
+	  HOST_WIDE_INT offset2, size;
 	  bool reverse;
 	  tree base
-	    = get_ref_base_and_extent (TREE_OPERAND (base_pointer, 0),
-				       &offset2, &size, &max_size, &reverse);
+	    = get_ref_base_and_extent_hwi (TREE_OPERAND (base_pointer, 0),
+					   &offset2, &size, &reverse);
+	  if (!base)
+	    break;
 
-	  if (max_size != -1 && max_size == size)
-	    combine_speculation_with (TYPE_MAIN_VARIANT (TREE_TYPE (base)),
-				      offset + offset2,
-				      true,
-				      NULL /* Do not change outer type.  */);
+	  combine_speculation_with (TYPE_MAIN_VARIANT (TREE_TYPE (base)),
+				    offset + offset2,
+				    true,
+				    NULL /* Do not change outer type.  */);
 
 	  /* If this is a varying address, punt.  */
-	  if ((TREE_CODE (base) == MEM_REF || DECL_P (base))
-	      && max_size != -1
-	      && max_size == size)
+	  if (TREE_CODE (base) == MEM_REF || DECL_P (base))
 	    {
 	      /* We found dereference of a pointer.  Type of the pointer
 		 and MEM_REF is meaningless, but we can look futher.  */
@@ -1181,7 +1179,7 @@ noncall_stmt_may_be_vtbl_ptr_store (gimp
 extr_type_from_vtbl_ptr_store (gimple *stmt, struct type_change_info *tci,
 			       HOST_WIDE_INT *type_offset)
 {
-  HOST_WIDE_INT offset, size, max_size;
+  poly_int64 offset, size, max_size;
   tree lhs, rhs, base;
   bool reverse;
 
@@ -1263,17 +1261,23 @@ extr_type_from_vtbl_ptr_store (gimple *s
 	    }
 	  return tci->offset > POINTER_SIZE ? error_mark_node : NULL_TREE;
 	}
-      if (offset != tci->offset
-	  || size != POINTER_SIZE
-	  || max_size != POINTER_SIZE)
+      if (may_ne (offset, tci->offset)
+	  || may_ne (size, POINTER_SIZE)
+	  || may_ne (max_size, POINTER_SIZE))
 	{
 	  if (dump_file)
-	    fprintf (dump_file, "    wrong offset %i!=%i or size %i\n",
-		     (int)offset, (int)tci->offset, (int)size);
-	  return offset + POINTER_SIZE <= tci->offset
-	         || (max_size != -1
-		     && tci->offset + POINTER_SIZE > offset + max_size)
-		 ? error_mark_node : NULL;
+	    {
+	      fprintf (dump_file, "    wrong offset ");
+	      print_dec (offset, dump_file);
+	      fprintf (dump_file, "!=%i or size ", (int) tci->offset);
+	      print_dec (size, dump_file);
+	      fprintf (dump_file, "\n");
+	    }
+	  return (must_le (offset + POINTER_SIZE, tci->offset)
+		  || (known_size_p (max_size)
+		      && must_gt (tci->offset + POINTER_SIZE,
+				  offset + max_size))
+		  ? error_mark_node : NULL);
 	}
     }
 
@@ -1403,7 +1407,7 @@ check_stmt_for_type_change (ao_ref *ao A
       {
 	tree op = walk_ssa_copies (gimple_call_arg (stmt, 0));
 	tree type = TYPE_METHOD_BASETYPE (TREE_TYPE (fn));
-	HOST_WIDE_INT offset = 0, size, max_size;
+	HOST_WIDE_INT offset = 0;
 	bool reverse;
 
 	if (dump_file)
@@ -1415,14 +1419,15 @@ check_stmt_for_type_change (ao_ref *ao A
 	/* See if THIS parameter seems like instance pointer.  */
 	if (TREE_CODE (op) == ADDR_EXPR)
 	  {
-	    op = get_ref_base_and_extent (TREE_OPERAND (op, 0), &offset,
-					  &size, &max_size, &reverse);
-	    if (size != max_size || max_size == -1)
+	    HOST_WIDE_INT size;
+	    op = get_ref_base_and_extent_hwi (TREE_OPERAND (op, 0),
+					      &offset, &size, &reverse);
+	    if (!op)
 	      {
                 tci->speculative++;
 	        return csftc_abort_walking_p (tci->speculative);
 	      }
-	    if (op && TREE_CODE (op) == MEM_REF)
+	    if (TREE_CODE (op) == MEM_REF)
 	      {
 		if (!tree_fits_shwi_p (TREE_OPERAND (op, 1)))
 		  {
@@ -1578,7 +1583,6 @@ ipa_polymorphic_call_context::get_dynami
   if (gimple_code (call) == GIMPLE_CALL)
     {
       tree ref = gimple_call_fn (call);
-      HOST_WIDE_INT offset2, size, max_size;
       bool reverse;
 
       if (TREE_CODE (ref) == OBJ_TYPE_REF)
@@ -1608,10 +1612,11 @@ ipa_polymorphic_call_context::get_dynami
 		  && !SSA_NAME_IS_DEFAULT_DEF (ref)
 		  && gimple_assign_load_p (SSA_NAME_DEF_STMT (ref)))
 		{
+		  HOST_WIDE_INT offset2, size;
 		  tree ref_exp = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (ref));
 		  tree base_ref
-		    = get_ref_base_and_extent (ref_exp, &offset2, &size,
-					       &max_size, &reverse);
+		    = get_ref_base_and_extent_hwi (ref_exp, &offset2,
+						   &size, &reverse);
 
 		  /* Finally verify that what we found looks like read from
 		     OTR_OBJECT or from INSTANCE with offset OFFSET.  */
Index: gcc/ipa-prop.c
===================================================================
--- gcc/ipa-prop.c	2017-10-23 17:16:58.507429441 +0100
+++ gcc/ipa-prop.c	2017-10-23 17:16:59.704267816 +0100
@@ -1071,12 +1071,11 @@ ipa_load_from_parm_agg (struct ipa_func_
 			bool *by_ref_p, bool *guaranteed_unmodified)
 {
   int index;
-  HOST_WIDE_INT size, max_size;
+  HOST_WIDE_INT size;
   bool reverse;
-  tree base
-    = get_ref_base_and_extent (op, offset_p, &size, &max_size, &reverse);
+  tree base = get_ref_base_and_extent_hwi (op, offset_p, &size, &reverse);
 
-  if (max_size == -1 || max_size != size || *offset_p < 0)
+  if (!base)
     return false;
 
   if (DECL_P (base))
@@ -1204,7 +1203,7 @@ compute_complex_assign_jump_func (struct
 				  gcall *call, gimple *stmt, tree name,
 				  tree param_type)
 {
-  HOST_WIDE_INT offset, size, max_size;
+  HOST_WIDE_INT offset, size;
   tree op1, tc_ssa, base, ssa;
   bool reverse;
   int index;
@@ -1267,11 +1266,8 @@ compute_complex_assign_jump_func (struct
   op1 = TREE_OPERAND (op1, 0);
   if (TREE_CODE (TREE_TYPE (op1)) != RECORD_TYPE)
     return;
-  base = get_ref_base_and_extent (op1, &offset, &size, &max_size, &reverse);
-  if (TREE_CODE (base) != MEM_REF
-      /* If this is a varying address, punt.  */
-      || max_size == -1
-      || max_size != size)
+  base = get_ref_base_and_extent_hwi (op1, &offset, &size, &reverse);
+  if (!base || TREE_CODE (base) != MEM_REF)
     return;
   offset += mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT;
   ssa = TREE_OPERAND (base, 0);
@@ -1301,7 +1297,7 @@ compute_complex_assign_jump_func (struct
 static tree
 get_ancestor_addr_info (gimple *assign, tree *obj_p, HOST_WIDE_INT *offset)
 {
-  HOST_WIDE_INT size, max_size;
+  HOST_WIDE_INT size;
   tree expr, parm, obj;
   bool reverse;
 
@@ -1313,13 +1309,9 @@ get_ancestor_addr_info (gimple *assign,
     return NULL_TREE;
   expr = TREE_OPERAND (expr, 0);
   obj = expr;
-  expr = get_ref_base_and_extent (expr, offset, &size, &max_size, &reverse);
+  expr = get_ref_base_and_extent_hwi (expr, offset, &size, &reverse);
 
-  if (TREE_CODE (expr) != MEM_REF
-      /* If this is a varying address, punt.  */
-      || max_size == -1
-      || max_size != size
-      || *offset < 0)
+  if (!expr || TREE_CODE (expr) != MEM_REF)
     return NULL_TREE;
   parm = TREE_OPERAND (expr, 0);
   if (TREE_CODE (parm) != SSA_NAME
@@ -1581,15 +1573,12 @@ determine_locally_known_aggregate_parts
 	}
       else if (TREE_CODE (arg) == ADDR_EXPR)
 	{
-	  HOST_WIDE_INT arg_max_size;
 	  bool reverse;
 
 	  arg = TREE_OPERAND (arg, 0);
-	  arg_base = get_ref_base_and_extent (arg, &arg_offset, &arg_size,
-					      &arg_max_size, &reverse);
-	  if (arg_max_size == -1
-	      || arg_max_size != arg_size
-	      || arg_offset < 0)
+	  arg_base = get_ref_base_and_extent_hwi (arg, &arg_offset,
+						  &arg_size, &reverse);
+	  if (!arg_base)
 	    return;
 	  if (DECL_P (arg_base))
 	    {
@@ -1604,18 +1593,15 @@ determine_locally_known_aggregate_parts
     }
   else
     {
-      HOST_WIDE_INT arg_max_size;
       bool reverse;
 
       gcc_checking_assert (AGGREGATE_TYPE_P (TREE_TYPE (arg)));
 
       by_ref = false;
       check_ref = false;
-      arg_base = get_ref_base_and_extent (arg, &arg_offset, &arg_size,
-					  &arg_max_size, &reverse);
-      if (arg_max_size == -1
-	  || arg_max_size != arg_size
-	  || arg_offset < 0)
+      arg_base = get_ref_base_and_extent_hwi (arg, &arg_offset,
+					      &arg_size, &reverse);
+      if (!arg_base)
 	return;
 
       ao_ref_init (&r, arg);
@@ -1631,7 +1617,7 @@ determine_locally_known_aggregate_parts
     {
       struct ipa_known_agg_contents_list *n, **p;
       gimple *stmt = gsi_stmt (gsi);
-      HOST_WIDE_INT lhs_offset, lhs_size, lhs_max_size;
+      HOST_WIDE_INT lhs_offset, lhs_size;
       tree lhs, rhs, lhs_base;
       bool reverse;
 
@@ -1647,10 +1633,9 @@ determine_locally_known_aggregate_parts
 	  || contains_bitfld_component_ref_p (lhs))
 	break;
 
-      lhs_base = get_ref_base_and_extent (lhs, &lhs_offset, &lhs_size,
-					  &lhs_max_size, &reverse);
-      if (lhs_max_size == -1
-	  || lhs_max_size != lhs_size)
+      lhs_base = get_ref_base_and_extent_hwi (lhs, &lhs_offset,
+					      &lhs_size, &reverse);
+      if (!lhs_base)
 	break;
 
       if (check_ref)
@@ -4574,11 +4559,11 @@ ipa_get_adjustment_candidate (tree **exp
 	*convert = true;
     }
 
-  HOST_WIDE_INT offset, size, max_size;
+  poly_int64 offset, size, max_size;
   bool reverse;
   tree base
     = get_ref_base_and_extent (**expr, &offset, &size, &max_size, &reverse);
-  if (!base || size == -1 || max_size == -1)
+  if (!base || !known_size_p (size) || !known_size_p (max_size))
     return NULL;
 
   if (TREE_CODE (base) == MEM_REF)
Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c	2017-10-23 17:07:40.909726192 +0100
+++ gcc/tree-sra.c	2017-10-23 17:16:59.705267681 +0100
@@ -865,11 +865,20 @@ static bool maybe_add_sra_candidate (tre
 create_access (tree expr, gimple *stmt, bool write)
 {
   struct access *access;
+  poly_int64 poffset, psize, pmax_size;
   HOST_WIDE_INT offset, size, max_size;
   tree base = expr;
   bool reverse, ptr, unscalarizable_region = false;
 
-  base = get_ref_base_and_extent (expr, &offset, &size, &max_size, &reverse);
+  base = get_ref_base_and_extent (expr, &poffset, &psize, &pmax_size,
+				  &reverse);
+  if (!poffset.is_constant (&offset)
+      || !psize.is_constant (&size)
+      || !pmax_size.is_constant (&max_size))
+    {
+      disqualify_candidate (base, "Encountered a polynomial-sized access.");
+      return NULL;
+    }
 
   if (sra_mode == SRA_MODE_EARLY_IPA
       && TREE_CODE (base) == MEM_REF)
@@ -3048,7 +3057,8 @@ clobber_subtree (struct access *access,
 static struct access *
 get_access_for_expr (tree expr)
 {
-  HOST_WIDE_INT offset, size, max_size;
+  poly_int64 poffset, psize, pmax_size;
+  HOST_WIDE_INT offset, max_size;
   tree base;
   bool reverse;
 
@@ -3058,8 +3068,12 @@ get_access_for_expr (tree expr)
   if (TREE_CODE (expr) == VIEW_CONVERT_EXPR)
     expr = TREE_OPERAND (expr, 0);
 
-  base = get_ref_base_and_extent (expr, &offset, &size, &max_size, &reverse);
-  if (max_size == -1 || !DECL_P (base))
+  base = get_ref_base_and_extent (expr, &poffset, &psize, &pmax_size,
+				  &reverse);
+  if (!known_size_p (pmax_size)
+      || !pmax_size.is_constant (&max_size)
+      || !poffset.is_constant (&offset)
+      || !DECL_P (base))
     return NULL;
 
   if (!bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	2017-10-23 17:11:40.347140508 +0100
+++ gcc/tree-ssa-alias.c	2017-10-23 17:16:59.705267681 +0100
@@ -635,7 +635,7 @@ ao_ref_init (ao_ref *r, tree ref)
 ao_ref_base (ao_ref *ref)
 {
   bool reverse;
-  HOST_WIDE_INT offset, size, max_size;
+  poly_int64 offset, size, max_size;
 
   if (ref->base)
     return ref->base;
@@ -823,7 +823,7 @@ aliasing_component_refs_p (tree ref1,
     return true;
   else if (same_p == 1)
     {
-      HOST_WIDE_INT offadj, sztmp, msztmp;
+      poly_int64 offadj, sztmp, msztmp;
       bool reverse;
       get_ref_base_and_extent (*refp, &offadj, &sztmp, &msztmp, &reverse);
       offset2 -= offadj;
@@ -842,7 +842,7 @@ aliasing_component_refs_p (tree ref1,
     return true;
   else if (same_p == 1)
     {
-      HOST_WIDE_INT offadj, sztmp, msztmp;
+      poly_int64 offadj, sztmp, msztmp;
       bool reverse;
       get_ref_base_and_extent (*refp, &offadj, &sztmp, &msztmp, &reverse);
       offset1 -= offadj;
@@ -2450,15 +2450,12 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *
 	 the access properly.  */
       if (!ref->max_size_known_p ())
 	return false;
-      HOST_WIDE_INT size, max_size, const_offset;
-      poly_int64 ref_offset = ref->offset;
+      poly_int64 size, offset, max_size, ref_offset = ref->offset;
       bool reverse;
-      tree base
-	= get_ref_base_and_extent (lhs, &const_offset, &size, &max_size,
-				   &reverse);
+      tree base = get_ref_base_and_extent (lhs, &offset, &size, &max_size,
+					   &reverse);
       /* We can get MEM[symbol: sZ, index: D.8862_1] here,
 	 so base == ref->base does not always hold.  */
-      poly_int64 offset = const_offset;
       if (base != ref->base)
 	{
 	  /* Try using points-to info.  */
@@ -2490,7 +2487,7 @@ stmt_kills_ref_p (gimple *stmt, ao_ref *
 	}
       /* For a must-alias check we need to be able to constrain
 	 the access properly.  */
-      if (size == max_size
+      if (must_eq (size, max_size)
 	  && known_subrange_p (ref_offset, ref->max_size, offset, size))
 	return true;
     }
Index: gcc/tree-ssa-dce.c
===================================================================
--- gcc/tree-ssa-dce.c	2017-10-23 17:11:40.348142423 +0100
+++ gcc/tree-ssa-dce.c	2017-10-23 17:16:59.706267546 +0100
@@ -477,7 +477,7 @@ mark_aliased_reaching_defs_necessary_1 (
       && !stmt_can_throw_internal (def_stmt))
     {
       tree base, lhs = gimple_get_lhs (def_stmt);
-      HOST_WIDE_INT size, offset, max_size;
+      poly_int64 size, offset, max_size;
       bool reverse;
       ao_ref_base (ref);
       base
@@ -488,7 +488,7 @@ mark_aliased_reaching_defs_necessary_1 (
 	{
 	  /* For a must-alias check we need to be able to constrain
 	     the accesses properly.  */
-	  if (size == max_size
+	  if (must_eq (size, max_size)
 	      && known_subrange_p (ref->offset, ref->max_size, offset, size))
 	    return true;
 	  /* Or they need to be exactly the same.  */
Index: gcc/tree-ssa-scopedtables.c
===================================================================
--- gcc/tree-ssa-scopedtables.c	2017-10-23 17:07:40.909726192 +0100
+++ gcc/tree-ssa-scopedtables.c	2017-10-23 17:16:59.706267546 +0100
@@ -480,13 +480,13 @@ avail_expr_hash (class expr_hash_elt *p)
 	     Dealing with both MEM_REF and ARRAY_REF allows us not to care
 	     about equivalence with other statements not considered here.  */
 	  bool reverse;
-	  HOST_WIDE_INT offset, size, max_size;
+	  poly_int64 offset, size, max_size;
 	  tree base = get_ref_base_and_extent (t, &offset, &size, &max_size,
 					       &reverse);
 	  /* Strictly, we could try to normalize variable-sized accesses too,
 	    but here we just deal with the common case.  */
-	  if (size != -1
-	      && size == max_size)
+	  if (known_size_p (max_size)
+	      && must_eq (size, max_size))
 	    {
 	      enum tree_code code = MEM_REF;
 	      hstate.add_object (code);
@@ -520,26 +520,26 @@ equal_mem_array_ref_p (tree t0, tree t1)
   if (!types_compatible_p (TREE_TYPE (t0), TREE_TYPE (t1)))
     return false;
   bool rev0;
-  HOST_WIDE_INT off0, sz0, max0;
+  poly_int64 off0, sz0, max0;
   tree base0 = get_ref_base_and_extent (t0, &off0, &sz0, &max0, &rev0);
-  if (sz0 == -1
-      || sz0 != max0)
+  if (!known_size_p (max0)
+      || may_ne (sz0, max0))
     return false;
 
   bool rev1;
-  HOST_WIDE_INT off1, sz1, max1;
+  poly_int64 off1, sz1, max1;
   tree base1 = get_ref_base_and_extent (t1, &off1, &sz1, &max1, &rev1);
-  if (sz1 == -1
-      || sz1 != max1)
+  if (!known_size_p (max1)
+      || may_ne (sz1, max1))
     return false;
 
   if (rev0 != rev1)
     return false;
 
   /* Types were compatible, so this is a sanity check.  */
-  gcc_assert (sz0 == sz1);
+  gcc_assert (must_eq (sz0, sz1));
 
-  return (off0 == off1) && operand_equal_p (base0, base1, 0);
+  return must_eq (off0, off1) && operand_equal_p (base0, base1, 0);
 }
 
 /* Compare two hashable_expr structures for equivalence.  They are
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-10-23 17:11:40.349144338 +0100
+++ gcc/tree-ssa-sccvn.c	2017-10-23 17:16:59.706267546 +0100
@@ -1920,7 +1920,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
     {
       tree ref2 = TREE_OPERAND (gimple_call_arg (def_stmt, 0), 0);
       tree base2;
-      HOST_WIDE_INT offset2, size2, maxsize2;
+      poly_int64 offset2, size2, maxsize2;
       bool reverse;
       base2 = get_ref_base_and_extent (ref2, &offset2, &size2, &maxsize2,
 				       &reverse);
@@ -1943,7 +1943,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	   && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (def_stmt)) == 0)
     {
       tree base2;
-      HOST_WIDE_INT offset2, size2, maxsize2;
+      poly_int64 offset2, size2, maxsize2;
       bool reverse;
       base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt),
 				       &offset2, &size2, &maxsize2, &reverse);
@@ -1975,13 +1975,12 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 		   && is_gimple_min_invariant (SSA_VAL (gimple_assign_rhs1 (def_stmt))))))
     {
       tree base2;
-      HOST_WIDE_INT offset2, size2, maxsize2;
+      HOST_WIDE_INT offset2, size2;
       bool reverse;
-      base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt),
-				       &offset2, &size2, &maxsize2, &reverse);
-      if (!reverse
-	  && maxsize2 != -1
-	  && maxsize2 == size2
+      base2 = get_ref_base_and_extent_hwi (gimple_assign_lhs (def_stmt),
+					   &offset2, &size2, &reverse);
+      if (base2
+	  && !reverse
 	  && size2 % BITS_PER_UNIT == 0
 	  && offset2 % BITS_PER_UNIT == 0
 	  && operand_equal_p (base, base2, 0)
@@ -2039,14 +2038,14 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	   && TREE_CODE (gimple_assign_rhs1 (def_stmt)) == SSA_NAME)
     {
       tree base2;
-      HOST_WIDE_INT offset2, size2, maxsize2;
+      poly_int64 offset2, size2, maxsize2;
       bool reverse;
       base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt),
 				       &offset2, &size2, &maxsize2,
 				       &reverse);
       if (!reverse
-	  && maxsize2 != -1
-	  && maxsize2 == size2
+	  && known_size_p (maxsize2)
+	  && must_eq (maxsize2, size2)
 	  && operand_equal_p (base, base2, 0)
 	  && known_subrange_p (offset, maxsize, offset2, size2)
 	  /* ???  We can't handle bitfield precision extracts without
Index: gcc/tree-ssa-structalias.c
===================================================================
--- gcc/tree-ssa-structalias.c	2017-10-23 17:07:40.909726192 +0100
+++ gcc/tree-ssa-structalias.c	2017-10-23 17:16:59.707267411 +0100
@@ -3191,9 +3191,9 @@ get_constraint_for_component_ref (tree t
 				  bool address_p, bool lhs_p)
 {
   tree orig_t = t;
-  HOST_WIDE_INT bitsize = -1;
-  HOST_WIDE_INT bitmaxsize = -1;
-  HOST_WIDE_INT bitpos;
+  poly_int64 bitsize = -1;
+  poly_int64 bitmaxsize = -1;
+  poly_int64 bitpos;
   bool reverse;
   tree forzero;
 
@@ -3255,8 +3255,8 @@ get_constraint_for_component_ref (tree t
 	 ignore this constraint. When we handle pointer subtraction,
 	 we may have to do something cute here.  */
 
-      if ((unsigned HOST_WIDE_INT)bitpos < get_varinfo (result.var)->fullsize
-	  && bitmaxsize != 0)
+      if (may_lt (poly_uint64 (bitpos), get_varinfo (result.var)->fullsize)
+	  && maybe_nonzero (bitmaxsize))
 	{
 	  /* It's also not true that the constraint will actually start at the
 	     right offset, it may start in some padding.  We only care about
@@ -3268,8 +3268,8 @@ get_constraint_for_component_ref (tree t
 	  cexpr.offset = 0;
 	  for (curr = get_varinfo (cexpr.var); curr; curr = vi_next (curr))
 	    {
-	      if (ranges_overlap_p (curr->offset, curr->size,
-				    bitpos, bitmaxsize))
+	      if (ranges_may_overlap_p (poly_int64 (curr->offset), curr->size,
+					bitpos, bitmaxsize))
 		{
 		  cexpr.var = curr->id;
 		  results->safe_push (cexpr);
@@ -3302,7 +3302,7 @@ get_constraint_for_component_ref (tree t
 	      results->safe_push (cexpr);
 	    }
 	}
-      else if (bitmaxsize == 0)
+      else if (known_zero (bitmaxsize))
 	{
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	    fprintf (dump_file, "Access to zero-sized part of variable, "
@@ -3317,13 +3317,15 @@ get_constraint_for_component_ref (tree t
       /* If we do not know exactly where the access goes say so.  Note
 	 that only for non-structure accesses we know that we access
 	 at most one subfiled of any variable.  */
-      if (bitpos == -1
-	  || bitsize != bitmaxsize
+      HOST_WIDE_INT const_bitpos;
+      if (!bitpos.is_constant (&const_bitpos)
+	  || const_bitpos == -1
+	  || may_ne (bitsize, bitmaxsize)
 	  || AGGREGATE_TYPE_P (TREE_TYPE (orig_t))
 	  || result.offset == UNKNOWN_OFFSET)
 	result.offset = UNKNOWN_OFFSET;
       else
-	result.offset += bitpos;
+	result.offset += const_bitpos;
     }
   else if (result.type == ADDRESSOF)
     {
@@ -3660,14 +3662,17 @@ do_structure_copy (tree lhsop, tree rhso
 	   && (rhsp->type == SCALAR
 	       || rhsp->type == ADDRESSOF))
     {
-      HOST_WIDE_INT lhssize, lhsmaxsize, lhsoffset;
-      HOST_WIDE_INT rhssize, rhsmaxsize, rhsoffset;
+      HOST_WIDE_INT lhssize, lhsoffset;
+      HOST_WIDE_INT rhssize, rhsoffset;
       bool reverse;
       unsigned k = 0;
-      get_ref_base_and_extent (lhsop, &lhsoffset, &lhssize, &lhsmaxsize,
-			       &reverse);
-      get_ref_base_and_extent (rhsop, &rhsoffset, &rhssize, &rhsmaxsize,
-			       &reverse);
+      if (!get_ref_base_and_extent_hwi (lhsop, &lhsoffset, &lhssize, &reverse)
+	  || !get_ref_base_and_extent_hwi (rhsop, &rhsoffset, &rhssize,
+					   &reverse))
+	{
+	  process_all_all_constraints (lhsc, rhsc);
+	  return;
+	}
       for (j = 0; lhsc.iterate (j, &lhsp);)
 	{
 	  varinfo_t lhsv, rhsv;
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c	2017-10-23 17:16:50.377527331 +0100
+++ gcc/var-tracking.c	2017-10-23 17:16:59.708267276 +0100
@@ -5208,20 +5208,20 @@ track_expr_p (tree expr, bool need_rtl)
 	      || (TREE_CODE (realdecl) == MEM_REF
 		  && TREE_CODE (TREE_OPERAND (realdecl, 0)) == ADDR_EXPR))
 	    {
-	      HOST_WIDE_INT bitsize, bitpos, maxsize;
+	      HOST_WIDE_INT bitsize, bitpos;
 	      bool reverse;
 	      tree innerdecl
-		= get_ref_base_and_extent (realdecl, &bitpos, &bitsize,
-					   &maxsize, &reverse);
-	      if (!DECL_P (innerdecl)
+		= get_ref_base_and_extent_hwi (realdecl, &bitpos,
+					       &bitsize, &reverse);
+	      if (!innerdecl
+		  || !DECL_P (innerdecl)
 		  || DECL_IGNORED_P (innerdecl)
 		  /* Do not track declarations for parts of tracked record
 		     parameters since we want to track them as a whole.  */
 		  || tracked_record_parameter_p (innerdecl)
 		  || TREE_STATIC (innerdecl)
-		  || bitsize <= 0
-		  || bitpos + bitsize > 256
-		  || bitsize != maxsize)
+		  || bitsize == 0
+		  || bitpos + bitsize > 256)
 		return 0;
 	      else
 		realdecl = expr;

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [028/nnn] poly_int: ipa_parm_adjustment
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (28 preceding siblings ...)
  2017-10-23 17:12 ` [029/nnn] poly_int: get_ref_base_and_extent Richard Sandiford
@ 2017-10-23 17:12 ` Richard Sandiford
  2017-11-28 17:47   ` Jeff Law
  2017-10-23 17:13 ` [033/nnn] poly_int: pointer_may_wrap_p Richard Sandiford
                   ` (77 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:12 UTC (permalink / raw)
  To: gcc-patches

This patch changes the type of ipa_parm_adjustment::offset from
HOST_WIDE_INT to poly_int64 and updates uses accordingly.
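
As a rough sketch (not part of the patch), the change means that code
handling these offsets converts, compares and dumps them through the
poly_int helpers used in the hunks below rather than with /, == and a
%li format.  ADJ, OTHER and FILE stand for the surrounding ipa-prop
variables:

  /* Sketch only: ADJ and OTHER are ipa_parm_adjustments, FILE a dump file.  */
  poly_int64 byte_offset = exact_div (adj->offset, BITS_PER_UNIT);
  bool same_offset_p = must_eq (adj->offset, other->offset);
  fprintf (file, ", offset ");
  print_dec (adj->offset, file);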


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* ipa-prop.h (ipa_parm_adjustment::offset): Change from
	HOST_WIDE_INT to poly_int64_pod.
	* ipa-prop.c (ipa_modify_call_arguments): Track polynomial
	parameter offsets.

Index: gcc/ipa-prop.h
===================================================================
--- gcc/ipa-prop.h	2017-10-23 17:07:40.959671257 +0100
+++ gcc/ipa-prop.h	2017-10-23 17:16:58.508429306 +0100
@@ -828,7 +828,7 @@ struct ipa_parm_adjustment
 
   /* Offset into the original parameter (for the cases when the new parameter
      is a component of an original one).  */
-  HOST_WIDE_INT offset;
+  poly_int64_pod offset;
 
   /* Zero based index of the original parameter this one is based on.  */
   int base_index;
Index: gcc/ipa-prop.c
===================================================================
--- gcc/ipa-prop.c	2017-10-23 17:07:40.959671257 +0100
+++ gcc/ipa-prop.c	2017-10-23 17:16:58.507429441 +0100
@@ -4302,15 +4302,14 @@ ipa_modify_call_arguments (struct cgraph
 	     simply taking the address of a reference inside the original
 	     aggregate.  */
 
-	  gcc_checking_assert (adj->offset % BITS_PER_UNIT == 0);
+	  poly_int64 byte_offset = exact_div (adj->offset, BITS_PER_UNIT);
 	  base = gimple_call_arg (stmt, adj->base_index);
 	  loc = DECL_P (base) ? DECL_SOURCE_LOCATION (base)
 			      : EXPR_LOCATION (base);
 
 	  if (TREE_CODE (base) != ADDR_EXPR
 	      && POINTER_TYPE_P (TREE_TYPE (base)))
-	    off = build_int_cst (adj->alias_ptr_type,
-				 adj->offset / BITS_PER_UNIT);
+	    off = build_int_cst (adj->alias_ptr_type, byte_offset);
 	  else
 	    {
 	      HOST_WIDE_INT base_offset;
@@ -4330,8 +4329,7 @@ ipa_modify_call_arguments (struct cgraph
 	      if (!base)
 		{
 		  base = build_fold_addr_expr (prev_base);
-		  off = build_int_cst (adj->alias_ptr_type,
-				       adj->offset / BITS_PER_UNIT);
+		  off = build_int_cst (adj->alias_ptr_type, byte_offset);
 		}
 	      else if (TREE_CODE (base) == MEM_REF)
 		{
@@ -4341,8 +4339,7 @@ ipa_modify_call_arguments (struct cgraph
 		      deref_align = TYPE_ALIGN (TREE_TYPE (base));
 		    }
 		  off = build_int_cst (adj->alias_ptr_type,
-				       base_offset
-				       + adj->offset / BITS_PER_UNIT);
+				       base_offset + byte_offset);
 		  off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1),
 					 off);
 		  base = TREE_OPERAND (base, 0);
@@ -4350,8 +4347,7 @@ ipa_modify_call_arguments (struct cgraph
 	      else
 		{
 		  off = build_int_cst (adj->alias_ptr_type,
-				       base_offset
-				       + adj->offset / BITS_PER_UNIT);
+				       base_offset + byte_offset);
 		  base = build_fold_addr_expr (base);
 		}
 	    }
@@ -4602,7 +4598,7 @@ ipa_get_adjustment_candidate (tree **exp
       struct ipa_parm_adjustment *adj = &adjustments[i];
 
       if (adj->base == base
-	  && (adj->offset == offset || adj->op == IPA_PARM_OP_REMOVE))
+	  && (must_eq (adj->offset, offset) || adj->op == IPA_PARM_OP_REMOVE))
 	{
 	  cand = adj;
 	  break;
@@ -4766,7 +4762,10 @@ ipa_dump_param_adjustments (FILE *file,
       else if (adj->op == IPA_PARM_OP_REMOVE)
 	fprintf (file, ", remove_param");
       else
-	fprintf (file, ", offset %li", (long) adj->offset);
+	{
+	  fprintf (file, ", offset ");
+	  print_dec (adj->offset, file);
+	}
       if (adj->by_ref)
 	fprintf (file, ", by_ref");
       print_node_brief (file, ", type: ", adj->type, 0);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [030/nnn] poly_int: get_addr_base_and_unit_offset
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (26 preceding siblings ...)
  2017-10-23 17:11 ` [027/nnn] poly_int: DWARF CFA offsets Richard Sandiford
@ 2017-10-23 17:12 ` Richard Sandiford
  2017-12-06  0:26   ` Jeff Law
  2017-10-23 17:12 ` [029/nnn] poly_int: get_ref_base_and_extent Richard Sandiford
                   ` (79 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:12 UTC (permalink / raw)
  To: gcc-patches

This patch changes the offset returned by
get_addr_base_and_unit_offset from HOST_WIDE_INT to poly_int64.

maxsize in gimple_fold_builtin_memory_op goes from HOST_WIDE_INT
to poly_uint64 (rather than poly_int64) to match the previous use
of tree_fits_uhwi_p.
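
Callers that still need a compile-time offset now read into a poly_int64
and test is_constant, as in the tree-ssa-strlen.c hunks below.  A minimal
sketch, with EXP standing for an arbitrary address expression:

  poly_int64 poff;
  HOST_WIDE_INT off;
  tree base = get_addr_base_and_unit_offset (exp, &poff);
  if (base == NULL_TREE || !DECL_P (base) || !poff.is_constant (&off))
    return NULL;
  /* ... from here on OFF is a plain HOST_WIDE_INT ...  */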


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-dfa.h (get_addr_base_and_unit_offset_1): Return the offset
	as a poly_int64_pod rather than a HOST_WIDE_INT.
	(get_addr_base_and_unit_offset): Likewise.
	* tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise.
	(get_addr_base_and_unit_offset): Likewise.
	* doc/match-and-simplify.texi: Change off from HOST_WIDE_INT
	to poly_int64 in example.
	* fold-const.c (fold_binary_loc): Update call to
	get_addr_base_and_unit_offset.
	* gimple-fold.c (gimple_fold_builtin_memory_op): Likewise.
	(maybe_canonicalize_mem_ref_addr): Likewise.
	(gimple_fold_stmt_to_constant_1): Likewise.
	* ipa-prop.c (ipa_modify_call_arguments): Likewise.
	* match.pd: Likewise.
	* omp-low.c (lower_omp_target): Likewise.
	* tree-sra.c (build_ref_for_offset): Likewise.
	(build_debug_ref_for_model): Likewise.
	* tree-ssa-address.c (maybe_fold_tmr): Likewise.
	* tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Likewise.
	* tree-ssa-ccp.c (optimize_memcpy): Likewise.
	* tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Likewise.
	(constant_pointer_difference): Likewise.
	* tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
	* tree-ssa-phiopt.c (jump_function_from_stmt): Likewise.
	* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
	* tree-ssa-sccvn.c (vn_reference_fold_indirect): Likewise.
	(vn_reference_maybe_forwprop_address, vn_reference_lookup_3): Likewise.
	(set_ssa_val_to): Likewise.
	* tree-ssa-strlen.c (get_addr_stridx, addr_stridxptr): Likewise.
	* tree.c (build_simple_mem_ref_loc): Likewise.

Index: gcc/tree-dfa.h
===================================================================
--- gcc/tree-dfa.h	2017-10-23 17:16:59.705267681 +0100
+++ gcc/tree-dfa.h	2017-10-23 17:17:01.432034493 +0100
@@ -33,9 +33,9 @@ extern tree get_ref_base_and_extent (tre
 				     poly_int64_pod *, bool *);
 extern tree get_ref_base_and_extent_hwi (tree, HOST_WIDE_INT *,
 					 HOST_WIDE_INT *, bool *);
-extern tree get_addr_base_and_unit_offset_1 (tree, HOST_WIDE_INT *,
+extern tree get_addr_base_and_unit_offset_1 (tree, poly_int64_pod *,
 					     tree (*) (tree));
-extern tree get_addr_base_and_unit_offset (tree, HOST_WIDE_INT *);
+extern tree get_addr_base_and_unit_offset (tree, poly_int64_pod *);
 extern bool stmt_references_abnormal_ssa_name (gimple *);
 extern void replace_abnormal_ssa_names (gimple *);
 extern void dump_enumerated_decls (FILE *, dump_flags_t);
Index: gcc/tree-dfa.c
===================================================================
--- gcc/tree-dfa.c	2017-10-23 17:16:59.705267681 +0100
+++ gcc/tree-dfa.c	2017-10-23 17:17:01.432034493 +0100
@@ -705,10 +705,10 @@ get_ref_base_and_extent_hwi (tree exp, H
    its argument or a constant if the argument is known to be constant.  */
 
 tree
-get_addr_base_and_unit_offset_1 (tree exp, HOST_WIDE_INT *poffset,
+get_addr_base_and_unit_offset_1 (tree exp, poly_int64_pod *poffset,
 				 tree (*valueize) (tree))
 {
-  HOST_WIDE_INT byte_offset = 0;
+  poly_int64 byte_offset = 0;
 
   /* Compute cumulative byte-offset for nested component-refs and array-refs,
      and find the ultimate containing object.  */
@@ -718,10 +718,13 @@ get_addr_base_and_unit_offset_1 (tree ex
 	{
 	case BIT_FIELD_REF:
 	  {
-	    HOST_WIDE_INT this_off = TREE_INT_CST_LOW (TREE_OPERAND (exp, 2));
-	    if (this_off % BITS_PER_UNIT)
+	    poly_int64 this_byte_offset;
+	    poly_uint64 this_bit_offset;
+	    if (!poly_int_tree_p (TREE_OPERAND (exp, 2), &this_bit_offset)
+		|| !multiple_p (this_bit_offset, BITS_PER_UNIT,
+				&this_byte_offset))
 	      return NULL_TREE;
-	    byte_offset += this_off / BITS_PER_UNIT;
+	    byte_offset += this_byte_offset;
 	  }
 	  break;
 
@@ -729,15 +732,14 @@ get_addr_base_and_unit_offset_1 (tree ex
 	  {
 	    tree field = TREE_OPERAND (exp, 1);
 	    tree this_offset = component_ref_field_offset (exp);
-	    HOST_WIDE_INT hthis_offset;
+	    poly_int64 hthis_offset;
 
 	    if (!this_offset
-		|| TREE_CODE (this_offset) != INTEGER_CST
+		|| !poly_int_tree_p (this_offset, &hthis_offset)
 		|| (TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field))
 		    % BITS_PER_UNIT))
 	      return NULL_TREE;
 
-	    hthis_offset = TREE_INT_CST_LOW (this_offset);
 	    hthis_offset += (TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field))
 			     / BITS_PER_UNIT);
 	    byte_offset += hthis_offset;
@@ -755,17 +757,18 @@ get_addr_base_and_unit_offset_1 (tree ex
 	      index = (*valueize) (index);
 
 	    /* If the resulting bit-offset is constant, track it.  */
-	    if (TREE_CODE (index) == INTEGER_CST
+	    if (poly_int_tree_p (index)
 		&& (low_bound = array_ref_low_bound (exp),
-		    TREE_CODE (low_bound) == INTEGER_CST)
+		    poly_int_tree_p (low_bound))
 		&& (unit_size = array_ref_element_size (exp),
 		    TREE_CODE (unit_size) == INTEGER_CST))
 	      {
-		offset_int woffset
-		  = wi::sext (wi::to_offset (index) - wi::to_offset (low_bound),
+		poly_offset_int woffset
+		  = wi::sext (wi::to_poly_offset (index)
+			      - wi::to_poly_offset (low_bound),
 			      TYPE_PRECISION (TREE_TYPE (index)));
 		woffset *= wi::to_offset (unit_size);
-		byte_offset += woffset.to_shwi ();
+		byte_offset += woffset.force_shwi ();
 	      }
 	    else
 	      return NULL_TREE;
@@ -842,7 +845,7 @@ get_addr_base_and_unit_offset_1 (tree ex
    is not BITS_PER_UNIT-aligned.  */
 
 tree
-get_addr_base_and_unit_offset (tree exp, HOST_WIDE_INT *poffset)
+get_addr_base_and_unit_offset (tree exp, poly_int64_pod *poffset)
 {
   return get_addr_base_and_unit_offset_1 (exp, poffset, NULL);
 }
Index: gcc/doc/match-and-simplify.texi
===================================================================
--- gcc/doc/match-and-simplify.texi	2017-10-23 17:07:40.843798706 +0100
+++ gcc/doc/match-and-simplify.texi	2017-10-23 17:17:01.428035033 +0100
@@ -205,7 +205,7 @@ Captures can also be used for capturing
   (pointer_plus (addr@@2 @@0) INTEGER_CST_P@@1)
   (if (is_gimple_min_invariant (@@2)))
   @{
-    HOST_WIDE_INT off;
+    poly_int64 off;
     tree base = get_addr_base_and_unit_offset (@@0, &off);
     off += tree_to_uhwi (@@1);
     /* Now with that we should be able to simply write
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:11:40.244945208 +0100
+++ gcc/fold-const.c	2017-10-23 17:17:01.429034898 +0100
@@ -9455,7 +9455,7 @@ fold_binary_loc (location_t loc,
 	  && handled_component_p (TREE_OPERAND (arg0, 0)))
 	{
 	  tree base;
-	  HOST_WIDE_INT coffset;
+	  poly_int64 coffset;
 	  base = get_addr_base_and_unit_offset (TREE_OPERAND (arg0, 0),
 						&coffset);
 	  if (!base)
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	2017-10-23 17:16:59.703267951 +0100
+++ gcc/gimple-fold.c	2017-10-23 17:17:01.430034763 +0100
@@ -838,8 +838,8 @@ gimple_fold_builtin_memory_op (gimple_st
 	      && TREE_CODE (dest) == ADDR_EXPR)
 	    {
 	      tree src_base, dest_base, fn;
-	      HOST_WIDE_INT src_offset = 0, dest_offset = 0;
-	      HOST_WIDE_INT maxsize;
+	      poly_int64 src_offset = 0, dest_offset = 0;
+	      poly_uint64 maxsize;
 
 	      srcvar = TREE_OPERAND (src, 0);
 	      src_base = get_addr_base_and_unit_offset (srcvar, &src_offset);
@@ -850,16 +850,14 @@ gimple_fold_builtin_memory_op (gimple_st
 							 &dest_offset);
 	      if (dest_base == NULL)
 		dest_base = destvar;
-	      if (tree_fits_uhwi_p (len))
-		maxsize = tree_to_uhwi (len);
-	      else
+	      if (!poly_int_tree_p (len, &maxsize))
 		maxsize = -1;
 	      if (SSA_VAR_P (src_base)
 		  && SSA_VAR_P (dest_base))
 		{
 		  if (operand_equal_p (src_base, dest_base, 0)
-		      && ranges_overlap_p (src_offset, maxsize,
-					   dest_offset, maxsize))
+		      && ranges_may_overlap_p (src_offset, maxsize,
+					       dest_offset, maxsize))
 		    return false;
 		}
 	      else if (TREE_CODE (src_base) == MEM_REF
@@ -868,17 +866,12 @@ gimple_fold_builtin_memory_op (gimple_st
 		  if (! operand_equal_p (TREE_OPERAND (src_base, 0),
 					 TREE_OPERAND (dest_base, 0), 0))
 		    return false;
-		  offset_int off = mem_ref_offset (src_base) + src_offset;
-		  if (!wi::fits_shwi_p (off))
-		    return false;
-		  src_offset = off.to_shwi ();
-
-		  off = mem_ref_offset (dest_base) + dest_offset;
-		  if (!wi::fits_shwi_p (off))
-		    return false;
-		  dest_offset = off.to_shwi ();
-		  if (ranges_overlap_p (src_offset, maxsize,
-					dest_offset, maxsize))
+		  poly_offset_int full_src_offset
+		    = mem_ref_offset (src_base) + src_offset;
+		  poly_offset_int full_dest_offset
+		    = mem_ref_offset (dest_base) + dest_offset;
+		  if (ranges_may_overlap_p (full_src_offset, maxsize,
+					    full_dest_offset, maxsize))
 		    return false;
 		}
 	      else
@@ -4317,7 +4310,7 @@ maybe_canonicalize_mem_ref_addr (tree *t
 	      || handled_component_p (TREE_OPERAND (addr, 0))))
 	{
 	  tree base;
-	  HOST_WIDE_INT coffset;
+	  poly_int64 coffset;
 	  base = get_addr_base_and_unit_offset (TREE_OPERAND (addr, 0),
 						&coffset);
 	  if (!base)
@@ -5903,7 +5896,7 @@ gimple_fold_stmt_to_constant_1 (gimple *
 	      else if (TREE_CODE (rhs) == ADDR_EXPR
 		       && !is_gimple_min_invariant (rhs))
 		{
-		  HOST_WIDE_INT offset = 0;
+		  poly_int64 offset = 0;
 		  tree base;
 		  base = get_addr_base_and_unit_offset_1 (TREE_OPERAND (rhs, 0),
 							  &offset,
Index: gcc/ipa-prop.c
===================================================================
--- gcc/ipa-prop.c	2017-10-23 17:16:59.704267816 +0100
+++ gcc/ipa-prop.c	2017-10-23 17:17:01.431034628 +0100
@@ -4297,7 +4297,7 @@ ipa_modify_call_arguments (struct cgraph
 	    off = build_int_cst (adj->alias_ptr_type, byte_offset);
 	  else
 	    {
-	      HOST_WIDE_INT base_offset;
+	      poly_int64 base_offset;
 	      tree prev_base;
 	      bool addrof;
 
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2017-10-23 17:11:39.914313353 +0100
+++ gcc/match.pd	2017-10-23 17:17:01.431034628 +0100
@@ -3345,7 +3345,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (cmp (convert1?@2 addr@0) (convert2? addr@1))
   (with
    {
-     HOST_WIDE_INT off0, off1;
+     poly_int64 off0, off1;
      tree base0 = get_addr_base_and_unit_offset (TREE_OPERAND (@0, 0), &off0);
      tree base1 = get_addr_base_and_unit_offset (TREE_OPERAND (@1, 0), &off1);
      if (base0 && TREE_CODE (base0) == MEM_REF)
@@ -3384,23 +3384,23 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      }
      (if (equal == 1)
       (switch
-       (if (cmp == EQ_EXPR)
-	{ constant_boolean_node (off0 == off1, type); })
-       (if (cmp == NE_EXPR)
-	{ constant_boolean_node (off0 != off1, type); })
-       (if (cmp == LT_EXPR)
-	{ constant_boolean_node (off0 < off1, type); })
-       (if (cmp == LE_EXPR)
-	{ constant_boolean_node (off0 <= off1, type); })
-       (if (cmp == GE_EXPR)
-	{ constant_boolean_node (off0 >= off1, type); })
-       (if (cmp == GT_EXPR)
-	{ constant_boolean_node (off0 > off1, type); }))
+       (if (cmp == EQ_EXPR && (must_eq (off0, off1) || must_ne (off0, off1)))
+	{ constant_boolean_node (must_eq (off0, off1), type); })
+       (if (cmp == NE_EXPR && (must_eq (off0, off1) || must_ne (off0, off1)))
+	{ constant_boolean_node (must_ne (off0, off1), type); })
+       (if (cmp == LT_EXPR && (must_lt (off0, off1) || must_ge (off0, off1)))
+	{ constant_boolean_node (must_lt (off0, off1), type); })
+       (if (cmp == LE_EXPR && (must_le (off0, off1) || must_gt (off0, off1)))
+	{ constant_boolean_node (must_le (off0, off1), type); })
+       (if (cmp == GE_EXPR && (must_ge (off0, off1) || must_lt (off0, off1)))
+	{ constant_boolean_node (must_ge (off0, off1), type); })
+       (if (cmp == GT_EXPR && (must_gt (off0, off1) || must_le (off0, off1)))
+	{ constant_boolean_node (must_gt (off0, off1), type); }))
       (if (equal == 0
 	   && DECL_P (base0) && DECL_P (base1)
 	   /* If we compare this as integers require equal offset.  */
 	   && (!INTEGRAL_TYPE_P (TREE_TYPE (@2))
-	       || off0 == off1))
+	       || must_eq (off0, off1)))
        (switch
 	(if (cmp == EQ_EXPR)
 	 { constant_boolean_node (false, type); })
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	2017-10-23 17:11:39.972424406 +0100
+++ gcc/omp-low.c	2017-10-23 17:17:01.432034493 +0100
@@ -8397,7 +8397,7 @@ lower_omp_target (gimple_stmt_iterator *
 		|| OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_FIRSTPRIVATE_REFERENCE)
 	      {
 		location_t clause_loc = OMP_CLAUSE_LOCATION (c);
-		HOST_WIDE_INT offset = 0;
+		poly_int64 offset = 0;
 		gcc_assert (prev);
 		var = OMP_CLAUSE_DECL (c);
 		if (DECL_P (var)
Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c	2017-10-23 17:16:59.705267681 +0100
+++ gcc/tree-sra.c	2017-10-23 17:17:01.433034358 +0100
@@ -1678,7 +1678,7 @@ build_ref_for_offset (location_t loc, tr
   tree prev_base = base;
   tree off;
   tree mem_ref;
-  HOST_WIDE_INT base_offset;
+  poly_int64 base_offset;
   unsigned HOST_WIDE_INT misalign;
   unsigned int align;
 
@@ -1786,7 +1786,7 @@ build_ref_for_model (location_t loc, tre
 build_debug_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,
 			   struct access *model)
 {
-  HOST_WIDE_INT base_offset;
+  poly_int64 base_offset;
   tree off;
 
   if (TREE_CODE (model->expr) == COMPONENT_REF
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	2017-10-23 17:11:40.248952867 +0100
+++ gcc/tree-ssa-address.c	2017-10-23 17:17:01.433034358 +0100
@@ -1061,7 +1061,7 @@ maybe_fold_tmr (tree ref)
   else if (addr.symbol
 	   && handled_component_p (TREE_OPERAND (addr.symbol, 0)))
     {
-      HOST_WIDE_INT offset;
+      poly_int64 offset;
       addr.symbol = build_fold_addr_expr
 		      (get_addr_base_and_unit_offset
 		         (TREE_OPERAND (addr.symbol, 0), &offset));
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	2017-10-23 17:16:59.705267681 +0100
+++ gcc/tree-ssa-alias.c	2017-10-23 17:17:01.433034358 +0100
@@ -683,8 +683,7 @@ ao_ref_alias_set (ao_ref *ref)
 void
 ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree size)
 {
-  HOST_WIDE_INT t;
-  poly_int64 size_hwi, extra_offset = 0;
+  poly_int64 t, size_hwi, extra_offset = 0;
   ref->ref = NULL_TREE;
   if (TREE_CODE (ptr) == SSA_NAME)
     {
Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c	2017-10-23 17:07:40.843798706 +0100
+++ gcc/tree-ssa-ccp.c	2017-10-23 17:17:01.433034358 +0100
@@ -3003,7 +3003,7 @@ optimize_memcpy (gimple_stmt_iterator *g
 
   gimple *defstmt = SSA_NAME_DEF_STMT (vuse);
   tree src2 = NULL_TREE, len2 = NULL_TREE;
-  HOST_WIDE_INT offset, offset2;
+  poly_int64 offset, offset2;
   tree val = integer_zero_node;
   if (gimple_store_p (defstmt)
       && gimple_assign_single_p (defstmt)
@@ -3035,16 +3035,16 @@ optimize_memcpy (gimple_stmt_iterator *g
 	    ? DECL_SIZE_UNIT (TREE_OPERAND (src2, 1))
 	    : TYPE_SIZE_UNIT (TREE_TYPE (src2)));
   if (len == NULL_TREE
-      || TREE_CODE (len) != INTEGER_CST
+      || !poly_int_tree_p (len)
       || len2 == NULL_TREE
-      || TREE_CODE (len2) != INTEGER_CST)
+      || !poly_int_tree_p (len2))
     return;
 
   src = get_addr_base_and_unit_offset (src, &offset);
   src2 = get_addr_base_and_unit_offset (src2, &offset2);
   if (src == NULL_TREE
       || src2 == NULL_TREE
-      || offset < offset2)
+      || may_lt (offset, offset2))
     return;
 
   if (!operand_equal_p (src, src2, 0))
@@ -3053,7 +3053,8 @@ optimize_memcpy (gimple_stmt_iterator *g
   /* [ src + offset2, src + offset2 + len2 - 1 ] is set to val.
      Make sure that
      [ src + offset, src + offset + len - 1 ] is a subset of that.  */
-  if (wi::to_offset (len) + (offset - offset2) > wi::to_offset (len2))
+  if (may_gt (wi::to_poly_offset (len) + (offset - offset2),
+	      wi::to_poly_offset (len2)))
     return;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-10-23 17:07:40.843798706 +0100
+++ gcc/tree-ssa-forwprop.c	2017-10-23 17:17:01.434034223 +0100
@@ -758,12 +758,12 @@ forward_propagate_addr_expr_1 (tree name
       && TREE_OPERAND (lhs, 0) == name)
     {
       tree def_rhs_base;
-      HOST_WIDE_INT def_rhs_offset;
+      poly_int64 def_rhs_offset;
       /* If the address is invariant we can always fold it.  */
       if ((def_rhs_base = get_addr_base_and_unit_offset (TREE_OPERAND (def_rhs, 0),
 							 &def_rhs_offset)))
 	{
-	  offset_int off = mem_ref_offset (lhs);
+	  poly_offset_int off = mem_ref_offset (lhs);
 	  tree new_ptr;
 	  off += def_rhs_offset;
 	  if (TREE_CODE (def_rhs_base) == MEM_REF)
@@ -850,11 +850,11 @@ forward_propagate_addr_expr_1 (tree name
       && TREE_OPERAND (rhs, 0) == name)
     {
       tree def_rhs_base;
-      HOST_WIDE_INT def_rhs_offset;
+      poly_int64 def_rhs_offset;
       if ((def_rhs_base = get_addr_base_and_unit_offset (TREE_OPERAND (def_rhs, 0),
 							 &def_rhs_offset)))
 	{
-	  offset_int off = mem_ref_offset (rhs);
+	  poly_offset_int off = mem_ref_offset (rhs);
 	  tree new_ptr;
 	  off += def_rhs_offset;
 	  if (TREE_CODE (def_rhs_base) == MEM_REF)
@@ -1169,12 +1169,12 @@ #define CPD_ITERATIONS 5
 	  if (TREE_CODE (p) == ADDR_EXPR)
 	    {
 	      tree q = TREE_OPERAND (p, 0);
-	      HOST_WIDE_INT offset;
+	      poly_int64 offset;
 	      tree base = get_addr_base_and_unit_offset (q, &offset);
 	      if (base)
 		{
 		  q = base;
-		  if (offset)
+		  if (maybe_nonzero (offset))
 		    off = size_binop (PLUS_EXPR, off, size_int (offset));
 		}
 	      if (TREE_CODE (q) == MEM_REF
Index: gcc/tree-ssa-loop-niter.c
===================================================================
--- gcc/tree-ssa-loop-niter.c	2017-10-23 17:07:40.843798706 +0100
+++ gcc/tree-ssa-loop-niter.c	2017-10-23 17:17:01.434034223 +0100
@@ -1987,7 +1987,7 @@ expand_simple_operations (tree expr, tre
 	return expand_simple_operations (e, stop);
       else if (code == ADDR_EXPR)
 	{
-	  HOST_WIDE_INT offset;
+	  poly_int64 offset;
 	  tree base = get_addr_base_and_unit_offset (TREE_OPERAND (e, 0),
 						     &offset);
 	  if (base
Index: gcc/tree-ssa-phiopt.c
===================================================================
--- gcc/tree-ssa-phiopt.c	2017-10-23 17:07:40.843798706 +0100
+++ gcc/tree-ssa-phiopt.c	2017-10-23 17:17:01.434034223 +0100
@@ -692,12 +692,12 @@ jump_function_from_stmt (tree *arg, gimp
     {
       /* For arg = &p->i transform it to p, if possible.  */
       tree rhs1 = gimple_assign_rhs1 (stmt);
-      HOST_WIDE_INT offset;
+      poly_int64 offset;
       tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (rhs1, 0),
 						&offset);
       if (tem
 	  && TREE_CODE (tem) == MEM_REF
-	  && (mem_ref_offset (tem) + offset) == 0)
+	  && known_zero (mem_ref_offset (tem) + offset))
 	{
 	  *arg = TREE_OPERAND (tem, 0);
 	  return true;
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c	2017-10-23 17:11:39.943368879 +0100
+++ gcc/tree-ssa-pre.c	2017-10-23 17:17:01.435034088 +0100
@@ -2504,7 +2504,7 @@ create_component_ref_by_pieces_1 (basic_
 	if (TREE_CODE (baseop) == ADDR_EXPR
 	    && handled_component_p (TREE_OPERAND (baseop, 0)))
 	  {
-	    HOST_WIDE_INT off;
+	    poly_int64 off;
 	    tree base;
 	    base = get_addr_base_and_unit_offset (TREE_OPERAND (baseop, 0),
 						  &off);
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-10-23 17:16:59.706267546 +0100
+++ gcc/tree-ssa-sccvn.c	2017-10-23 17:17:01.435034088 +0100
@@ -1154,7 +1154,7 @@ vn_reference_fold_indirect (vec<vn_refer
   vn_reference_op_t op = &(*ops)[i];
   vn_reference_op_t mem_op = &(*ops)[i - 1];
   tree addr_base;
-  HOST_WIDE_INT addr_offset = 0;
+  poly_int64 addr_offset = 0;
 
   /* The only thing we have to do is from &OBJ.foo.bar add the offset
      from .foo.bar to the preceding MEM_REF offset and replace the
@@ -1164,8 +1164,10 @@ vn_reference_fold_indirect (vec<vn_refer
   gcc_checking_assert (addr_base && TREE_CODE (addr_base) != MEM_REF);
   if (addr_base != TREE_OPERAND (op->op0, 0))
     {
-      offset_int off = offset_int::from (wi::to_wide (mem_op->op0), SIGNED);
-      off += addr_offset;
+      poly_offset_int off
+	= (poly_offset_int::from (wi::to_poly_wide (mem_op->op0),
+				  SIGNED)
+	   + addr_offset);
       mem_op->op0 = wide_int_to_tree (TREE_TYPE (mem_op->op0), off);
       op->op0 = build_fold_addr_expr (addr_base);
       if (tree_fits_shwi_p (mem_op->op0))
@@ -1188,7 +1190,7 @@ vn_reference_maybe_forwprop_address (vec
   vn_reference_op_t mem_op = &(*ops)[i - 1];
   gimple *def_stmt;
   enum tree_code code;
-  offset_int off;
+  poly_offset_int off;
 
   def_stmt = SSA_NAME_DEF_STMT (op->op0);
   if (!is_gimple_assign (def_stmt))
@@ -1199,7 +1201,7 @@ vn_reference_maybe_forwprop_address (vec
       && code != POINTER_PLUS_EXPR)
     return false;
 
-  off = offset_int::from (wi::to_wide (mem_op->op0), SIGNED);
+  off = poly_offset_int::from (wi::to_poly_wide (mem_op->op0), SIGNED);
 
   /* The only thing we have to do is from &OBJ.foo.bar add the offset
      from .foo.bar to the preceding MEM_REF offset and replace the
@@ -1207,7 +1209,7 @@ vn_reference_maybe_forwprop_address (vec
   if (code == ADDR_EXPR)
     {
       tree addr, addr_base;
-      HOST_WIDE_INT addr_offset;
+      poly_int64 addr_offset;
 
       addr = gimple_assign_rhs1 (def_stmt);
       addr_base = get_addr_base_and_unit_offset (TREE_OPERAND (addr, 0),
@@ -1217,7 +1219,7 @@ vn_reference_maybe_forwprop_address (vec
 	 dereference isn't offsetted.  */
       if (!addr_base
 	  && *i_p == ops->length () - 1
-	  && off == 0
+	  && known_zero (off)
 	  /* This makes us disable this transform for PRE where the
 	     reference ops might be also used for code insertion which
 	     is invalid.  */
@@ -1234,7 +1236,7 @@ vn_reference_maybe_forwprop_address (vec
 	      vn_reference_op_t new_mem_op = &tem[tem.length () - 2];
 	      new_mem_op->op0
 		= wide_int_to_tree (TREE_TYPE (mem_op->op0),
-				    wi::to_wide (new_mem_op->op0));
+				    wi::to_poly_wide (new_mem_op->op0));
 	    }
 	  else
 	    gcc_assert (tem.last ().opcode == STRING_CST);
@@ -2242,10 +2244,8 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	}
       if (TREE_CODE (lhs) == ADDR_EXPR)
 	{
-	  HOST_WIDE_INT tmp_lhs_offset;
 	  tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (lhs, 0),
-						    &tmp_lhs_offset);
-	  lhs_offset = tmp_lhs_offset;
+						    &lhs_offset);
 	  if (!tem)
 	    return (void *)-1;
 	  if (TREE_CODE (tem) == MEM_REF
@@ -2272,10 +2272,8 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	rhs = SSA_VAL (rhs);
       if (TREE_CODE (rhs) == ADDR_EXPR)
 	{
-	  HOST_WIDE_INT tmp_rhs_offset;
 	  tree tem = get_addr_base_and_unit_offset (TREE_OPERAND (rhs, 0),
-						    &tmp_rhs_offset);
-	  rhs_offset = tmp_rhs_offset;
+						    &rhs_offset);
 	  if (!tem)
 	    return (void *)-1;
 	  if (TREE_CODE (tem) == MEM_REF
@@ -3282,7 +3280,7 @@ dominated_by_p_w_unex (basic_block bb1,
 set_ssa_val_to (tree from, tree to)
 {
   tree currval = SSA_VAL (from);
-  HOST_WIDE_INT toff, coff;
+  poly_int64 toff, coff;
 
   /* The only thing we allow as value numbers are ssa_names
      and invariants.  So assert that here.  We don't allow VN_TOP
@@ -3364,7 +3362,7 @@ set_ssa_val_to (tree from, tree to)
 	   && TREE_CODE (to) == ADDR_EXPR
 	   && (get_addr_base_and_unit_offset (TREE_OPERAND (currval, 0), &coff)
 	       == get_addr_base_and_unit_offset (TREE_OPERAND (to, 0), &toff))
-	   && coff == toff))
+	   && must_eq (coff, toff)))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, " (changed)\n");
Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c	2017-10-23 17:07:40.843798706 +0100
+++ gcc/tree-ssa-strlen.c	2017-10-23 17:17:01.436033953 +0100
@@ -227,8 +227,9 @@ get_addr_stridx (tree exp, tree ptr, uns
   if (!decl_to_stridxlist_htab)
     return 0;
 
-  base = get_addr_base_and_unit_offset (exp, &off);
-  if (base == NULL || !DECL_P (base))
+  poly_int64 poff;
+  base = get_addr_base_and_unit_offset (exp, &poff);
+  if (base == NULL || !DECL_P (base) || !poff.is_constant (&off))
     return 0;
 
   list = decl_to_stridxlist_htab->get (base);
@@ -368,8 +369,9 @@ addr_stridxptr (tree exp)
 {
   HOST_WIDE_INT off;
 
-  tree base = get_addr_base_and_unit_offset (exp, &off);
-  if (base == NULL_TREE || !DECL_P (base))
+  poly_int64 poff;
+  tree base = get_addr_base_and_unit_offset (exp, &poff);
+  if (base == NULL_TREE || !DECL_P (base) || !poff.is_constant (&off))
     return NULL;
 
   if (!decl_to_stridxlist_htab)
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 17:11:40.252960525 +0100
+++ gcc/tree.c	2017-10-23 17:17:01.436033953 +0100
@@ -4903,7 +4903,7 @@ build5 (enum tree_code code, tree tt, tr
 tree
 build_simple_mem_ref_loc (location_t loc, tree ptr)
 {
-  HOST_WIDE_INT offset = 0;
+  poly_int64 offset = 0;
   tree ptype = TREE_TYPE (ptr);
   tree tem;
   /* For convenience allow addresses that collapse to a simple base

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [032/nnn] poly_int: symbolic_number
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (31 preceding siblings ...)
  2017-10-23 17:13 ` [031/nnn] poly_int: aff_tree Richard Sandiford
@ 2017-10-23 17:13 ` Richard Sandiford
  2017-11-28 17:45   ` Jeff Law
  2017-10-23 17:14 ` [035/nnn] poly_int: expand_debug_expr Richard Sandiford
                   ` (74 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:13 UTC (permalink / raw)
  To: gcc-patches

This patch changes symbolic_number::bytepos from a HOST_WIDE_INT
to a poly_int64.  perform_symbolic_merge can cope with symbolic
offsets as long as the difference between the two offsets is
constant.  (This could happen for a constant-sized field that
occurs at a variable offset, for example.)
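
A sketch of the key check, mirroring the perform_symbolic_merge hunk
below: the byte positions are rebased so that only their difference has
to be a compile-time constant.

  HOST_WIDE_INT start1 = 0, start2;
  /* Bail out if the distance between the two loads is not known at
     compile time; the individual bytepos values may stay symbolic.  */
  if (!(n2->bytepos - n1->bytepos).is_constant (&start2))
    return NULL;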


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-ssa-math-opts.c (symbolic_number::bytepos): Change from
	HOST_WIDE_INT to poly_int64.
	(perform_symbolic_merge): Update accordingly.

Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	2017-10-23 17:11:39.997472274 +0100
+++ gcc/tree-ssa-math-opts.c	2017-10-23 17:17:04.541614564 +0100
@@ -1967,7 +1967,7 @@ struct symbolic_number {
   tree type;
   tree base_addr;
   tree offset;
-  HOST_WIDE_INT bytepos;
+  poly_int64 bytepos;
   tree src;
   tree alias_set;
   tree vuse;
@@ -2198,7 +2198,7 @@ perform_symbolic_merge (gimple *source_s
   if (rhs1 != rhs2)
     {
       uint64_t inc;
-      HOST_WIDE_INT start_sub, end_sub, end1, end2, end;
+      HOST_WIDE_INT start1, start2, start_sub, end_sub, end1, end2, end;
       struct symbolic_number *toinc_n_ptr, *n_end;
       basic_block bb1, bb2;
 
@@ -2210,15 +2210,19 @@ perform_symbolic_merge (gimple *source_s
 	  || (n1->offset && !operand_equal_p (n1->offset, n2->offset, 0)))
 	return NULL;
 
-      if (n1->bytepos < n2->bytepos)
+      start1 = 0;
+      if (!(n2->bytepos - n1->bytepos).is_constant (&start2))
+	return NULL;
+
+      if (start1 < start2)
 	{
 	  n_start = n1;
-	  start_sub = n2->bytepos - n1->bytepos;
+	  start_sub = start2 - start1;
 	}
       else
 	{
 	  n_start = n2;
-	  start_sub = n1->bytepos - n2->bytepos;
+	  start_sub = start1 - start2;
 	}
 
       bb1 = gimple_bb (source_stmt1);
@@ -2230,8 +2234,8 @@ perform_symbolic_merge (gimple *source_s
 
       /* Find the highest address at which a load is performed and
 	 compute related info.  */
-      end1 = n1->bytepos + (n1->range - 1);
-      end2 = n2->bytepos + (n2->range - 1);
+      end1 = start1 + (n1->range - 1);
+      end2 = start2 + (n2->range - 1);
       if (end1 < end2)
 	{
 	  end = end2;
@@ -2250,7 +2254,7 @@ perform_symbolic_merge (gimple *source_s
       else
 	toinc_n_ptr = (n_start == n1) ? n2 : n1;
 
-      n->range = end - n_start->bytepos + 1;
+      n->range = end - MIN (start1, start2) + 1;
 
       /* Check that the range of memory covered can be represented by
 	 a symbolic number.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [031/nnn] poly_int: aff_tree
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (30 preceding siblings ...)
  2017-10-23 17:13 ` [033/nnn] poly_int: pointer_may_wrap_p Richard Sandiford
@ 2017-10-23 17:13 ` Richard Sandiford
  2017-12-06  0:04   ` Jeff Law
  2017-10-23 17:13 ` [032/nnn] poly_int: symbolic_number Richard Sandiford
                   ` (75 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:13 UTC (permalink / raw)
  To: gcc-patches

This patch changes the type of aff_tree::offset from widest_int to
poly_widest_int and adjusts the function interfaces in the same way.
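
A sketch of the resulting idiom for code that inspects aff_tree::offset
(not part of the patch): direct == and < comparisons are replaced by the
may/must predicates used in the hunks below.  AFF and OTHER are
placeholders for the surrounding variables.

  if (aff->n == 0 && known_zero (aff->offset))
    return true;                       /* certainly the zero combination  */
  bool negative_p = must_lt (aff->offset, 0);
  bool differs_p = may_ne (aff->offset, other->offset);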


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-affine.h (aff_tree::offset): Change from widest_int
	to poly_widest_int.
	(wide_int_ext_for_comb): Delete.
	(aff_combination_const, aff_comb_cannot_overlap_p): Take the
	constants as poly_widest_int rather than widest_int.
	(aff_combination_constant_multiple_p): Return the multiplier
	as a poly_widest_int.
	(aff_combination_zero_p, aff_combination_singleton_var_p): Handle
	polynomial offsets.
	* tree-affine.c (wide_int_ext_for_comb): Make original widest_int
	version static and add an overload for poly_widest_int.
	(aff_combination_const, aff_combination_add_cst)
	(wide_int_constant_multiple_p, aff_comb_cannot_overlap_p): Take
	the constants as poly_widest_int rather than widest_int.
	(tree_to_aff_combination): Generalize INTEGER_CST case to
	poly_int_tree_p.
	(aff_combination_to_tree): Track offsets as poly_widest_ints.
	(aff_combination_add_product, aff_combination_mult): Handle
	polynomial offsets.
	(aff_combination_constant_multiple_p): Return the multiplier
	as a poly_widest_int.
	* tree-predcom.c (determine_offset): Return the offset as a
	poly_widest_int.
	(split_data_refs_to_components, suitable_component_p): Update
	accordingly.
	(valid_initializer_p): Update call to
	aff_combination_constant_multiple_p.
	* tree-ssa-address.c (addr_to_parts): Handle polynomial offsets.
	* tree-ssa-loop-ivopts.c (get_address_cost_ainc): Take the step
	as a poly_int64 rather than a HOST_WIDE_INT.
	(get_address_cost): Handle polynomial offsets.
	(iv_elimination_compare_lt): Likewise.
	(rewrite_use_nonlinear_expr): Likewise.

Index: gcc/tree-affine.h
===================================================================
--- gcc/tree-affine.h	2017-10-23 17:07:40.771877812 +0100
+++ gcc/tree-affine.h	2017-10-23 17:17:03.206794823 +0100
@@ -43,7 +43,7 @@ struct aff_tree
   tree type;
 
   /* Constant offset.  */
-  widest_int offset;
+  poly_widest_int offset;
 
   /* Number of elements of the combination.  */
   unsigned n;
@@ -64,8 +64,7 @@ struct aff_tree
 
 struct name_expansion;
 
-widest_int wide_int_ext_for_comb (const widest_int &, aff_tree *);
-void aff_combination_const (aff_tree *, tree, const widest_int &);
+void aff_combination_const (aff_tree *, tree, const poly_widest_int &);
 void aff_combination_elt (aff_tree *, tree, tree);
 void aff_combination_scale (aff_tree *, const widest_int &);
 void aff_combination_mult (aff_tree *, aff_tree *, aff_tree *);
@@ -76,14 +75,15 @@ void aff_combination_convert (aff_tree *
 void tree_to_aff_combination (tree, tree, aff_tree *);
 tree aff_combination_to_tree (aff_tree *);
 void unshare_aff_combination (aff_tree *);
-bool aff_combination_constant_multiple_p (aff_tree *, aff_tree *, widest_int *);
+bool aff_combination_constant_multiple_p (aff_tree *, aff_tree *,
+					  poly_widest_int *);
 void aff_combination_expand (aff_tree *, hash_map<tree, name_expansion *> **);
 void tree_to_aff_combination_expand (tree, tree, aff_tree *,
 				     hash_map<tree, name_expansion *> **);
 tree get_inner_reference_aff (tree, aff_tree *, widest_int *);
 void free_affine_expand_cache (hash_map<tree, name_expansion *> **);
-bool aff_comb_cannot_overlap_p (aff_tree *, const widest_int &,
-				const widest_int &);
+bool aff_comb_cannot_overlap_p (aff_tree *, const poly_widest_int &,
+				const poly_widest_int &);
 
 /* Debugging functions.  */
 void debug_aff (aff_tree *);
@@ -102,7 +102,7 @@ aff_combination_zero_p (aff_tree *aff)
   if (!aff)
     return true;
 
-  if (aff->n == 0 && aff->offset == 0)
+  if (aff->n == 0 && known_zero (aff->offset))
     return true;
 
   return false;
@@ -121,7 +121,7 @@ aff_combination_const_p (aff_tree *aff)
 aff_combination_singleton_var_p (aff_tree *aff)
 {
   return (aff->n == 1
-	  && aff->offset == 0
+	  && known_zero (aff->offset)
 	  && (aff->elts[0].coef == 1 || aff->elts[0].coef == -1));
 }
 #endif /* GCC_TREE_AFFINE_H */
Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	2017-10-23 17:07:40.771877812 +0100
+++ gcc/tree-affine.c	2017-10-23 17:17:03.206794823 +0100
@@ -34,12 +34,20 @@ Free Software Foundation; either version
 
 /* Extends CST as appropriate for the affine combinations COMB.  */
 
-widest_int
+static widest_int
 wide_int_ext_for_comb (const widest_int &cst, tree type)
 {
   return wi::sext (cst, TYPE_PRECISION (type));
 }
 
+/* Likewise for polynomial offsets.  */
+
+static poly_widest_int
+wide_int_ext_for_comb (const poly_widest_int &cst, tree type)
+{
+  return wi::sext (cst, TYPE_PRECISION (type));
+}
+
 /* Initializes affine combination COMB so that its value is zero in TYPE.  */
 
 static void
@@ -57,7 +65,7 @@ aff_combination_zero (aff_tree *comb, tr
 /* Sets COMB to CST.  */
 
 void
-aff_combination_const (aff_tree *comb, tree type, const widest_int &cst)
+aff_combination_const (aff_tree *comb, tree type, const poly_widest_int &cst)
 {
   aff_combination_zero (comb, type);
   comb->offset = wide_int_ext_for_comb (cst, comb->type);;
@@ -190,7 +198,7 @@ aff_combination_add_elt (aff_tree *comb,
 /* Adds CST to C.  */
 
 static void
-aff_combination_add_cst (aff_tree *c, const widest_int &cst)
+aff_combination_add_cst (aff_tree *c, const poly_widest_int &cst)
 {
   c->offset = wide_int_ext_for_comb (c->offset + cst, c->type);
 }
@@ -268,10 +276,6 @@ tree_to_aff_combination (tree expr, tree
   code = TREE_CODE (expr);
   switch (code)
     {
-    case INTEGER_CST:
-      aff_combination_const (comb, type, wi::to_widest (expr));
-      return;
-
     case POINTER_PLUS_EXPR:
       tree_to_aff_combination (TREE_OPERAND (expr, 0), type, comb);
       tree_to_aff_combination (TREE_OPERAND (expr, 1), sizetype, &tmp);
@@ -423,7 +427,14 @@ tree_to_aff_combination (tree expr, tree
       break;
 
     default:
-      break;
+      {
+	if (poly_int_tree_p (expr))
+	  {
+	    aff_combination_const (comb, type, wi::to_poly_widest (expr));
+	    return;
+	  }
+	break;
+      }
     }
 
   aff_combination_elt (comb, type, expr);
@@ -478,7 +489,8 @@ aff_combination_to_tree (aff_tree *comb)
 {
   tree type = comb->type, base = NULL_TREE, expr = NULL_TREE;
   unsigned i;
-  widest_int off, sgn;
+  poly_widest_int off;
+  int sgn;
 
   gcc_assert (comb->n == MAX_AFF_ELTS || comb->rest == NULL_TREE);
 
@@ -502,7 +514,7 @@ aff_combination_to_tree (aff_tree *comb)
 
   /* Ensure that we get x - 1, not x + (-1) or x + 0xff..f if x is
      unsigned.  */
-  if (wi::neg_p (comb->offset))
+  if (must_lt (comb->offset, 0))
     {
       off = -comb->offset;
       sgn = -1;
@@ -588,7 +600,19 @@ aff_combination_add_product (aff_tree *c
     }
 
   if (val)
-    aff_combination_add_elt (r, val, coef * c->offset);
+    {
+      if (c->offset.is_constant ())
+	/* Access coeffs[0] directly, for efficiency.  */
+	aff_combination_add_elt (r, val, coef * c->offset.coeffs[0]);
+      else
+	{
+	  /* c->offset is polynomial, so multiply VAL rather than COEF
+	     by it.  */
+	  tree offset = wide_int_to_tree (TREE_TYPE (val), c->offset);
+	  val = fold_build2 (MULT_EXPR, TREE_TYPE (val), val, offset);
+	  aff_combination_add_elt (r, val, coef);
+	}
+    }
   else
     aff_combination_add_cst (r, coef * c->offset);
 }
@@ -607,7 +631,15 @@ aff_combination_mult (aff_tree *c1, aff_
     aff_combination_add_product (c1, c2->elts[i].coef, c2->elts[i].val, r);
   if (c2->rest)
     aff_combination_add_product (c1, 1, c2->rest, r);
-  aff_combination_add_product (c1, c2->offset, NULL, r);
+  if (c2->offset.is_constant ())
+    /* Access coeffs[0] directly, for efficiency.  */
+    aff_combination_add_product (c1, c2->offset.coeffs[0], NULL, r);
+  else
+    {
+      /* c2->offset is polynomial, so do the multiplication in tree form.  */
+      tree offset = wide_int_to_tree (c2->type, c2->offset);
+      aff_combination_add_product (c1, 1, offset, r);
+    }
 }
 
 /* Returns the element of COMB whose value is VAL, or NULL if no such
@@ -776,27 +808,28 @@ free_affine_expand_cache (hash_map<tree,
    is set to true.  */
 
 static bool
-wide_int_constant_multiple_p (const widest_int &val, const widest_int &div,
-			      bool *mult_set, widest_int *mult)
+wide_int_constant_multiple_p (const poly_widest_int &val,
+			      const poly_widest_int &div,
+			      bool *mult_set, poly_widest_int *mult)
 {
-  widest_int rem, cst;
+  poly_widest_int rem, cst;
 
-  if (val == 0)
+  if (known_zero (val))
     {
-      if (*mult_set && *mult != 0)
+      if (*mult_set && maybe_nonzero (*mult))
 	return false;
       *mult_set = true;
       *mult = 0;
       return true;
     }
 
-  if (div == 0)
+  if (maybe_zero (div))
     return false;
 
-  if (!wi::multiple_of_p (val, div, SIGNED, &cst))
+  if (!multiple_p (val, div, &cst))
     return false;
 
-  if (*mult_set && *mult != cst)
+  if (*mult_set && may_ne (*mult, cst))
     return false;
 
   *mult_set = true;
@@ -809,12 +842,12 @@ wide_int_constant_multiple_p (const wide
 
 bool
 aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
-				     widest_int *mult)
+				     poly_widest_int *mult)
 {
   bool mult_set = false;
   unsigned i;
 
-  if (val->n == 0 && val->offset == 0)
+  if (val->n == 0 && known_zero (val->offset))
     {
       *mult = 0;
       return true;
@@ -927,23 +960,26 @@ get_inner_reference_aff (tree ref, aff_t
    size SIZE2 at position DIFF cannot overlap.  */
 
 bool
-aff_comb_cannot_overlap_p (aff_tree *diff, const widest_int &size1,
-			   const widest_int &size2)
+aff_comb_cannot_overlap_p (aff_tree *diff, const poly_widest_int &size1,
+			   const poly_widest_int &size2)
 {
   /* Unless the difference is a constant, we fail.  */
   if (diff->n != 0)
     return false;
 
-  if (wi::neg_p (diff->offset))
+  if (!ordered_p (diff->offset, 0))
+    return false;
+
+  if (may_lt (diff->offset, 0))
     {
       /* The second object is before the first one, we succeed if the last
 	 element of the second object is before the start of the first one.  */
-      return wi::neg_p (diff->offset + size2 - 1);
+      return must_le (diff->offset + size2, 0);
     }
   else
     {
       /* We succeed if the second object starts after the first one ends.  */
-      return size1 <= diff->offset;
+      return must_le (size1, diff->offset);
     }
 }
 
Index: gcc/tree-predcom.c
===================================================================
--- gcc/tree-predcom.c	2017-10-23 17:07:40.771877812 +0100
+++ gcc/tree-predcom.c	2017-10-23 17:17:03.207794688 +0100
@@ -688,7 +688,7 @@ aff_combination_dr_offset (struct data_r
 
 static bool
 determine_offset (struct data_reference *a, struct data_reference *b,
-		  widest_int *off)
+		  poly_widest_int *off)
 {
   aff_tree diff, baseb, step;
   tree typea, typeb;
@@ -797,7 +797,7 @@ split_data_refs_to_components (struct lo
 
   FOR_EACH_VEC_ELT (depends, i, ddr)
     {
-      widest_int dummy_off;
+      poly_widest_int dummy_off;
 
       if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
 	continue;
@@ -956,7 +956,11 @@ suitable_component_p (struct loop *loop,
 
   for (i = 1; comp->refs.iterate (i, &a); i++)
     {
-      if (!determine_offset (first->ref, a->ref, &a->offset))
+      /* Polynomial offsets are no use, since we need to know the
+	 gap between iteration numbers at compile time.  */
+      poly_widest_int offset;
+      if (!determine_offset (first->ref, a->ref, &offset)
+	  || !offset.is_constant (&a->offset))
 	return false;
 
       enum ref_step_type a_step;
@@ -1158,7 +1162,7 @@ valid_initializer_p (struct data_referen
 		     unsigned distance, struct data_reference *root)
 {
   aff_tree diff, base, step;
-  widest_int off;
+  poly_widest_int off;
 
   /* Both REF and ROOT must be accessing the same object.  */
   if (!operand_equal_p (DR_BASE_ADDRESS (ref), DR_BASE_ADDRESS (root), 0))
@@ -1186,7 +1190,7 @@ valid_initializer_p (struct data_referen
   if (!aff_combination_constant_multiple_p (&diff, &step, &off))
     return false;
 
-  if (off != distance)
+  if (may_ne (off, distance))
     return false;
 
   return true;
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	2017-10-23 17:17:01.433034358 +0100
+++ gcc/tree-ssa-address.c	2017-10-23 17:17:03.207794688 +0100
@@ -693,7 +693,7 @@ addr_to_parts (tree type, aff_tree *addr
   parts->index = NULL_TREE;
   parts->step = NULL_TREE;
 
-  if (addr->offset != 0)
+  if (maybe_nonzero (addr->offset))
     parts->offset = wide_int_to_tree (sizetype, addr->offset);
   else
     parts->offset = NULL_TREE;
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	2017-10-23 17:11:40.249954781 +0100
+++ gcc/tree-ssa-loop-ivopts.c	2017-10-23 17:17:03.208794553 +0100
@@ -4232,7 +4232,7 @@ struct ainc_cost_data
 };
 
 static comp_cost
-get_address_cost_ainc (HOST_WIDE_INT ainc_step, HOST_WIDE_INT ainc_offset,
+get_address_cost_ainc (poly_int64 ainc_step, poly_int64 ainc_offset,
 		       machine_mode addr_mode, machine_mode mem_mode,
 		       addr_space_t as, bool speed)
 {
@@ -4306,13 +4306,13 @@ get_address_cost_ainc (HOST_WIDE_INT ain
     }
 
   HOST_WIDE_INT msize = GET_MODE_SIZE (mem_mode);
-  if (ainc_offset == 0 && msize == ainc_step)
+  if (known_zero (ainc_offset) && must_eq (msize, ainc_step))
     return comp_cost (data->costs[AINC_POST_INC], 0);
-  if (ainc_offset == 0 && msize == -ainc_step)
+  if (known_zero (ainc_offset) && must_eq (msize, -ainc_step))
     return comp_cost (data->costs[AINC_POST_DEC], 0);
-  if (ainc_offset == msize && msize == ainc_step)
+  if (must_eq (ainc_offset, msize) && must_eq (msize, ainc_step))
     return comp_cost (data->costs[AINC_PRE_INC], 0);
-  if (ainc_offset == -msize && msize == -ainc_step)
+  if (must_eq (ainc_offset, -msize) && must_eq (msize, -ainc_step))
     return comp_cost (data->costs[AINC_PRE_DEC], 0);
 
   return infinite_cost;
@@ -4355,7 +4355,7 @@ get_address_cost (struct ivopts_data *da
 	  if (ratio != 1 && !valid_mem_ref_p (mem_mode, as, &parts))
 	    parts.step = NULL_TREE;
 
-	  if (aff_inv->offset != 0)
+	  if (maybe_nonzero (aff_inv->offset))
 	    {
 	      parts.offset = wide_int_to_tree (sizetype, aff_inv->offset);
 	      /* Addressing mode "base + index [<< scale] + offset".  */
@@ -4388,10 +4388,12 @@ get_address_cost (struct ivopts_data *da
     }
   else
     {
-      if (can_autoinc && ratio == 1 && cst_and_fits_in_hwi (cand->iv->step))
+      poly_int64 ainc_step;
+      if (can_autoinc
+	  && ratio == 1
+	  && ptrdiff_tree_p (cand->iv->step, &ainc_step))
 	{
-	  HOST_WIDE_INT ainc_step = int_cst_value (cand->iv->step);
-	  HOST_WIDE_INT ainc_offset = (aff_inv->offset).to_shwi ();
+	  poly_int64 ainc_offset = (aff_inv->offset).force_shwi ();
 
 	  if (stmt_after_increment (data->current_loop, cand, use->stmt))
 	    ainc_offset += ainc_step;
@@ -4949,7 +4951,7 @@ iv_elimination_compare_lt (struct ivopts
   aff_combination_scale (&tmpa, -1);
   aff_combination_add (&tmpb, &tmpa);
   aff_combination_add (&tmpb, &nit);
-  if (tmpb.n != 0 || tmpb.offset != 1)
+  if (tmpb.n != 0 || may_ne (tmpb.offset, 1))
     return false;
 
   /* Finally, check that CAND->IV->BASE - CAND->IV->STEP * A does not
@@ -6846,7 +6848,7 @@ rewrite_use_nonlinear_expr (struct ivopt
   unshare_aff_combination (&aff_var);
   /* Prefer CSE opportunity than loop invariant by adding offset at last
      so that iv_uses have different offsets can be CSEed.  */
-  widest_int offset = aff_inv.offset;
+  poly_widest_int offset = aff_inv.offset;
   aff_inv.offset = 0;
 
   gimple_seq stmt_list = NULL, seq = NULL;

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [033/nnn] poly_int: pointer_may_wrap_p
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (29 preceding siblings ...)
  2017-10-23 17:12 ` [028/nnn] poly_int: ipa_parm_adjustment Richard Sandiford
@ 2017-10-23 17:13 ` Richard Sandiford
  2017-11-28 17:44   ` Jeff Law
  2017-10-23 17:13 ` [031/nnn] poly_int: aff_tree Richard Sandiford
                   ` (76 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:13 UTC (permalink / raw)
  To: gcc-patches

This patch changes the bitpos argument to pointer_may_wrap_p from
HOST_WIDE_INT to poly_int64.  A later patch makes the callers track
polynomial offsets.
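
A sketch of how the polynomial bitpos is handled (taken from the hunk
below): the sign test and the bit-to-byte conversion go through may_lt
and bits_to_bytes_round_down, so an offset that might be negative for
some runtime value is conservatively treated as possibly wrapping.

  if (may_lt (bitpos, 0))
    return true;
  poly_wide_int units = wi::shwi (bits_to_bytes_round_down (bitpos),
                                  precision);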


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* fold-const.c (pointer_may_wrap_p): Take the offset as a
	poly_int64 rather than a HOST_WIDE_INT.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:17:01.429034898 +0100
+++ gcc/fold-const.c	2017-10-23 17:17:05.755450644 +0100
@@ -8421,48 +8421,50 @@ maybe_canonicalize_comparison (location_
    expressions like &p->x which can not wrap.  */
 
 static bool
-pointer_may_wrap_p (tree base, tree offset, HOST_WIDE_INT bitpos)
+pointer_may_wrap_p (tree base, tree offset, poly_int64 bitpos)
 {
   if (!POINTER_TYPE_P (TREE_TYPE (base)))
     return true;
 
-  if (bitpos < 0)
+  if (may_lt (bitpos, 0))
     return true;
 
-  wide_int wi_offset;
+  poly_wide_int wi_offset;
   int precision = TYPE_PRECISION (TREE_TYPE (base));
   if (offset == NULL_TREE)
     wi_offset = wi::zero (precision);
-  else if (TREE_CODE (offset) != INTEGER_CST || TREE_OVERFLOW (offset))
+  else if (!poly_int_tree_p (offset) || TREE_OVERFLOW (offset))
     return true;
   else
-    wi_offset = wi::to_wide (offset);
+    wi_offset = wi::to_poly_wide (offset);
 
   bool overflow;
-  wide_int units = wi::shwi (bitpos / BITS_PER_UNIT, precision);
-  wide_int total = wi::add (wi_offset, units, UNSIGNED, &overflow);
+  poly_wide_int units = wi::shwi (bits_to_bytes_round_down (bitpos),
+				  precision);
+  poly_wide_int total = wi::add (wi_offset, units, UNSIGNED, &overflow);
   if (overflow)
     return true;
 
-  if (!wi::fits_uhwi_p (total))
+  poly_uint64 total_hwi, size;
+  if (!total.to_uhwi (&total_hwi)
+      || !poly_int_tree_p (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (base))),
+			   &size)
+      || known_zero (size))
     return true;
 
-  HOST_WIDE_INT size = int_size_in_bytes (TREE_TYPE (TREE_TYPE (base)));
-  if (size <= 0)
-    return true;
+  if (must_le (total_hwi, size))
+    return false;
 
   /* We can do slightly better for SIZE if we have an ADDR_EXPR of an
      array.  */
-  if (TREE_CODE (base) == ADDR_EXPR)
-    {
-      HOST_WIDE_INT base_size;
-
-      base_size = int_size_in_bytes (TREE_TYPE (TREE_OPERAND (base, 0)));
-      if (base_size > 0 && size < base_size)
-	size = base_size;
-    }
+  if (TREE_CODE (base) == ADDR_EXPR
+      && poly_int_tree_p (TYPE_SIZE_UNIT (TREE_TYPE (TREE_OPERAND (base, 0))),
+			  &size)
+      && maybe_nonzero (size)
+      && must_le (total_hwi, size))
+    return false;
 
-  return total.to_uhwi () > (unsigned HOST_WIDE_INT) size;
+  return true;
 }
 
 /* Return a positive integer when the symbol DECL is known to have

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [036/nnn] poly_int: get_object_alignment_2
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (34 preceding siblings ...)
  2017-10-23 17:14 ` [034/nnn] poly_int: get_inner_reference_aff Richard Sandiford
@ 2017-10-23 17:14 ` Richard Sandiford
  2017-11-28 17:37   ` Jeff Law
  2017-10-23 17:16 ` [037/nnn] poly_int: get_bit_range Richard Sandiford
                   ` (71 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:14 UTC (permalink / raw)
  To: gcc-patches

This patch makes get_object_alignment_2 track polynomial offsets
and sizes.  The real work is done by get_inner_reference, but we
then need to handle the alignment correctly.
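
A worked illustration of the alignment handling (the numbers are an
assumption for the example, not from the patch): if get_inner_reference
reports a bitpos of 8 + 32*X bits, the runtime part 32*X is only known
to be 32-bit aligned, so nothing stronger can be guaranteed.

  /* bitpos = 8 + 32*X; the runtime part is bitpos - bitpos.coeffs[0].  */
  unsigned int alt_align = ::known_alignment (bitpos - bitpos.coeffs[0]);
  /* alt_align == 32 here, so ALIGN is capped at 32 bits and *BITPOSP
     reports the constant residue 8 & (32 - 1) == 8.  */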


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* builtins.c (get_object_alignment_2): Track polynomial offsets
	and sizes.  Update the alignment handling.

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	2017-10-23 17:11:39.984447382 +0100
+++ gcc/builtins.c	2017-10-23 17:18:42.394520412 +0100
@@ -252,7 +252,7 @@ called_as_built_in (tree node)
 get_object_alignment_2 (tree exp, unsigned int *alignp,
 			unsigned HOST_WIDE_INT *bitposp, bool addr_p)
 {
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   tree offset;
   machine_mode mode;
   int unsignedp, reversep, volatilep;
@@ -377,8 +377,17 @@ get_object_alignment_2 (tree exp, unsign
 	}
     }
 
+  /* Account for the alignment of runtime coefficients, so that the constant
+     bitpos is guaranteed to be accurate.  */
+  unsigned int alt_align = ::known_alignment (bitpos - bitpos.coeffs[0]);
+  if (alt_align != 0 && alt_align < align)
+    {
+      align = alt_align;
+      known_alignment = false;
+    }
+
   *alignp = align;
-  *bitposp = bitpos & (*alignp - 1);
+  *bitposp = bitpos.coeffs[0] & (align - 1);
   return known_alignment;
 }
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [035/nnn] poly_int: expand_debug_expr
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (32 preceding siblings ...)
  2017-10-23 17:13 ` [032/nnn] poly_int: symbolic_number Richard Sandiford
@ 2017-10-23 17:14 ` Richard Sandiford
  2017-12-05 17:08   ` Jeff Law
  2017-10-23 17:14 ` [034/nnn] poly_int: get_inner_reference_aff Richard Sandiford
                   ` (73 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:14 UTC (permalink / raw)
  To: gcc-patches

This patch makes expand_debug_expr track polynomial memory offsets.
It simplifies the handling of the case in which the reference is not
to the first byte of the base, which seemed non-trivial enough to
make it worth splitting out as a separate patch.
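
For reference, the simplification is that the separate
"bitpos >= BITS_PER_UNIT" and "bitpos < 0" branches collapse into a
single split: round the bit position down to a byte offset (towards
-Inf) and keep the leftover bits, which are then always in
[0, BITS_PER_UNIT).  The sketch below just shows the values involved,
using plain integers and local helpers that mirror what
bits_to_bytes_round_down and num_trailing_bits do; it is not the
poly_int64 implementation itself.

/* Standalone model with plain integers; BITS_PER_UNIT fixed at 8.  */
#include <cstdio>

static const long BITS_PER_UNIT = 8;

/* Round a bit position down to whole bytes, towards -Inf.  */
static long
round_down_to_bytes (long bits)
{
  return bits >= 0 ? bits / BITS_PER_UNIT
                   : -((-bits + BITS_PER_UNIT - 1) / BITS_PER_UNIT);
}

/* The bits left over after removing those whole bytes; always in
   [0, BITS_PER_UNIT), even for negative input.  */
static long
residue_bits (long bits)
{
  return bits - round_down_to_bytes (bits) * BITS_PER_UNIT;
}

int
main ()
{
  long tests[] = { 12, 0, -3, -17 };
  for (long bitpos : tests)
    printf ("bitpos %4ld -> bytepos %3ld, residue %ld\n",
            bitpos, round_down_to_bytes (bitpos), residue_bits (bitpos));
  /* -3 splits into bytepos -1 and residue 5; -17 into -3 and 7.  */
  return 0;
}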


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree.h (get_inner_reference): Add a version that returns the
	offset and size as poly_int64_pods rather than HOST_WIDE_INTs.
	* cfgexpand.c (expand_debug_expr): Track polynomial offsets.
	Simplify the case in which bitpos is not associated with the
	first byte.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 17:11:40.253962440 +0100
+++ gcc/tree.h	2017-10-23 17:18:40.711668346 +0100
@@ -5610,6 +5610,17 @@ extern bool complete_ctor_at_level_p (co
    the access position and size.  */
 extern tree get_inner_reference (tree, HOST_WIDE_INT *, HOST_WIDE_INT *,
 				 tree *, machine_mode *, int *, int *, int *);
+/* Temporary.  */
+inline tree
+get_inner_reference (tree exp, poly_int64_pod *pbitsize,
+		     poly_int64_pod *pbitpos, tree *poffset,
+		     machine_mode *pmode, int *punsignedp,
+		     int *preversep, int *pvolatilep)
+{
+  return get_inner_reference (exp, &pbitsize->coeffs[0], &pbitpos->coeffs[0],
+			      poffset, pmode, punsignedp, preversep,
+			      pvolatilep);
+}
 
 extern tree build_personality_function (const char *);
 
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-10-23 17:16:59.700268356 +0100
+++ gcc/cfgexpand.c	2017-10-23 17:18:40.711668346 +0100
@@ -4450,7 +4450,7 @@ expand_debug_expr (tree exp)
     case VIEW_CONVERT_EXPR:
       {
 	machine_mode mode1;
-	HOST_WIDE_INT bitsize, bitpos;
+	poly_int64 bitsize, bitpos;
 	tree offset;
 	int reversep, volatilep = 0;
 	tree tem
@@ -4458,7 +4458,7 @@ expand_debug_expr (tree exp)
 				 &unsignedp, &reversep, &volatilep);
 	rtx orig_op0;
 
-	if (bitsize == 0)
+	if (known_zero (bitsize))
 	  return NULL;
 
 	orig_op0 = op0 = expand_debug_expr (tem);
@@ -4501,19 +4501,14 @@ expand_debug_expr (tree exp)
 	    if (mode1 == VOIDmode)
 	      /* Bitfield.  */
 	      mode1 = smallest_int_mode_for_size (bitsize);
-	    if (bitpos >= BITS_PER_UNIT)
+	    poly_int64 bytepos = bits_to_bytes_round_down (bitpos);
+	    if (maybe_nonzero (bytepos))
 	      {
-		op0 = adjust_address_nv (op0, mode1, bitpos / BITS_PER_UNIT);
-		bitpos %= BITS_PER_UNIT;
+		op0 = adjust_address_nv (op0, mode1, bytepos);
+		bitpos = num_trailing_bits (bitpos);
 	      }
-	    else if (bitpos < 0)
-	      {
-		HOST_WIDE_INT units
-		  = (-bitpos + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
-		op0 = adjust_address_nv (op0, mode1, -units);
-		bitpos += units * BITS_PER_UNIT;
-	      }
-	    else if (bitpos == 0 && bitsize == GET_MODE_BITSIZE (mode))
+	    else if (known_zero (bitpos)
+		     && must_eq (bitsize, GET_MODE_BITSIZE (mode)))
 	      op0 = adjust_address_nv (op0, mode, 0);
 	    else if (GET_MODE (op0) != mode1)
 	      op0 = adjust_address_nv (op0, mode1, 0);
@@ -4524,17 +4519,18 @@ expand_debug_expr (tree exp)
 	    set_mem_attributes (op0, exp, 0);
 	  }
 
-	if (bitpos == 0 && mode == GET_MODE (op0))
+	if (known_zero (bitpos) && mode == GET_MODE (op0))
 	  return op0;
 
-        if (bitpos < 0)
+	if (may_lt (bitpos, 0))
           return NULL;
 
 	if (GET_MODE (op0) == BLKmode)
 	  return NULL;
 
-	if ((bitpos % BITS_PER_UNIT) == 0
-	    && bitsize == GET_MODE_BITSIZE (mode1))
+	poly_int64 bytepos;
+	if (multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
+	    && must_eq (bitsize, GET_MODE_BITSIZE (mode1)))
 	  {
 	    machine_mode opmode = GET_MODE (op0);
 
@@ -4547,12 +4543,11 @@ expand_debug_expr (tree exp)
 	       debug stmts).  The gen_subreg below would rightfully
 	       crash, and the address doesn't really exist, so just
 	       drop it.  */
-	    if (bitpos >= GET_MODE_BITSIZE (opmode))
+	    if (must_ge (bitpos, GET_MODE_BITSIZE (opmode)))
 	      return NULL;
 
-	    if ((bitpos % GET_MODE_BITSIZE (mode)) == 0)
-	      return simplify_gen_subreg (mode, op0, opmode,
-					  bitpos / BITS_PER_UNIT);
+	    if (multiple_p (bitpos, GET_MODE_BITSIZE (mode)))
+	      return simplify_gen_subreg (mode, op0, opmode, bytepos);
 	  }
 
 	return simplify_gen_ternary (SCALAR_INT_MODE_P (GET_MODE (op0))
@@ -4562,7 +4557,8 @@ expand_debug_expr (tree exp)
 				     GET_MODE (op0) != VOIDmode
 				     ? GET_MODE (op0)
 				     : TYPE_MODE (TREE_TYPE (tem)),
-				     op0, GEN_INT (bitsize), GEN_INT (bitpos));
+				     op0, gen_int_mode (bitsize, word_mode),
+				     gen_int_mode (bitpos, word_mode));
       }
 
     case ABS_EXPR:

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [034/nnn] poly_int: get_inner_reference_aff
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (33 preceding siblings ...)
  2017-10-23 17:14 ` [035/nnn] poly_int: expand_debug_expr Richard Sandiford
@ 2017-10-23 17:14 ` Richard Sandiford
  2017-11-28 17:56   ` Jeff Law
  2017-10-23 17:14 ` [036/nnn] poly_int: get_object_alignment_2 Richard Sandiford
                   ` (72 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:14 UTC (permalink / raw)
  To: gcc-patches

This patch makes get_inner_reference_aff return the size as a
poly_widest_int rather than a widest_int.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-affine.h (get_inner_reference_aff): Return the size as a
	poly_widest_int.
	* tree-affine.c (get_inner_reference_aff): Likewise.
	* tree-data-ref.c (dr_may_alias_p): Update accordingly.
	* tree-ssa-loop-im.c (mem_refs_may_alias_p): Likewise.

Index: gcc/tree-affine.h
===================================================================
--- gcc/tree-affine.h	2017-10-23 17:17:16.129993616 +0100
+++ gcc/tree-affine.h	2017-10-23 17:18:30.290584430 +0100
@@ -80,7 +80,7 @@ bool aff_combination_constant_multiple_p
 void aff_combination_expand (aff_tree *, hash_map<tree, name_expansion *> **);
 void tree_to_aff_combination_expand (tree, tree, aff_tree *,
 				     hash_map<tree, name_expansion *> **);
-tree get_inner_reference_aff (tree, aff_tree *, widest_int *);
+tree get_inner_reference_aff (tree, aff_tree *, poly_widest_int *);
 void free_affine_expand_cache (hash_map<tree, name_expansion *> **);
 bool aff_comb_cannot_overlap_p (aff_tree *, const poly_widest_int &,
 				const poly_widest_int &);
Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	2017-10-23 17:17:16.129993616 +0100
+++ gcc/tree-affine.c	2017-10-23 17:18:30.290584430 +0100
@@ -927,7 +927,7 @@ debug_aff (aff_tree *val)
    which REF refers.  */
 
 tree
-get_inner_reference_aff (tree ref, aff_tree *addr, widest_int *size)
+get_inner_reference_aff (tree ref, aff_tree *addr, poly_widest_int *size)
 {
   HOST_WIDE_INT bitsize, bitpos;
   tree toff;
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2017-10-23 17:17:16.129993616 +0100
+++ gcc/tree-data-ref.c	2017-10-23 17:18:30.290584430 +0100
@@ -2134,7 +2134,7 @@ dr_may_alias_p (const struct data_refere
   if (!loop_nest)
     {
       aff_tree off1, off2;
-      widest_int size1, size2;
+      poly_widest_int size1, size2;
       get_inner_reference_aff (DR_REF (a), &off1, &size1);
       get_inner_reference_aff (DR_REF (b), &off2, &size2);
       aff_combination_scale (&off1, -1);
Index: gcc/tree-ssa-loop-im.c
===================================================================
--- gcc/tree-ssa-loop-im.c	2017-10-23 17:17:16.129993616 +0100
+++ gcc/tree-ssa-loop-im.c	2017-10-23 17:18:30.291584342 +0100
@@ -1581,7 +1581,7 @@ mem_refs_may_alias_p (im_mem_ref *mem1,
   /* Perform BASE + OFFSET analysis -- if MEM1 and MEM2 are based on the same
      object and their offset differ in such a way that the locations cannot
      overlap, then they cannot alias.  */
-  widest_int size1, size2;
+  poly_widest_int size1, size2;
   aff_tree off1, off2;
 
   /* Perform basic offset and type-based disambiguation.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [037/nnn] poly_int: get_bit_range
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (35 preceding siblings ...)
  2017-10-23 17:14 ` [036/nnn] poly_int: get_object_alignment_2 Richard Sandiford
@ 2017-10-23 17:16 ` Richard Sandiford
  2017-12-05 23:19   ` Jeff Law
  2017-10-23 17:17 ` [038/nnn] poly_int: fold_comparison Richard Sandiford
                   ` (70 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:16 UTC (permalink / raw)
  To: gcc-patches

This patch makes get_bit_range return the range and position as poly_ints.
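
One step that becomes more interesting with poly_ints is the
adjustment for a representative that starts before the field: whether
bitoffset is greater than bitpos can now depend on the runtime
indeterminate, so the code asks may_gt and then bumps bitpos by
upper_bound (bitoffset, bitpos) - bitpos, which is enough for every X.
Here is a small standalone model of that step; poly2, may_gt and
upper_bound are local two-coefficient stand-ins rather than the real
poly-int.h routines, and the example offsets are invented.

/* Standalone model, not GCC's poly-int.h.  */
#include <cstdio>
#include <cstdint>
#include <algorithm>

struct poly2 { int64_t c0, c1; };   /* value = c0 + c1 * X, X >= 0 */

/* Is A > B for at least one X >= 0?  */
static bool
may_gt (poly2 a, poly2 b)
{
  return a.c0 > b.c0 || a.c1 > b.c1;
}

/* A value that is >= both A and B for every X >= 0.  */
static poly2
upper_bound (poly2 a, poly2 b)
{
  return { std::max (a.c0, b.c0), std::max (a.c1, b.c1) };
}

int
main ()
{
  poly2 bitoffset = { 64, 0 };      /* field start within the representative */
  poly2 bitpos    = { 32, 128 };    /* access position: 32 + 128*X bits */

  if (may_gt (bitoffset, bitpos))   /* true: at X = 0 we have 64 > 32 */
    {
      poly2 ub = upper_bound (bitoffset, bitpos);
      poly2 adjust = { ub.c0 - bitpos.c0, ub.c1 - bitpos.c1 };
      printf ("bump bitpos by %lld + %lld*X bits\n",
              (long long) adjust.c0, (long long) adjust.c1);
      /* Prints: bump bitpos by 32 + 0*X bits; afterwards
         bitpos - bitoffset is nonnegative for every X.  */
    }
  return 0;
}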


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.h (get_bit_range): Return the bitstart and bitend as
	poly_uint64s rather than unsigned HOST_WIDE_INTs.  Return the bitpos
	as a poly_int64 rather than a HOST_WIDE_INT.
	* expr.c (get_bit_range): Likewise.
	(expand_assignment): Update call accordingly.
	* fold-const.c (optimize_bit_field_compare): Likewise.

Index: gcc/expr.h
===================================================================
--- gcc/expr.h	2017-10-23 17:07:40.476203026 +0100
+++ gcc/expr.h	2017-10-23 17:18:43.842393134 +0100
@@ -240,8 +240,8 @@ extern bool emit_push_insn (rtx, machine
 			    int, rtx, int, rtx, rtx, int, rtx, bool);
 
 /* Extract the accessible bit-range from a COMPONENT_REF.  */
-extern void get_bit_range (unsigned HOST_WIDE_INT *, unsigned HOST_WIDE_INT *,
-			   tree, HOST_WIDE_INT *, tree *);
+extern void get_bit_range (poly_uint64_pod *, poly_uint64_pod *, tree,
+			   poly_int64_pod *, tree *);
 
 /* Expand an assignment that stores the value of FROM into TO.  */
 extern void expand_assignment (tree, tree, bool);
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:16:50.364529087 +0100
+++ gcc/expr.c	2017-10-23 17:18:43.842393134 +0100
@@ -4804,13 +4804,10 @@ optimize_bitfield_assignment_op (poly_ui
    *BITSTART and *BITEND.  */
 
 void
-get_bit_range (unsigned HOST_WIDE_INT *bitstart,
-	       unsigned HOST_WIDE_INT *bitend,
-	       tree exp,
-	       HOST_WIDE_INT *bitpos,
-	       tree *offset)
+get_bit_range (poly_uint64_pod *bitstart, poly_uint64_pod *bitend, tree exp,
+	       poly_int64_pod *bitpos, tree *offset)
 {
-  HOST_WIDE_INT bitoffset;
+  poly_int64 bitoffset;
   tree field, repr;
 
   gcc_assert (TREE_CODE (exp) == COMPONENT_REF);
@@ -4831,13 +4828,13 @@ get_bit_range (unsigned HOST_WIDE_INT *b
   if (handled_component_p (TREE_OPERAND (exp, 0)))
     {
       machine_mode rmode;
-      HOST_WIDE_INT rbitsize, rbitpos;
+      poly_int64 rbitsize, rbitpos;
       tree roffset;
       int unsignedp, reversep, volatilep = 0;
       get_inner_reference (TREE_OPERAND (exp, 0), &rbitsize, &rbitpos,
 			   &roffset, &rmode, &unsignedp, &reversep,
 			   &volatilep);
-      if ((rbitpos % BITS_PER_UNIT) != 0)
+      if (!multiple_p (rbitpos, BITS_PER_UNIT))
 	{
 	  *bitstart = *bitend = 0;
 	  return;
@@ -4848,10 +4845,10 @@ get_bit_range (unsigned HOST_WIDE_INT *b
      relative to the representative.  DECL_FIELD_OFFSET of field and
      repr are the same by construction if they are not constants,
      see finish_bitfield_layout.  */
-  if (tree_fits_uhwi_p (DECL_FIELD_OFFSET (field))
-      && tree_fits_uhwi_p (DECL_FIELD_OFFSET (repr)))
-    bitoffset = (tree_to_uhwi (DECL_FIELD_OFFSET (field))
-		 - tree_to_uhwi (DECL_FIELD_OFFSET (repr))) * BITS_PER_UNIT;
+  poly_uint64 field_offset, repr_offset;
+  if (poly_int_tree_p (DECL_FIELD_OFFSET (field), &field_offset)
+      && poly_int_tree_p (DECL_FIELD_OFFSET (repr), &repr_offset))
+    bitoffset = (field_offset - repr_offset) * BITS_PER_UNIT;
   else
     bitoffset = 0;
   bitoffset += (tree_to_uhwi (DECL_FIELD_BIT_OFFSET (field))
@@ -4860,17 +4857,16 @@ get_bit_range (unsigned HOST_WIDE_INT *b
   /* If the adjustment is larger than bitpos, we would have a negative bit
      position for the lower bound and this may wreak havoc later.  Adjust
      offset and bitpos to make the lower bound non-negative in that case.  */
-  if (bitoffset > *bitpos)
+  if (may_gt (bitoffset, *bitpos))
     {
-      HOST_WIDE_INT adjust = bitoffset - *bitpos;
-      gcc_assert ((adjust % BITS_PER_UNIT) == 0);
+      poly_int64 adjust_bits = upper_bound (bitoffset, *bitpos) - *bitpos;
+      poly_int64 adjust_bytes = exact_div (adjust_bits, BITS_PER_UNIT);
 
-      *bitpos += adjust;
+      *bitpos += adjust_bits;
       if (*offset == NULL_TREE)
-	*offset = size_int (-adjust / BITS_PER_UNIT);
+	*offset = size_int (-adjust_bytes);
       else
-	*offset
-	  = size_binop (MINUS_EXPR, *offset, size_int (adjust / BITS_PER_UNIT));
+	*offset = size_binop (MINUS_EXPR, *offset, size_int (adjust_bytes));
       *bitstart = 0;
     }
   else
@@ -4983,9 +4979,9 @@ expand_assignment (tree to, tree from, b
       || TREE_CODE (TREE_TYPE (to)) == ARRAY_TYPE)
     {
       machine_mode mode1;
-      HOST_WIDE_INT bitsize, bitpos;
-      unsigned HOST_WIDE_INT bitregion_start = 0;
-      unsigned HOST_WIDE_INT bitregion_end = 0;
+      poly_int64 bitsize, bitpos;
+      poly_uint64 bitregion_start = 0;
+      poly_uint64 bitregion_end = 0;
       tree offset;
       int unsignedp, reversep, volatilep = 0;
       tree tem;
@@ -4995,11 +4991,11 @@ expand_assignment (tree to, tree from, b
 				 &unsignedp, &reversep, &volatilep);
 
       /* Make sure bitpos is not negative, it can wreak havoc later.  */
-      if (bitpos < 0)
+      if (may_lt (bitpos, 0))
 	{
 	  gcc_assert (offset == NULL_TREE);
-	  offset = size_int (bitpos >> LOG2_BITS_PER_UNIT);
-	  bitpos &= BITS_PER_UNIT - 1;
+	  offset = size_int (bits_to_bytes_round_down (bitpos));
+	  bitpos = num_trailing_bits (bitpos);
 	}
 
       if (TREE_CODE (to) == COMPONENT_REF
@@ -5009,9 +5005,9 @@ expand_assignment (tree to, tree from, b
 	 However, if we do not have a DECL_BIT_FIELD_TYPE but BITPOS or
 	 BITSIZE are not byte-aligned, there is no need to limit the range
 	 we can access.  This can occur with packed structures in Ada.  */
-      else if (bitsize > 0
-	       && bitsize % BITS_PER_UNIT == 0
-	       && bitpos % BITS_PER_UNIT == 0)
+      else if (may_gt (bitsize, 0)
+	       && multiple_p (bitsize, BITS_PER_UNIT)
+	       && multiple_p (bitpos, BITS_PER_UNIT))
 	{
 	  bitregion_start = bitpos;
 	  bitregion_end = bitpos + bitsize - 1;
@@ -5073,16 +5069,18 @@ expand_assignment (tree to, tree from, b
 
 	     This is only done for aligned data values, as these can
 	     be expected to result in single move instructions.  */
+	  poly_int64 bytepos;
 	  if (mode1 != VOIDmode
-	      && bitpos != 0
-	      && bitsize > 0
-	      && (bitpos % bitsize) == 0
-	      && (bitsize % GET_MODE_ALIGNMENT (mode1)) == 0
+	      && maybe_nonzero (bitpos)
+	      && may_gt (bitsize, 0)
+	      && multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
+	      && multiple_p (bitpos, bitsize)
+	      && multiple_p (bitsize, GET_MODE_ALIGNMENT (mode1))
 	      && MEM_ALIGN (to_rtx) >= GET_MODE_ALIGNMENT (mode1))
 	    {
-	      to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT);
+	      to_rtx = adjust_address (to_rtx, mode1, bytepos);
 	      bitregion_start = 0;
-	      if (bitregion_end >= (unsigned HOST_WIDE_INT) bitpos)
+	      if (must_ge (bitregion_end, poly_uint64 (bitpos)))
 		bitregion_end -= bitpos;
 	      bitpos = 0;
 	    }
@@ -5097,8 +5095,7 @@ expand_assignment (tree to, tree from, b
 	 code contains an out-of-bounds access to a small array.  */
       if (!MEM_P (to_rtx)
 	  && GET_MODE (to_rtx) != BLKmode
-	  && (unsigned HOST_WIDE_INT) bitpos
-	     >= GET_MODE_PRECISION (GET_MODE (to_rtx)))
+	  && must_ge (bitpos, GET_MODE_PRECISION (GET_MODE (to_rtx))))
 	{
 	  expand_normal (from);
 	  result = NULL;
@@ -5108,25 +5105,26 @@ expand_assignment (tree to, tree from, b
 	{
 	  unsigned short mode_bitsize = GET_MODE_BITSIZE (GET_MODE (to_rtx));
 	  if (COMPLEX_MODE_P (TYPE_MODE (TREE_TYPE (from)))
-	      && bitpos == 0
-	      && bitsize == mode_bitsize)
+	      && known_zero (bitpos)
+	      && must_eq (bitsize, mode_bitsize))
 	    result = store_expr (from, to_rtx, false, nontemporal, reversep);
-	  else if (bitsize == mode_bitsize / 2
-		   && (bitpos == 0 || bitpos == mode_bitsize / 2))
-	    result = store_expr (from, XEXP (to_rtx, bitpos != 0), false,
-				 nontemporal, reversep);
-	  else if (bitpos + bitsize <= mode_bitsize / 2)
+	  else if (must_eq (bitsize, mode_bitsize / 2)
+		   && (known_zero (bitpos)
+		       || must_eq (bitpos, mode_bitsize / 2)))
+	    result = store_expr (from, XEXP (to_rtx, maybe_nonzero (bitpos)),
+				 false, nontemporal, reversep);
+	  else if (must_le (bitpos + bitsize, mode_bitsize / 2))
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
 				  bitregion_start, bitregion_end,
 				  mode1, from, get_alias_set (to),
 				  nontemporal, reversep);
-	  else if (bitpos >= mode_bitsize / 2)
+	  else if (must_ge (bitpos, mode_bitsize / 2))
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
 				  bitpos - mode_bitsize / 2,
 				  bitregion_start, bitregion_end,
 				  mode1, from, get_alias_set (to),
 				  nontemporal, reversep);
-	  else if (bitpos == 0 && bitsize == mode_bitsize)
+	  else if (known_zero (bitpos) && must_eq (bitsize, mode_bitsize))
 	    {
 	      rtx from_rtx;
 	      result = expand_normal (from);
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:17:05.755450644 +0100
+++ gcc/fold-const.c	2017-10-23 17:18:43.843393046 +0100
@@ -4168,12 +4168,13 @@ optimize_bit_field_compare (location_t l
    }
 
   /* Honor the C++ memory model and mimic what RTL expansion does.  */
-  unsigned HOST_WIDE_INT bitstart = 0;
-  unsigned HOST_WIDE_INT bitend = 0;
+  poly_uint64 bitstart = 0;
+  poly_uint64 bitend = 0;
   if (TREE_CODE (lhs) == COMPONENT_REF)
     {
-      get_bit_range (&bitstart, &bitend, lhs, &lbitpos, &offset);
-      if (offset != NULL_TREE)
+      poly_int64 plbitpos;
+      get_bit_range (&bitstart, &bitend, lhs, &plbitpos, &offset);
+      if (!plbitpos.is_constant (&lbitpos) || offset != NULL_TREE)
 	return 0;
     }
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [038/nnn] poly_int: fold_comparison
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (36 preceding siblings ...)
  2017-10-23 17:16 ` [037/nnn] poly_int: get_bit_range Richard Sandiford
@ 2017-10-23 17:17 ` Richard Sandiford
  2017-11-28 21:47   ` Jeff Law
  2017-10-23 17:17 ` [039/nnn] poly_int: pass_store_merging::execute Richard Sandiford
                   ` (69 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:17 UTC (permalink / raw)
  To: gcc-patches

This patch makes fold_comparison track polynomial offsets when
folding address comparisons.
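
The folds now have three possible outcomes per comparison code instead
of two: with bit positions of the form C0 + C1*X, a comparison can be
known true for every X, known false for every X, or genuinely
dependent on X, in which case nothing is folded.  Below is a
standalone sketch of that logic for LT_EXPR; must_lt and must_ge are
local helpers over a hand-rolled two-coefficient model (not the
poly-int.h ones) and the values are made up.

/* Standalone model, not GCC's poly-int.h.  */
#include <cstdio>
#include <cstdint>

struct poly2 { int64_t c0, c1; };   /* value = c0 + c1 * X, X >= 0 */

/* Is A < B for every X >= 0?  */
static bool
must_lt (poly2 a, poly2 b)
{
  return a.c0 < b.c0 && a.c1 <= b.c1;
}

/* Is A >= B for every X >= 0?  */
static bool
must_ge (poly2 a, poly2 b)
{
  return a.c0 >= b.c0 && a.c1 >= b.c1;
}

/* The three-way outcome that replaces constant_boolean_node for
   LT_EXPR: 1 = fold to true, 0 = fold to false, -1 = leave alone.  */
static int
fold_lt (poly2 a, poly2 b)
{
  if (must_lt (a, b))
    return 1;
  if (must_ge (a, b))
    return 0;
  return -1;
}

int
main ()
{
  poly2 p0 = { 0, 16 };             /* start of a variable-length vector */
  poly2 p1 = { 32, 16 };            /* 32 bits further into the same object */
  poly2 p2 = { 64, 0 };             /* a fixed offset of 64 bits */

  printf ("p0 < p1: %d\n", fold_lt (p0, p1));  /* 1: true for every X */
  printf ("p1 < p2: %d\n", fold_lt (p1, p2));  /* -1: depends on X */
  printf ("p2 < p0: %d\n", fold_lt (p2, p0));  /* -1: also depends on X */
  return 0;
}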


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* fold-const.c (fold_comparison): Track sizes and offsets as
	poly_int64s rather than HOST_WIDE_INTs when folding address
	comparisons.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:18:43.843393046 +0100
+++ gcc/fold-const.c	2017-10-23 17:18:44.902299961 +0100
@@ -8522,7 +8522,7 @@ fold_comparison (location_t loc, enum tr
 	  || TREE_CODE (arg1) == POINTER_PLUS_EXPR))
     {
       tree base0, base1, offset0 = NULL_TREE, offset1 = NULL_TREE;
-      HOST_WIDE_INT bitsize, bitpos0 = 0, bitpos1 = 0;
+      poly_int64 bitsize, bitpos0 = 0, bitpos1 = 0;
       machine_mode mode;
       int volatilep, reversep, unsignedp;
       bool indirect_base0 = false, indirect_base1 = false;
@@ -8563,17 +8563,14 @@ fold_comparison (location_t loc, enum tr
 	  else
 	    offset0 = size_binop (PLUS_EXPR, offset0,
 				  TREE_OPERAND (arg0, 1));
-	  if (TREE_CODE (offset0) == INTEGER_CST)
+	  if (poly_int_tree_p (offset0))
 	    {
-	      offset_int tem = wi::sext (wi::to_offset (offset0),
-					 TYPE_PRECISION (sizetype));
+	      poly_offset_int tem = wi::sext (wi::to_poly_offset (offset0),
+					      TYPE_PRECISION (sizetype));
 	      tem <<= LOG2_BITS_PER_UNIT;
 	      tem += bitpos0;
-	      if (wi::fits_shwi_p (tem))
-		{
-		  bitpos0 = tem.to_shwi ();
-		  offset0 = NULL_TREE;
-		}
+	      if (tem.to_shwi (&bitpos0))
+		offset0 = NULL_TREE;
 	    }
 	}
 
@@ -8609,17 +8606,14 @@ fold_comparison (location_t loc, enum tr
 	  else
 	    offset1 = size_binop (PLUS_EXPR, offset1,
 				  TREE_OPERAND (arg1, 1));
-	  if (TREE_CODE (offset1) == INTEGER_CST)
+	  if (poly_int_tree_p (offset1))
 	    {
-	      offset_int tem = wi::sext (wi::to_offset (offset1),
-					 TYPE_PRECISION (sizetype));
+	      poly_offset_int tem = wi::sext (wi::to_poly_offset (offset1),
+					      TYPE_PRECISION (sizetype));
 	      tem <<= LOG2_BITS_PER_UNIT;
 	      tem += bitpos1;
-	      if (wi::fits_shwi_p (tem))
-		{
-		  bitpos1 = tem.to_shwi ();
-		  offset1 = NULL_TREE;
-		}
+	      if (tem.to_shwi (&bitpos1))
+		offset1 = NULL_TREE;
 	    }
 	}
 
@@ -8635,7 +8629,7 @@ fold_comparison (location_t loc, enum tr
 		  && operand_equal_p (offset0, offset1, 0)))
 	    {
 	      if (!equality_code
-		  && bitpos0 != bitpos1
+		  && may_ne (bitpos0, bitpos1)
 		  && (pointer_may_wrap_p (base0, offset0, bitpos0)
 		      || pointer_may_wrap_p (base1, offset1, bitpos1)))
 		fold_overflow_warning (("assuming pointer wraparound does not "
@@ -8646,17 +8640,41 @@ fold_comparison (location_t loc, enum tr
 	      switch (code)
 		{
 		case EQ_EXPR:
-		  return constant_boolean_node (bitpos0 == bitpos1, type);
+		  if (must_eq (bitpos0, bitpos1))
+		    return boolean_true_node;
+		  if (must_ne (bitpos0, bitpos1))
+		    return boolean_false_node;
+		  break;
 		case NE_EXPR:
-		  return constant_boolean_node (bitpos0 != bitpos1, type);
+		  if (must_ne (bitpos0, bitpos1))
+		    return boolean_true_node;
+		  if (must_eq (bitpos0, bitpos1))
+		    return boolean_false_node;
+		  break;
 		case LT_EXPR:
-		  return constant_boolean_node (bitpos0 < bitpos1, type);
+		  if (must_lt (bitpos0, bitpos1))
+		    return boolean_true_node;
+		  if (must_ge (bitpos0, bitpos1))
+		    return boolean_false_node;
+		  break;
 		case LE_EXPR:
-		  return constant_boolean_node (bitpos0 <= bitpos1, type);
+		  if (must_le (bitpos0, bitpos1))
+		    return boolean_true_node;
+		  if (must_gt (bitpos0, bitpos1))
+		    return boolean_false_node;
+		  break;
 		case GE_EXPR:
-		  return constant_boolean_node (bitpos0 >= bitpos1, type);
+		  if (must_ge (bitpos0, bitpos1))
+		    return boolean_true_node;
+		  if (must_lt (bitpos0, bitpos1))
+		    return boolean_false_node;
+		  break;
 		case GT_EXPR:
-		  return constant_boolean_node (bitpos0 > bitpos1, type);
+		  if (must_gt (bitpos0, bitpos1))
+		    return boolean_true_node;
+		  if (must_le (bitpos0, bitpos1))
+		    return boolean_false_node;
+		  break;
 		default:;
 		}
 	    }
@@ -8667,7 +8685,7 @@ fold_comparison (location_t loc, enum tr
 	     because pointer arithmetic is restricted to retain within an
 	     object and overflow on pointer differences is undefined as of
 	     6.5.6/8 and /9 with respect to the signed ptrdiff_t.  */
-	  else if (bitpos0 == bitpos1)
+	  else if (must_eq (bitpos0, bitpos1))
 	    {
 	      /* By converting to signed sizetype we cover middle-end pointer
 	         arithmetic which operates on unsigned pointer types of size
@@ -8696,7 +8714,7 @@ fold_comparison (location_t loc, enum tr
 	}
       /* For equal offsets we can simplify to a comparison of the
 	 base addresses.  */
-      else if (bitpos0 == bitpos1
+      else if (must_eq (bitpos0, bitpos1)
 	       && (indirect_base0
 		   ? base0 != TREE_OPERAND (arg0, 0) : base0 != arg0)
 	       && (indirect_base1
@@ -8725,7 +8743,7 @@ fold_comparison (location_t loc, enum tr
 		    eliminated.  When ptr is null, although the -> expression
 		    is strictly speaking invalid, GCC retains it as a matter
 		    of QoI.  See PR c/44555. */
-		 && (offset0 == NULL_TREE && bitpos0 != 0))
+		 && (offset0 == NULL_TREE && known_nonzero (bitpos0)))
 		|| CONSTANT_CLASS_P (base0))
 	       && indirect_base0
 	       /* The caller guarantees that when one of the arguments is

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [039/nnn] poly_int: pass_store_merging::execute
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (37 preceding siblings ...)
  2017-10-23 17:17 ` [038/nnn] poly_int: fold_comparison Richard Sandiford
@ 2017-10-23 17:17 ` Richard Sandiford
  2017-11-28 18:00   ` Jeff Law
  2017-10-23 17:18 ` [040/nnn] poly_int: get_inner_reference & co Richard Sandiford
                   ` (68 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:17 UTC (permalink / raw)
  To: gcc-patches

This patch makes pass_store_merging::execute track polynomial sizes
and offsets.
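
The merging machinery itself still works on compile-time constants, so
the pass only records a store once both the size and the position
reduce to constants; anything with a runtime X component is skipped.
A rough standalone sketch of that gate follows, with a hand-rolled
is_constant over a two-coefficient model rather than the real
poly_int64 member function, and invented sizes.

/* Standalone model, not GCC's poly-int.h.  */
#include <cstdio>
#include <cstdint>

struct poly2 { int64_t c0, c1; };   /* value = c0 + c1 * X */

/* Does the value reduce to a compile-time constant, and if so which?  */
static bool
is_constant (poly2 v, int64_t *out)
{
  if (v.c1 != 0)
    return false;
  *out = v.c0;
  return true;
}

int
main ()
{
  poly2 tests[] = { { 32, 0 },       /* an ordinary 32-bit store */
                    { 128, 128 } };  /* e.g. a variable-length vector store */
  for (poly2 bitsize : tests)
    {
      int64_t const_bitsize;
      if (is_constant (bitsize, &const_bitsize))
        printf ("constant store of %lld bits: candidate for merging\n",
                (long long) const_bitsize);
      else
        printf ("variable-sized store: skipped\n");
    }
  return 0;
}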


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* gimple-ssa-store-merging.c (pass_store_merging::execute): Track
	polynomial sizes and offsets.

Index: gcc/gimple-ssa-store-merging.c
===================================================================
--- gcc/gimple-ssa-store-merging.c	2017-10-23 17:11:39.971422491 +0100
+++ gcc/gimple-ssa-store-merging.c	2017-10-23 17:18:46.178187802 +0100
@@ -1389,7 +1389,7 @@ pass_store_merging::execute (function *f
 	      tree lhs = gimple_assign_lhs (stmt);
 	      tree rhs = gimple_assign_rhs1 (stmt);
 
-	      HOST_WIDE_INT bitsize, bitpos;
+	      poly_int64 bitsize, bitpos;
 	      machine_mode mode;
 	      int unsignedp = 0, reversep = 0, volatilep = 0;
 	      tree offset, base_addr;
@@ -1399,8 +1399,6 @@ pass_store_merging::execute (function *f
 	      /* As a future enhancement we could handle stores with the same
 		 base and offset.  */
 	      bool invalid = reversep
-			     || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
-				  && (TREE_CODE (rhs) != INTEGER_CST))
 			     || !rhs_valid_for_store_merging_p (rhs);
 
 	      /* We do not want to rewrite TARGET_MEM_REFs.  */
@@ -1413,23 +1411,17 @@ pass_store_merging::execute (function *f
 		 PR 23684 and this way we can catch more chains.  */
 	      else if (TREE_CODE (base_addr) == MEM_REF)
 		{
-		  offset_int bit_off, byte_off = mem_ref_offset (base_addr);
-		  bit_off = byte_off << LOG2_BITS_PER_UNIT;
+		  poly_offset_int byte_off = mem_ref_offset (base_addr);
+		  poly_offset_int bit_off = byte_off << LOG2_BITS_PER_UNIT;
 		  bit_off += bitpos;
-		  if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
-		    bitpos = bit_off.to_shwi ();
-		  else
+		  if (!bit_off.to_shwi (&bitpos))
 		    invalid = true;
 		  base_addr = TREE_OPERAND (base_addr, 0);
 		}
 	      /* get_inner_reference returns the base object, get at its
 	         address now.  */
 	      else
-		{
-		  if (bitpos < 0)
-		    invalid = true;
-		  base_addr = build_fold_addr_expr (base_addr);
-		}
+		base_addr = build_fold_addr_expr (base_addr);
 
 	      if (! invalid
 		  && offset != NULL_TREE)
@@ -1455,13 +1447,19 @@ pass_store_merging::execute (function *f
 	      struct imm_store_chain_info **chain_info
 		= m_stores.get (base_addr);
 
-	      if (!invalid)
+	      HOST_WIDE_INT const_bitsize, const_bitpos;
+	      if (!invalid
+		  && bitsize.is_constant (&const_bitsize)
+		  && bitpos.is_constant (&const_bitpos)
+		  && (const_bitsize <= MAX_BITSIZE_MODE_ANY_INT
+		      || TREE_CODE (rhs) == INTEGER_CST)
+		  && const_bitpos >= 0)
 		{
 		  store_immediate_info *info;
 		  if (chain_info)
 		    {
 		      info = new store_immediate_info (
-			bitsize, bitpos, stmt,
+			const_bitsize, const_bitpos, stmt,
 			(*chain_info)->m_store_info.length ());
 		      if (dump_file && (dump_flags & TDF_DETAILS))
 			{
@@ -1490,7 +1488,7 @@ pass_store_merging::execute (function *f
 		  /* Start a new chain.  */
 		  struct imm_store_chain_info *new_chain
 		    = new imm_store_chain_info (m_stores_head, base_addr);
-		  info = new store_immediate_info (bitsize, bitpos,
+		  info = new store_immediate_info (const_bitsize, const_bitpos,
 						   stmt, 0);
 		  new_chain->m_store_info.safe_push (info);
 		  m_stores.put (base_addr, new_chain);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [042/nnn] poly_int: reload1.c
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (39 preceding siblings ...)
  2017-10-23 17:18 ` [040/nnn] poly_int: get_inner_reference & co Richard Sandiford
@ 2017-10-23 17:18 ` Richard Sandiford
  2017-12-05 17:23   ` Jeff Law
  2017-10-23 17:18 ` [041/nnn] poly_int: reload.c Richard Sandiford
                   ` (66 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:18 UTC (permalink / raw)
  To: gcc-patches

This patch makes a few small poly_int64 changes to reload1.c,
mostly related to eliminations.  Again, there's no real expectation
that reload will be used for targets that have polynomial-sized modes,
but it seemed easier to convert it anyway.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* reload1.c (elim_table): Change initial_offset, offset and
	previous_offset from HOST_WIDE_INT to poly_int64_pod.
	(offsets_at): Change the target array's element type from
	HOST_WIDE_INT to poly_int64_pod.
	(set_label_offsets, eliminate_regs_1, eliminate_regs_in_insn)
	(elimination_costs_in_insn, update_eliminable_offsets)
	(verify_initial_elim_offsets, set_offsets_for_label)
	(init_eliminable_invariants): Update after above changes.

Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:18:51.486721146 +0100
+++ gcc/reload1.c	2017-10-23 17:18:52.641619623 +0100
@@ -261,13 +261,13 @@ struct elim_table
 {
   int from;			/* Register number to be eliminated.  */
   int to;			/* Register number used as replacement.  */
-  HOST_WIDE_INT initial_offset;	/* Initial difference between values.  */
+  poly_int64_pod initial_offset; /* Initial difference between values.  */
   int can_eliminate;		/* Nonzero if this elimination can be done.  */
   int can_eliminate_previous;	/* Value returned by TARGET_CAN_ELIMINATE
 				   target hook in previous scan over insns
 				   made by reload.  */
-  HOST_WIDE_INT offset;		/* Current offset between the two regs.  */
-  HOST_WIDE_INT previous_offset;/* Offset at end of previous insn.  */
+  poly_int64_pod offset;	/* Current offset between the two regs.  */
+  poly_int64_pod previous_offset; /* Offset at end of previous insn.  */
   int ref_outside_mem;		/* "to" has been referenced outside a MEM.  */
   rtx from_rtx;			/* REG rtx for the register to be eliminated.
 				   We cannot simply compare the number since
@@ -313,7 +313,7 @@ #define NUM_ELIMINABLE_REGS ARRAY_SIZE (
 
 static int first_label_num;
 static char *offsets_known_at;
-static HOST_WIDE_INT (*offsets_at)[NUM_ELIMINABLE_REGS];
+static poly_int64_pod (*offsets_at)[NUM_ELIMINABLE_REGS];
 
 vec<reg_equivs_t, va_gc> *reg_equivs;
 
@@ -2351,9 +2351,9 @@ set_label_offsets (rtx x, rtx_insn *insn
 	   where the offsets disagree.  */
 
 	for (i = 0; i < NUM_ELIMINABLE_REGS; i++)
-	  if (offsets_at[CODE_LABEL_NUMBER (x) - first_label_num][i]
-	      != (initial_p ? reg_eliminate[i].initial_offset
-		  : reg_eliminate[i].offset))
+	  if (may_ne (offsets_at[CODE_LABEL_NUMBER (x) - first_label_num][i],
+		      (initial_p ? reg_eliminate[i].initial_offset
+		       : reg_eliminate[i].offset)))
 	    reg_eliminate[i].can_eliminate = 0;
 
       return;
@@ -2436,7 +2436,7 @@ set_label_offsets (rtx x, rtx_insn *insn
       /* If we reach here, all eliminations must be at their initial
 	 offset because we are doing a jump to a variable address.  */
       for (p = reg_eliminate; p < &reg_eliminate[NUM_ELIMINABLE_REGS]; p++)
-	if (p->offset != p->initial_offset)
+	if (may_ne (p->offset, p->initial_offset))
 	  p->can_eliminate = 0;
       break;
 
@@ -2593,8 +2593,9 @@ eliminate_regs_1 (rtx x, machine_mode me
 		   We special-case the commonest situation in
 		   eliminate_regs_in_insn, so just replace a PLUS with a
 		   PLUS here, unless inside a MEM.  */
-		if (mem_mode != 0 && CONST_INT_P (XEXP (x, 1))
-		    && INTVAL (XEXP (x, 1)) == - ep->previous_offset)
+		if (mem_mode != 0
+		    && CONST_INT_P (XEXP (x, 1))
+		    && must_eq (INTVAL (XEXP (x, 1)), -ep->previous_offset))
 		  return ep->to_rtx;
 		else
 		  return gen_rtx_PLUS (Pmode, ep->to_rtx,
@@ -3344,7 +3345,7 @@ eliminate_regs_in_insn (rtx_insn *insn,
   if (plus_cst_src)
     {
       rtx reg = XEXP (plus_cst_src, 0);
-      HOST_WIDE_INT offset = INTVAL (XEXP (plus_cst_src, 1));
+      poly_int64 offset = INTVAL (XEXP (plus_cst_src, 1));
 
       if (GET_CODE (reg) == SUBREG)
 	reg = SUBREG_REG (reg);
@@ -3364,7 +3365,7 @@ eliminate_regs_in_insn (rtx_insn *insn,
 	       increase the cost of the insn by replacing a simple REG
 	       with (plus (reg sp) CST).  So try only when we already
 	       had a PLUS before.  */
-	    if (offset == 0 || plus_src)
+	    if (known_zero (offset) || plus_src)
 	      {
 		rtx new_src = plus_constant (GET_MODE (to_rtx),
 					     to_rtx, offset);
@@ -3562,12 +3563,12 @@ eliminate_regs_in_insn (rtx_insn *insn,
 
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
-      if (ep->previous_offset != ep->offset && ep->ref_outside_mem)
+      if (may_ne (ep->previous_offset, ep->offset) && ep->ref_outside_mem)
 	ep->can_eliminate = 0;
 
       ep->ref_outside_mem = 0;
 
-      if (ep->previous_offset != ep->offset)
+      if (may_ne (ep->previous_offset, ep->offset))
 	val = 1;
     }
 
@@ -3733,7 +3734,7 @@ elimination_costs_in_insn (rtx_insn *ins
 
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
-      if (ep->previous_offset != ep->offset && ep->ref_outside_mem)
+      if (may_ne (ep->previous_offset, ep->offset) && ep->ref_outside_mem)
 	ep->can_eliminate = 0;
 
       ep->ref_outside_mem = 0;
@@ -3758,7 +3759,7 @@ update_eliminable_offsets (void)
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       ep->previous_offset = ep->offset;
-      if (ep->can_eliminate && ep->offset != ep->initial_offset)
+      if (ep->can_eliminate && may_ne (ep->offset, ep->initial_offset))
 	num_not_at_initial_offset++;
     }
 }
@@ -3812,7 +3813,7 @@ mark_not_eliminable (rtx dest, const_rtx
 static bool
 verify_initial_elim_offsets (void)
 {
-  HOST_WIDE_INT t;
+  poly_int64 t;
   struct elim_table *ep;
 
   if (!num_eliminable)
@@ -3822,7 +3823,7 @@ verify_initial_elim_offsets (void)
   for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
     {
       INITIAL_ELIMINATION_OFFSET (ep->from, ep->to, t);
-      if (t != ep->initial_offset)
+      if (may_ne (t, ep->initial_offset))
 	return false;
     }
 
@@ -3893,7 +3894,7 @@ set_offsets_for_label (rtx_insn *insn)
     {
       ep->offset = ep->previous_offset
 		 = offsets_at[label_nr - first_label_num][i];
-      if (ep->can_eliminate && ep->offset != ep->initial_offset)
+      if (ep->can_eliminate && may_ne (ep->offset, ep->initial_offset))
 	num_not_at_initial_offset++;
     }
 }
@@ -4095,7 +4096,8 @@ init_eliminable_invariants (rtx_insn *fi
 
   /* Allocate the tables used to store offset information at labels.  */
   offsets_known_at = XNEWVEC (char, num_labels);
-  offsets_at = (HOST_WIDE_INT (*)[NUM_ELIMINABLE_REGS]) xmalloc (num_labels * NUM_ELIMINABLE_REGS * sizeof (HOST_WIDE_INT));
+  offsets_at = (poly_int64_pod (*)[NUM_ELIMINABLE_REGS])
+    xmalloc (num_labels * NUM_ELIMINABLE_REGS * sizeof (poly_int64));
 
 /* Look for REG_EQUIV notes; record what each pseudo is equivalent
    to.  If DO_SUBREGS is true, also find all paradoxical subregs and

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [041/nnn] poly_int: reload.c
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (40 preceding siblings ...)
  2017-10-23 17:18 ` [042/nnn] poly_int: reload1.c Richard Sandiford
@ 2017-10-23 17:18 ` Richard Sandiford
  2017-12-05 17:10   ` Jeff Law
  2017-10-23 17:19 ` [045/nnn] poly_int: REG_ARGS_SIZE Richard Sandiford
                   ` (65 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:18 UTC (permalink / raw)
  To: gcc-patches

This patch makes a few small poly_int64 changes to reload.c,
such as in the "decomposition" structure.  In practice, any
port with polynomial-sized modes should be using LRA rather
than reload, but it's easier to convert reload anyway than
to sprinkle to_constants everywhere.
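
The main point of interest is the overlap test in immune_p: with
runtime sizes, two decompositions can only be declared conflict-free
when one is known to start at or after the other's end for every X,
hence the must_ge form.  Below is a small standalone model of that
test; poly2, must_ge and the example ranges are illustrative stand-ins
rather than the reload.c structures themselves.

/* Standalone model, not GCC's poly-int.h.  */
#include <cstdio>
#include <cstdint>

struct poly2 { int64_t c0, c1; };   /* value = c0 + c1 * X, X >= 0 */

/* Is A >= B for every X >= 0?  */
static bool
must_ge (poly2 a, poly2 b)
{
  return a.c0 >= b.c0 && a.c1 >= b.c1;
}

struct byte_range { poly2 start, end; };   /* half-open [start, end) */

/* Conflict-free only if one range is known to start at or after the
   other's end, whatever the value of X.  */
static bool
known_disjoint_p (byte_range x, byte_range y)
{
  return must_ge (x.start, y.end) || must_ge (y.start, x.end);
}

int
main ()
{
  byte_range a = { { 0, 0 },   { 16, 16 } };  /* 16 + 16*X bytes at offset 0 */
  byte_range b = { { 16, 16 }, { 32, 32 } };  /* the slot directly after A */
  byte_range c = { { 32, 0 },  { 48, 0 } };   /* a fixed 16-byte slot at 32 */

  printf ("a vs b: %d\n", known_disjoint_p (a, b));  /* 1: never overlap */
  printf ("a vs c: %d\n", known_disjoint_p (a, c));  /* 0: overlap once X >= 2 */
  return 0;
}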


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* reload.h (reload::inc): Change from an int to a poly_int64_pod.
	* reload.c (combine_reloads, debug_reload_to_stream): Likewise.
	(decomposition): Change start and end from HOST_WIDE_INT
	to poly_int64_pod.
	(decompose, immune_p): Update accordingly.
	(find_inc_amount): Return a poly_int64 rather than an int.
	* reload1.c (inc_for_reload): Take the inc_amount as a poly_int64
	rather than an int.

Index: gcc/reload.h
===================================================================
--- gcc/reload.h	2017-10-23 17:07:40.266433752 +0100
+++ gcc/reload.h	2017-10-23 17:18:51.485721234 +0100
@@ -97,7 +97,7 @@ struct reload
   /* Positive amount to increment or decrement by if
      reload_in is a PRE_DEC, PRE_INC, POST_DEC, POST_INC.
      Ignored otherwise (don't assume it is zero).  */
-  int inc;
+  poly_int64_pod inc;
   /* A reg for which reload_in is the equivalent.
      If reload_in is a symbol_ref which came from
      reg_equiv_constant, then this is the pseudo
Index: gcc/reload.c
===================================================================
--- gcc/reload.c	2017-10-23 17:16:50.373527872 +0100
+++ gcc/reload.c	2017-10-23 17:18:51.485721234 +0100
@@ -168,8 +168,8 @@ struct decomposition
   int reg_flag;		/* Nonzero if referencing a register.  */
   int safe;		/* Nonzero if this can't conflict with anything.  */
   rtx base;		/* Base address for MEM.  */
-  HOST_WIDE_INT start;	/* Starting offset or register number.  */
-  HOST_WIDE_INT end;	/* Ending offset or register number.  */
+  poly_int64_pod start;	/* Starting offset or register number.  */
+  poly_int64_pod end;	/* Ending offset or register number.  */
 };
 
 /* Save MEMs needed to copy from one class of registers to another.  One MEM
@@ -278,7 +278,7 @@ static void find_reloads_address_part (r
 static rtx find_reloads_subreg_address (rtx, int, enum reload_type,
 					int, rtx_insn *, int *);
 static void copy_replacements_1 (rtx *, rtx *, int);
-static int find_inc_amount (rtx, rtx);
+static poly_int64 find_inc_amount (rtx, rtx);
 static int refers_to_mem_for_reload_p (rtx);
 static int refers_to_regno_for_reload_p (unsigned int, unsigned int,
 					 rtx, rtx *);
@@ -1772,7 +1772,7 @@ combine_reloads (void)
 	&& (ira_reg_class_max_nregs [(int)rld[i].rclass][(int) rld[i].inmode]
 	    == ira_reg_class_max_nregs [(int) rld[output_reload].rclass]
 				       [(int) rld[output_reload].outmode])
-	&& rld[i].inc == 0
+	&& known_zero (rld[i].inc)
 	&& rld[i].reg_rtx == 0
 	/* Don't combine two reloads with different secondary
 	   memory locations.  */
@@ -2360,7 +2360,7 @@ operands_match_p (rtx x, rtx y)
 decompose (rtx x)
 {
   struct decomposition val;
-  int all_const = 0;
+  int all_const = 0, regno;
 
   memset (&val, 0, sizeof (val));
 
@@ -2458,29 +2458,33 @@ decompose (rtx x)
 
     case REG:
       val.reg_flag = 1;
-      val.start = true_regnum (x);
-      if (val.start < 0 || val.start >= FIRST_PSEUDO_REGISTER)
+      regno = true_regnum (x);
+      if (regno < 0 || regno >= FIRST_PSEUDO_REGISTER)
 	{
 	  /* A pseudo with no hard reg.  */
 	  val.start = REGNO (x);
 	  val.end = val.start + 1;
 	}
       else
-	/* A hard reg.  */
-	val.end = end_hard_regno (GET_MODE (x), val.start);
+	{
+	  /* A hard reg.  */
+	  val.start = regno;
+	  val.end = end_hard_regno (GET_MODE (x), regno);
+	}
       break;
 
     case SUBREG:
       if (!REG_P (SUBREG_REG (x)))
 	/* This could be more precise, but it's good enough.  */
 	return decompose (SUBREG_REG (x));
-      val.reg_flag = 1;
-      val.start = true_regnum (x);
-      if (val.start < 0 || val.start >= FIRST_PSEUDO_REGISTER)
+      regno = true_regnum (x);
+      if (regno < 0 || regno >= FIRST_PSEUDO_REGISTER)
 	return decompose (SUBREG_REG (x));
-      else
-	/* A hard reg.  */
-	val.end = val.start + subreg_nregs (x);
+
+      /* A hard reg.  */
+      val.reg_flag = 1;
+      val.start = regno;
+      val.end = regno + subreg_nregs (x);
       break;
 
     case SCRATCH:
@@ -2505,7 +2509,11 @@ immune_p (rtx x, rtx y, struct decomposi
   struct decomposition xdata;
 
   if (ydata.reg_flag)
-    return !refers_to_regno_for_reload_p (ydata.start, ydata.end, x, (rtx*) 0);
+    /* In this case the decomposition structure contains register
+       numbers rather than byte offsets.  */
+    return !refers_to_regno_for_reload_p (ydata.start.to_constant (),
+					  ydata.end.to_constant (),
+					  x, (rtx *) 0);
   if (ydata.safe)
     return 1;
 
@@ -2536,7 +2544,7 @@ immune_p (rtx x, rtx y, struct decomposi
       return 0;
     }
 
-  return (xdata.start >= ydata.end || ydata.start >= xdata.end);
+  return must_ge (xdata.start, ydata.end) || must_ge (ydata.start, xdata.end);
 }
 
 /* Similar, but calls decompose.  */
@@ -7063,7 +7071,7 @@ find_equiv_reg (rtx goal, rtx_insn *insn
    within X, and return the amount INCED is incremented or decremented by.
    The value is always positive.  */
 
-static int
+static poly_int64
 find_inc_amount (rtx x, rtx inced)
 {
   enum rtx_code code = GET_CODE (x);
@@ -7096,8 +7104,8 @@ find_inc_amount (rtx x, rtx inced)
     {
       if (fmt[i] == 'e')
 	{
-	  int tem = find_inc_amount (XEXP (x, i), inced);
-	  if (tem != 0)
+	  poly_int64 tem = find_inc_amount (XEXP (x, i), inced);
+	  if (maybe_nonzero (tem))
 	    return tem;
 	}
       if (fmt[i] == 'E')
@@ -7105,8 +7113,8 @@ find_inc_amount (rtx x, rtx inced)
 	  int j;
 	  for (j = XVECLEN (x, i) - 1; j >= 0; j--)
 	    {
-	      int tem = find_inc_amount (XVECEXP (x, i, j), inced);
-	      if (tem != 0)
+	      poly_int64 tem = find_inc_amount (XVECEXP (x, i, j), inced);
+	      if (maybe_nonzero (tem))
 		return tem;
 	    }
 	}
@@ -7267,8 +7275,11 @@ debug_reload_to_stream (FILE *f)
       if (rld[r].nongroup)
 	fprintf (f, ", nongroup");
 
-      if (rld[r].inc != 0)
-	fprintf (f, ", inc by %d", rld[r].inc);
+      if (maybe_nonzero (rld[r].inc))
+	{
+	  fprintf (f, ", inc by ");
+	  print_dec (rld[r].inc, f, SIGNED);
+	}
 
       if (rld[r].nocombine)
 	fprintf (f, ", can't combine");
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:16:50.373527872 +0100
+++ gcc/reload1.c	2017-10-23 17:18:51.486721146 +0100
@@ -398,7 +398,7 @@ static void emit_reload_insns (struct in
 static void delete_output_reload (rtx_insn *, int, int, rtx);
 static void delete_address_reloads (rtx_insn *, rtx_insn *);
 static void delete_address_reloads_1 (rtx_insn *, rtx, rtx_insn *);
-static void inc_for_reload (rtx, rtx, rtx, int);
+static void inc_for_reload (rtx, rtx, rtx, poly_int64);
 static void add_auto_inc_notes (rtx_insn *, rtx);
 static void substitute (rtx *, const_rtx, rtx);
 static bool gen_reload_chain_without_interm_reg_p (int, int);
@@ -9075,7 +9075,7 @@ delete_address_reloads_1 (rtx_insn *dead
    This cannot be deduced from VALUE.  */
 
 static void
-inc_for_reload (rtx reloadreg, rtx in, rtx value, int inc_amount)
+inc_for_reload (rtx reloadreg, rtx in, rtx value, poly_int64 inc_amount)
 {
   /* REG or MEM to be copied and incremented.  */
   rtx incloc = find_replacement (&XEXP (value, 0));
@@ -9105,7 +9105,7 @@ inc_for_reload (rtx reloadreg, rtx in, r
       if (GET_CODE (value) == PRE_DEC || GET_CODE (value) == POST_DEC)
 	inc_amount = -inc_amount;
 
-      inc = GEN_INT (inc_amount);
+      inc = gen_int_mode (inc_amount, Pmode);
     }
 
   /* If this is post-increment, first copy the location to the reload reg.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [040/nnn] poly_int: get_inner_reference & co.
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (38 preceding siblings ...)
  2017-10-23 17:17 ` [039/nnn] poly_int: pass_store_merging::execute Richard Sandiford
@ 2017-10-23 17:18 ` Richard Sandiford
  2017-12-06 17:26   ` Jeff Law
  2018-12-21 11:17   ` Thomas Schwinge
  2017-10-23 17:18 ` [042/nnn] poly_int: reload1.c Richard Sandiford
                   ` (67 subsequent siblings)
  107 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:18 UTC (permalink / raw)
  To: gcc-patches

This patch makes get_inner_reference and ptr_difference_const return the
bit size and bit position as poly_int64s rather than HOST_WIDE_INTs.
The non-mechanical changes were handled by previous patches.
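
One behavioural detail worth calling out: where the old code in
expand_expr_addr_expr_1 asserted bitpos % BITS_PER_UNIT == 0 before
taking an address, the new code uses exact_div, and for a polynomial
value the division is only exact when every coefficient is divisible.
The standalone sketch below shows the idea with a hand-rolled
two-coefficient type and made-up numbers; it is illustrative only, not
the poly-int.h implementation.

/* Standalone model, not GCC's poly-int.h.  */
#include <cassert>
#include <cstdio>
#include <cstdint>

struct poly2 { int64_t c0, c1; };   /* value = c0 + c1 * X */

/* Divide by B, asserting that the division is exact for every X,
   which needs every coefficient to be divisible by B.  */
static poly2
exact_div (poly2 a, int64_t b)
{
  assert (a.c0 % b == 0 && a.c1 % b == 0);
  return { a.c0 / b, a.c1 / b };
}

int
main ()
{
  poly2 bitpos = { 96, 128 };       /* 96 + 128*X bits, byte-aligned */
  poly2 bytepos = exact_div (bitpos, 8);
  printf ("bytepos = %lld + %lld*X\n",
          (long long) bytepos.c0, (long long) bytepos.c1);
  /* Prints: bytepos = 12 + 16*X */
  return 0;
}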


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree.h (get_inner_reference): Return the bitsize and bitpos
	as poly_int64_pods rather than HOST_WIDE_INT.
	* fold-const.h (ptr_difference_const): Return the pointer difference
	as a poly_int64_pod rather than a HOST_WIDE_INT.
	* expr.c (get_inner_reference): Return the bitsize and bitpos
	as poly_int64_pods rather than HOST_WIDE_INT.
	(expand_expr_addr_expr_1, expand_expr_real_1): Track polynomial
	offsets and sizes.
	* fold-const.c (make_bit_field_ref): Take the bitpos as a poly_int64
	rather than a HOST_WIDE_INT.  Update call to get_inner_reference.
	(optimize_bit_field_compare): Update call to get_inner_reference.
	(decode_field_reference): Likewise.
	(fold_unary_loc): Track polynomial offsets and sizes.
	(split_address_to_core_and_offset): Return the bitpos as a
	poly_int64_pod rather than a HOST_WIDE_INT.
	(ptr_difference_const): Likewise for the pointer difference.
	* asan.c (instrument_derefs): Track polynomial offsets and sizes.
	* config/mips/mips.c (r10k_safe_mem_expr_p): Likewise.
	* dbxout.c (dbxout_expand_expr): Likewise.
	* dwarf2out.c (loc_list_for_address_of_addr_expr_of_indirect_ref)
	(loc_list_from_tree_1, fortran_common): Likewise.
	* gimple-laddress.c (pass_laddress::execute): Likewise.
	* gimplify.c (gimplify_scan_omp_clauses): Likewise.
	* simplify-rtx.c (delegitimize_mem_from_attrs): Likewise.
	* tree-affine.c (tree_to_aff_combination): Likewise.
	(get_inner_reference_aff): Likewise.
	* tree-data-ref.c (split_constant_offset_1): Likewise.
	(dr_analyze_innermost): Likewise.
	* tree-scalar-evolution.c (interpret_rhs_expr): Likewise.
	* tree-sra.c (ipa_sra_check_caller): Likewise.
	* tree-ssa-math-opts.c (find_bswap_or_nop_load): Likewise.
	* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
	* ubsan.c (maybe_instrument_pointer_overflow): Likewise.
	(instrument_bool_enum_load, instrument_object_size): Likewise.
	* gimple-ssa-strength-reduction.c (slsr_process_ref): Update call
	to get_inner_reference.
	* hsa-gen.c (gen_hsa_addr): Likewise.
	* sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise.
	* tsan.c (instrument_expr): Likewise.
	* match.pd: Update call to ptr_difference_const.

gcc/ada/
	* gcc-interface/trans.c (Attribute_to_gnu): Track polynomial
	offsets and sizes.
	* gcc-interface/utils2.c (build_unary_op): Likewise.

gcc/cp/
	* constexpr.c (check_automatic_or_tls): Track polynomial
	offsets and sizes.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 17:18:40.711668346 +0100
+++ gcc/tree.h	2017-10-23 17:18:47.668056833 +0100
@@ -5608,19 +5608,8 @@ extern bool complete_ctor_at_level_p (co
 /* Given an expression EXP that is a handled_component_p,
    look for the ultimate containing object, which is returned and specify
    the access position and size.  */
-extern tree get_inner_reference (tree, HOST_WIDE_INT *, HOST_WIDE_INT *,
+extern tree get_inner_reference (tree, poly_int64_pod *, poly_int64_pod *,
 				 tree *, machine_mode *, int *, int *, int *);
-/* Temporary.  */
-inline tree
-get_inner_reference (tree exp, poly_int64_pod *pbitsize,
-		     poly_int64_pod *pbitpos, tree *poffset,
-		     machine_mode *pmode, int *punsignedp,
-		     int *preversep, int *pvolatilep)
-{
-  return get_inner_reference (exp, &pbitsize->coeffs[0], &pbitpos->coeffs[0],
-			      poffset, pmode, punsignedp, preversep,
-			      pvolatilep);
-}
 
 extern tree build_personality_function (const char *);
 
Index: gcc/fold-const.h
===================================================================
--- gcc/fold-const.h	2017-10-23 17:11:40.244945208 +0100
+++ gcc/fold-const.h	2017-10-23 17:18:47.662057360 +0100
@@ -122,7 +122,7 @@ extern tree div_if_zero_remainder (const
 extern bool tree_swap_operands_p (const_tree, const_tree);
 extern enum tree_code swap_tree_comparison (enum tree_code);
 
-extern bool ptr_difference_const (tree, tree, HOST_WIDE_INT *);
+extern bool ptr_difference_const (tree, tree, poly_int64_pod *);
 extern enum tree_code invert_tree_comparison (enum tree_code, bool);
 
 extern bool tree_unary_nonzero_warnv_p (enum tree_code, tree, tree, bool *);
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:18:43.842393134 +0100
+++ gcc/expr.c	2017-10-23 17:18:47.661057448 +0100
@@ -7015,8 +7015,8 @@ store_field (rtx target, poly_int64 bits
    this case, but the address of the object can be found.  */
 
 tree
-get_inner_reference (tree exp, HOST_WIDE_INT *pbitsize,
-		     HOST_WIDE_INT *pbitpos, tree *poffset,
+get_inner_reference (tree exp, poly_int64_pod *pbitsize,
+		     poly_int64_pod *pbitpos, tree *poffset,
 		     machine_mode *pmode, int *punsignedp,
 		     int *preversep, int *pvolatilep)
 {
@@ -7024,7 +7024,7 @@ get_inner_reference (tree exp, HOST_WIDE
   machine_mode mode = VOIDmode;
   bool blkmode_bitfield = false;
   tree offset = size_zero_node;
-  offset_int bit_offset = 0;
+  poly_offset_int bit_offset = 0;
 
   /* First get the mode, signedness, storage order and size.  We do this from
      just the outermost expression.  */
@@ -7089,7 +7089,7 @@ get_inner_reference (tree exp, HOST_WIDE
       switch (TREE_CODE (exp))
 	{
 	case BIT_FIELD_REF:
-	  bit_offset += wi::to_offset (TREE_OPERAND (exp, 2));
+	  bit_offset += wi::to_poly_offset (TREE_OPERAND (exp, 2));
 	  break;
 
 	case COMPONENT_REF:
@@ -7104,7 +7104,7 @@ get_inner_reference (tree exp, HOST_WIDE
 	      break;
 
 	    offset = size_binop (PLUS_EXPR, offset, this_offset);
-	    bit_offset += wi::to_offset (DECL_FIELD_BIT_OFFSET (field));
+	    bit_offset += wi::to_poly_offset (DECL_FIELD_BIT_OFFSET (field));
 
 	    /* ??? Right now we don't do anything with DECL_OFFSET_ALIGN.  */
 	  }
@@ -7172,44 +7172,36 @@ get_inner_reference (tree exp, HOST_WIDE
   /* If OFFSET is constant, see if we can return the whole thing as a
      constant bit position.  Make sure to handle overflow during
      this conversion.  */
-  if (TREE_CODE (offset) == INTEGER_CST)
+  if (poly_int_tree_p (offset))
     {
-      offset_int tem = wi::sext (wi::to_offset (offset),
-				 TYPE_PRECISION (sizetype));
+      poly_offset_int tem = wi::sext (wi::to_poly_offset (offset),
+				      TYPE_PRECISION (sizetype));
       tem <<= LOG2_BITS_PER_UNIT;
       tem += bit_offset;
-      if (wi::fits_shwi_p (tem))
-	{
-	  *pbitpos = tem.to_shwi ();
-	  *poffset = offset = NULL_TREE;
-	}
+      if (tem.to_shwi (pbitpos))
+	*poffset = offset = NULL_TREE;
     }
 
   /* Otherwise, split it up.  */
   if (offset)
     {
       /* Avoid returning a negative bitpos as this may wreak havoc later.  */
-      if (wi::neg_p (bit_offset) || !wi::fits_shwi_p (bit_offset))
+      if (!bit_offset.to_shwi (pbitpos) || may_lt (*pbitpos, 0))
         {
-	  offset_int mask = wi::mask <offset_int> (LOG2_BITS_PER_UNIT, false);
-	  offset_int tem = wi::bit_and_not (bit_offset, mask);
-	  /* TEM is the bitpos rounded to BITS_PER_UNIT towards -Inf.
-	     Subtract it to BIT_OFFSET and add it (scaled) to OFFSET.  */
-	  bit_offset -= tem;
-	  tem >>= LOG2_BITS_PER_UNIT;
+	  *pbitpos = num_trailing_bits (bit_offset.force_shwi ());
+	  poly_offset_int bytes = bits_to_bytes_round_down (bit_offset);
 	  offset = size_binop (PLUS_EXPR, offset,
-			       wide_int_to_tree (sizetype, tem));
+			       build_int_cst (sizetype, bytes.force_shwi ()));
 	}
 
-      *pbitpos = bit_offset.to_shwi ();
       *poffset = offset;
     }
 
   /* We can use BLKmode for a byte-aligned BLKmode bitfield.  */
   if (mode == VOIDmode
       && blkmode_bitfield
-      && (*pbitpos % BITS_PER_UNIT) == 0
-      && (*pbitsize % BITS_PER_UNIT) == 0)
+      && multiple_p (*pbitpos, BITS_PER_UNIT)
+      && multiple_p (*pbitsize, BITS_PER_UNIT))
     *pmode = BLKmode;
   else
     *pmode = mode;
@@ -7760,7 +7752,7 @@ expand_expr_addr_expr_1 (tree exp, rtx t
 {
   rtx result, subtarget;
   tree inner, offset;
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   int unsignedp, reversep, volatilep = 0;
   machine_mode mode1;
 
@@ -7876,7 +7868,7 @@ expand_expr_addr_expr_1 (tree exp, rtx t
   /* We must have made progress.  */
   gcc_assert (inner != exp);
 
-  subtarget = offset || bitpos ? NULL_RTX : target;
+  subtarget = offset || maybe_nonzero (bitpos) ? NULL_RTX : target;
   /* For VIEW_CONVERT_EXPR, where the outer alignment is bigger than
      inner alignment, force the inner to be sufficiently aligned.  */
   if (CONSTANT_CLASS_P (inner)
@@ -7911,20 +7903,19 @@ expand_expr_addr_expr_1 (tree exp, rtx t
 	result = simplify_gen_binary (PLUS, tmode, result, tmp);
       else
 	{
-	  subtarget = bitpos ? NULL_RTX : target;
+	  subtarget = maybe_nonzero (bitpos) ? NULL_RTX : target;
 	  result = expand_simple_binop (tmode, PLUS, result, tmp, subtarget,
 					1, OPTAB_LIB_WIDEN);
 	}
     }
 
-  if (bitpos)
+  if (maybe_nonzero (bitpos))
     {
       /* Someone beforehand should have rejected taking the address
-	 of such an object.  */
-      gcc_assert ((bitpos % BITS_PER_UNIT) == 0);
-
+	 of an object that isn't byte-aligned.  */
+      poly_int64 bytepos = exact_div (bitpos, BITS_PER_UNIT);
       result = convert_memory_address_addr_space (tmode, result, as);
-      result = plus_constant (tmode, result, bitpos / BITS_PER_UNIT);
+      result = plus_constant (tmode, result, bytepos);
       if (modifier < EXPAND_SUM)
 	result = force_operand (result, target);
     }
@@ -10529,7 +10520,7 @@ expand_expr_real_1 (tree exp, rtx target
     normal_inner_ref:
       {
 	machine_mode mode1, mode2;
-	HOST_WIDE_INT bitsize, bitpos;
+	poly_int64 bitsize, bitpos, bytepos;
 	tree offset;
 	int reversep, volatilep = 0, must_force_mem;
 	tree tem
@@ -10582,13 +10573,14 @@ expand_expr_real_1 (tree exp, rtx target
 	   to a larger size.  */
 	must_force_mem = (offset
 			  || mode1 == BLKmode
-			  || bitpos + bitsize > GET_MODE_BITSIZE (mode2));
+			  || may_gt (bitpos + bitsize,
+				     GET_MODE_BITSIZE (mode2)));
 
 	/* Handle CONCAT first.  */
 	if (GET_CODE (op0) == CONCAT && !must_force_mem)
 	  {
-	    if (bitpos == 0
-		&& bitsize == GET_MODE_BITSIZE (GET_MODE (op0))
+	    if (known_zero (bitpos)
+		&& must_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0)))
 		&& COMPLEX_MODE_P (mode1)
 		&& COMPLEX_MODE_P (GET_MODE (op0))
 		&& (GET_MODE_PRECISION (GET_MODE_INNER (mode1))
@@ -10620,17 +10612,20 @@ expand_expr_real_1 (tree exp, rtx target
 		  }
 		return op0;
 	      }
-	    if (bitpos == 0
-		&& bitsize == GET_MODE_BITSIZE (GET_MODE (XEXP (op0, 0)))
-		&& bitsize)
+	    if (known_zero (bitpos)
+		&& must_eq (bitsize,
+			    GET_MODE_BITSIZE (GET_MODE (XEXP (op0, 0))))
+		&& maybe_nonzero (bitsize))
 	      {
 		op0 = XEXP (op0, 0);
 		mode2 = GET_MODE (op0);
 	      }
-	    else if (bitpos == GET_MODE_BITSIZE (GET_MODE (XEXP (op0, 0)))
-		     && bitsize == GET_MODE_BITSIZE (GET_MODE (XEXP (op0, 1)))
-		     && bitpos
-		     && bitsize)
+	    else if (must_eq (bitpos,
+			      GET_MODE_BITSIZE (GET_MODE (XEXP (op0, 0))))
+		     && must_eq (bitsize,
+				 GET_MODE_BITSIZE (GET_MODE (XEXP (op0, 1))))
+		     && maybe_nonzero (bitpos)
+		     && maybe_nonzero (bitsize))
 	      {
 		op0 = XEXP (op0, 1);
 		bitpos = 0;
@@ -10685,13 +10680,14 @@ expand_expr_real_1 (tree exp, rtx target
 
 	    /* See the comment in expand_assignment for the rationale.  */
 	    if (mode1 != VOIDmode
-		&& bitpos != 0
-		&& bitsize > 0
-		&& (bitpos % bitsize) == 0
-		&& (bitsize % GET_MODE_ALIGNMENT (mode1)) == 0
+		&& maybe_nonzero (bitpos)
+		&& may_gt (bitsize, 0)
+		&& multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
+		&& multiple_p (bitpos, bitsize)
+		&& multiple_p (bitsize, GET_MODE_ALIGNMENT (mode1))
 		&& MEM_ALIGN (op0) >= GET_MODE_ALIGNMENT (mode1))
 	      {
-		op0 = adjust_address (op0, mode1, bitpos / BITS_PER_UNIT);
+		op0 = adjust_address (op0, mode1, bytepos);
 		bitpos = 0;
 	      }
 
@@ -10701,7 +10697,9 @@ expand_expr_real_1 (tree exp, rtx target
 
 	/* If OFFSET is making OP0 more aligned than BIGGEST_ALIGNMENT,
 	   record its alignment as BIGGEST_ALIGNMENT.  */
-	if (MEM_P (op0) && bitpos == 0 && offset != 0
+	if (MEM_P (op0)
+	    && known_zero (bitpos)
+	    && offset != 0
 	    && is_aligning_offset (offset, tem))
 	  set_mem_align (op0, BIGGEST_ALIGNMENT);
 
@@ -10734,37 +10732,37 @@ expand_expr_real_1 (tree exp, rtx target
 	    || (volatilep && TREE_CODE (exp) == COMPONENT_REF
 		&& DECL_BIT_FIELD_TYPE (TREE_OPERAND (exp, 1))
 		&& mode1 != BLKmode
-		&& bitsize < GET_MODE_SIZE (mode1) * BITS_PER_UNIT)
+		&& may_lt (bitsize, GET_MODE_SIZE (mode1) * BITS_PER_UNIT))
 	    /* If the field isn't aligned enough to fetch as a memref,
 	       fetch it as a bit field.  */
 	    || (mode1 != BLKmode
 		&& (((MEM_P (op0)
 		      ? MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode1)
-		        || (bitpos % GET_MODE_ALIGNMENT (mode1) != 0)
+			|| !multiple_p (bitpos, GET_MODE_ALIGNMENT (mode1))
 		      : TYPE_ALIGN (TREE_TYPE (tem)) < GET_MODE_ALIGNMENT (mode)
-		        || (bitpos % GET_MODE_ALIGNMENT (mode) != 0))
+			|| !multiple_p (bitpos, GET_MODE_ALIGNMENT (mode)))
 		     && modifier != EXPAND_MEMORY
 		     && ((modifier == EXPAND_CONST_ADDRESS
 			  || modifier == EXPAND_INITIALIZER)
 			 ? STRICT_ALIGNMENT
 			 : targetm.slow_unaligned_access (mode1,
 							  MEM_ALIGN (op0))))
-		    || (bitpos % BITS_PER_UNIT != 0)))
+		    || !multiple_p (bitpos, BITS_PER_UNIT)))
 	    /* If the type and the field are a constant size and the
 	       size of the type isn't the same size as the bitfield,
 	       we must use bitfield operations.  */
-	    || (bitsize >= 0
+	    || (known_size_p (bitsize)
 		&& TYPE_SIZE (TREE_TYPE (exp))
-		&& TREE_CODE (TYPE_SIZE (TREE_TYPE (exp))) == INTEGER_CST
-		&& 0 != compare_tree_int (TYPE_SIZE (TREE_TYPE (exp)),
-					  bitsize)))
+		&& poly_int_tree_p (TYPE_SIZE (TREE_TYPE (exp)))
+		&& may_ne (wi::to_poly_offset (TYPE_SIZE (TREE_TYPE (exp))),
+			   bitsize)))
 	  {
 	    machine_mode ext_mode = mode;
 
 	    if (ext_mode == BLKmode
 		&& ! (target != 0 && MEM_P (op0)
 		      && MEM_P (target)
-		      && bitpos % BITS_PER_UNIT == 0))
+		      && multiple_p (bitpos, BITS_PER_UNIT)))
 	      ext_mode = int_mode_for_size (bitsize, 1).else_blk ();
 
 	    if (ext_mode == BLKmode)
@@ -10774,20 +10772,19 @@ expand_expr_real_1 (tree exp, rtx target
 
 		/* ??? Unlike the similar test a few lines below, this one is
 		   very likely obsolete.  */
-		if (bitsize == 0)
+		if (known_zero (bitsize))
 		  return target;
 
 		/* In this case, BITPOS must start at a byte boundary and
 		   TARGET, if specified, must be a MEM.  */
 		gcc_assert (MEM_P (op0)
-			    && (!target || MEM_P (target))
-			    && !(bitpos % BITS_PER_UNIT));
+			    && (!target || MEM_P (target)));
 
+		bytepos = exact_div (bitpos, BITS_PER_UNIT);
+		poly_int64 bytesize = bits_to_bytes_round_up (bitsize);
 		emit_block_move (target,
-				 adjust_address (op0, VOIDmode,
-						 bitpos / BITS_PER_UNIT),
-				 GEN_INT ((bitsize + BITS_PER_UNIT - 1)
-					  / BITS_PER_UNIT),
+				 adjust_address (op0, VOIDmode, bytepos),
+				 gen_int_mode (bytesize, Pmode),
 				 (modifier == EXPAND_STACK_PARM
 				  ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
 
@@ -10798,7 +10795,7 @@ expand_expr_real_1 (tree exp, rtx target
 	       with SHIFT_COUNT_TRUNCATED == 0 and garbage otherwise.  Always
 	       return 0 for the sake of consistency, as reading a zero-sized
 	       bitfield is valid in Ada and the value is fully specified.  */
-	    if (bitsize == 0)
+	    if (known_zero (bitsize))
 	      return const0_rtx;
 
 	    op0 = validize_mem (op0);
@@ -10831,7 +10828,8 @@ expand_expr_real_1 (tree exp, rtx target
 	      {
 		HOST_WIDE_INT size = GET_MODE_BITSIZE (op0_mode);
 
-		if (bitsize < size
+		gcc_checking_assert (must_le (bitsize, size));
+		if (may_lt (bitsize, size)
 		    && reversep ? !BYTES_BIG_ENDIAN : BYTES_BIG_ENDIAN)
 		  op0 = expand_shift (LSHIFT_EXPR, op0_mode, op0,
 				      size - bitsize, op0, 1);
@@ -10863,11 +10861,12 @@ expand_expr_real_1 (tree exp, rtx target
 	  mode1 = BLKmode;
 
 	/* Get a reference to just this component.  */
+	bytepos = bits_to_bytes_round_down (bitpos);
 	if (modifier == EXPAND_CONST_ADDRESS
 	    || modifier == EXPAND_SUM || modifier == EXPAND_INITIALIZER)
-	  op0 = adjust_address_nv (op0, mode1, bitpos / BITS_PER_UNIT);
+	  op0 = adjust_address_nv (op0, mode1, bytepos);
 	else
-	  op0 = adjust_address (op0, mode1, bitpos / BITS_PER_UNIT);
+	  op0 = adjust_address (op0, mode1, bytepos);
 
 	if (op0 == orig_op0)
 	  op0 = copy_rtx (op0);
@@ -10950,12 +10949,12 @@ expand_expr_real_1 (tree exp, rtx target
       /* If we are converting to BLKmode, try to avoid an intermediate
 	 temporary by fetching an inner memory reference.  */
       if (mode == BLKmode
-	  && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
+	  && poly_int_tree_p (TYPE_SIZE (type))
 	  && TYPE_MODE (TREE_TYPE (treeop0)) != BLKmode
 	  && handled_component_p (treeop0))
       {
 	machine_mode mode1;
-	HOST_WIDE_INT bitsize, bitpos;
+	poly_int64 bitsize, bitpos, bytepos;
 	tree offset;
 	int unsignedp, reversep, volatilep = 0;
 	tree tem
@@ -10965,10 +10964,10 @@ expand_expr_real_1 (tree exp, rtx target
 
 	/* ??? We should work harder and deal with non-zero offsets.  */
 	if (!offset
-	    && (bitpos % BITS_PER_UNIT) == 0
+	    && multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
 	    && !reversep
-	    && bitsize >= 0
-	    && compare_tree_int (TYPE_SIZE (type), bitsize) == 0)
+	    && known_size_p (bitsize)
+	    && must_eq (wi::to_poly_offset (TYPE_SIZE (type)), bitsize))
 	  {
 	    /* See the normal_inner_ref case for the rationale.  */
 	    orig_op0
@@ -10990,9 +10989,9 @@ expand_expr_real_1 (tree exp, rtx target
 		if (modifier == EXPAND_CONST_ADDRESS
 		    || modifier == EXPAND_SUM
 		    || modifier == EXPAND_INITIALIZER)
-		  op0 = adjust_address_nv (op0, mode, bitpos / BITS_PER_UNIT);
+		  op0 = adjust_address_nv (op0, mode, bytepos);
 		else
-		  op0 = adjust_address (op0, mode, bitpos / BITS_PER_UNIT);
+		  op0 = adjust_address (op0, mode, bytepos);
 
 		if (op0 == orig_op0)
 		  op0 = copy_rtx (op0);
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:18:44.902299961 +0100
+++ gcc/fold-const.c	2017-10-23 17:18:47.662057360 +0100
@@ -4042,7 +4042,7 @@ distribute_real_division (location_t loc
 
 static tree
 make_bit_field_ref (location_t loc, tree inner, tree orig_inner, tree type,
-		    HOST_WIDE_INT bitsize, HOST_WIDE_INT bitpos,
+		    HOST_WIDE_INT bitsize, poly_int64 bitpos,
 		    int unsignedp, int reversep)
 {
   tree result, bftype;
@@ -4052,7 +4052,7 @@ make_bit_field_ref (location_t loc, tree
     {
       tree ninner = TREE_OPERAND (orig_inner, 0);
       machine_mode nmode;
-      HOST_WIDE_INT nbitsize, nbitpos;
+      poly_int64 nbitsize, nbitpos;
       tree noffset;
       int nunsignedp, nreversep, nvolatilep = 0;
       tree base = get_inner_reference (ninner, &nbitsize, &nbitpos,
@@ -4060,9 +4060,7 @@ make_bit_field_ref (location_t loc, tree
 				       &nreversep, &nvolatilep);
       if (base == inner
 	  && noffset == NULL_TREE
-	  && nbitsize >= bitsize
-	  && nbitpos <= bitpos
-	  && bitpos + bitsize <= nbitpos + nbitsize
+	  && known_subrange_p (bitpos, bitsize, nbitpos, nbitsize)
 	  && !reversep
 	  && !nreversep
 	  && !nvolatilep)
@@ -4078,7 +4076,7 @@ make_bit_field_ref (location_t loc, tree
 			 build_fold_addr_expr (inner),
 			 build_int_cst (ptr_type_node, 0));
 
-  if (bitpos == 0 && !reversep)
+  if (known_zero (bitpos) && !reversep)
     {
       tree size = TYPE_SIZE (TREE_TYPE (inner));
       if ((INTEGRAL_TYPE_P (TREE_TYPE (inner))
@@ -4127,7 +4125,8 @@ make_bit_field_ref (location_t loc, tree
 optimize_bit_field_compare (location_t loc, enum tree_code code,
 			    tree compare_type, tree lhs, tree rhs)
 {
-  HOST_WIDE_INT lbitpos, lbitsize, rbitpos, rbitsize, nbitpos, nbitsize;
+  poly_int64 plbitpos, plbitsize, rbitpos, rbitsize;
+  HOST_WIDE_INT lbitpos, lbitsize, nbitpos, nbitsize;
   tree type = TREE_TYPE (lhs);
   tree unsigned_type;
   int const_p = TREE_CODE (rhs) == INTEGER_CST;
@@ -4141,14 +4140,20 @@ optimize_bit_field_compare (location_t l
   tree offset;
 
   /* Get all the information about the extractions being done.  If the bit size
-     if the same as the size of the underlying object, we aren't doing an
+     is the same as the size of the underlying object, we aren't doing an
      extraction at all and so can do nothing.  We also don't want to
      do anything if the inner expression is a PLACEHOLDER_EXPR since we
      then will no longer be able to replace it.  */
-  linner = get_inner_reference (lhs, &lbitsize, &lbitpos, &offset, &lmode,
+  linner = get_inner_reference (lhs, &plbitsize, &plbitpos, &offset, &lmode,
 				&lunsignedp, &lreversep, &lvolatilep);
-  if (linner == lhs || lbitsize == GET_MODE_BITSIZE (lmode) || lbitsize < 0
-      || offset != 0 || TREE_CODE (linner) == PLACEHOLDER_EXPR || lvolatilep)
+  if (linner == lhs
+      || !known_size_p (plbitsize)
+      || !plbitsize.is_constant (&lbitsize)
+      || !plbitpos.is_constant (&lbitpos)
+      || lbitsize == GET_MODE_BITSIZE (lmode)
+      || offset != 0
+      || TREE_CODE (linner) == PLACEHOLDER_EXPR
+      || lvolatilep)
     return 0;
 
   if (const_p)
@@ -4161,9 +4166,14 @@ optimize_bit_field_compare (location_t l
        = get_inner_reference (rhs, &rbitsize, &rbitpos, &offset, &rmode,
 			      &runsignedp, &rreversep, &rvolatilep);
 
-     if (rinner == rhs || lbitpos != rbitpos || lbitsize != rbitsize
-	 || lunsignedp != runsignedp || lreversep != rreversep || offset != 0
-	 || TREE_CODE (rinner) == PLACEHOLDER_EXPR || rvolatilep)
+     if (rinner == rhs
+	 || may_ne (lbitpos, rbitpos)
+	 || may_ne (lbitsize, rbitsize)
+	 || lunsignedp != runsignedp
+	 || lreversep != rreversep
+	 || offset != 0
+	 || TREE_CODE (rinner) == PLACEHOLDER_EXPR
+	 || rvolatilep)
        return 0;
    }
 
@@ -4172,7 +4182,6 @@ optimize_bit_field_compare (location_t l
   poly_uint64 bitend = 0;
   if (TREE_CODE (lhs) == COMPONENT_REF)
     {
-      poly_int64 plbitpos;
       get_bit_range (&bitstart, &bitend, lhs, &plbitpos, &offset);
       if (!plbitpos.is_constant (&lbitpos) || offset != NULL_TREE)
 	return 0;
@@ -4340,10 +4349,14 @@ decode_field_reference (location_t loc,
 	return 0;
     }
 
-  inner = get_inner_reference (exp, pbitsize, pbitpos, &offset, pmode,
-			       punsignedp, preversep, pvolatilep);
+  poly_int64 poly_bitsize, poly_bitpos;
+  inner = get_inner_reference (exp, &poly_bitsize, &poly_bitpos, &offset,
+			       pmode, punsignedp, preversep, pvolatilep);
   if ((inner == exp && and_mask == 0)
-      || *pbitsize < 0 || offset != 0
+      || !poly_bitsize.is_constant (pbitsize)
+      || !poly_bitpos.is_constant (pbitpos)
+      || *pbitsize < 0
+      || offset != 0
       || TREE_CODE (inner) == PLACEHOLDER_EXPR
       /* Reject out-of-bound accesses (PR79731).  */
       || (! AGGREGATE_TYPE_P (TREE_TYPE (inner))
@@ -7915,7 +7928,7 @@ fold_unary_loc (location_t loc, enum tre
 	  && POINTER_TYPE_P (type)
 	  && handled_component_p (TREE_OPERAND (op0, 0)))
         {
-	  HOST_WIDE_INT bitsize, bitpos;
+	  poly_int64 bitsize, bitpos;
 	  tree offset;
 	  machine_mode mode;
 	  int unsignedp, reversep, volatilep;
@@ -7926,7 +7939,8 @@ fold_unary_loc (location_t loc, enum tre
 	  /* If the reference was to a (constant) zero offset, we can use
 	     the address of the base if it has the same base type
 	     as the result type and the pointer type is unqualified.  */
-	  if (! offset && bitpos == 0
+	  if (!offset
+	      && known_zero (bitpos)
 	      && (TYPE_MAIN_VARIANT (TREE_TYPE (type))
 		  == TYPE_MAIN_VARIANT (TREE_TYPE (base)))
 	      && TYPE_QUALS (type) == TYPE_UNQUALIFIED)
@@ -14450,12 +14464,12 @@ round_down_loc (location_t loc, tree val
 
 static tree
 split_address_to_core_and_offset (tree exp,
-				  HOST_WIDE_INT *pbitpos, tree *poffset)
+				  poly_int64_pod *pbitpos, tree *poffset)
 {
   tree core;
   machine_mode mode;
   int unsignedp, reversep, volatilep;
-  HOST_WIDE_INT bitsize;
+  poly_int64 bitsize;
   location_t loc = EXPR_LOCATION (exp);
 
   if (TREE_CODE (exp) == ADDR_EXPR)
@@ -14471,16 +14485,14 @@ split_address_to_core_and_offset (tree e
       STRIP_NOPS (core);
       *pbitpos = 0;
       *poffset = TREE_OPERAND (exp, 1);
-      if (TREE_CODE (*poffset) == INTEGER_CST)
+      if (poly_int_tree_p (*poffset))
 	{
-	  offset_int tem = wi::sext (wi::to_offset (*poffset),
-				     TYPE_PRECISION (TREE_TYPE (*poffset)));
+	  poly_offset_int tem
+	    = wi::sext (wi::to_poly_offset (*poffset),
+			TYPE_PRECISION (TREE_TYPE (*poffset)));
 	  tem <<= LOG2_BITS_PER_UNIT;
-	  if (wi::fits_shwi_p (tem))
-	    {
-	      *pbitpos = tem.to_shwi ();
-	      *poffset = NULL_TREE;
-	    }
+	  if (tem.to_shwi (pbitpos))
+	    *poffset = NULL_TREE;
 	}
     }
   else
@@ -14497,17 +14509,18 @@ split_address_to_core_and_offset (tree e
    otherwise.  If they do, E1 - E2 is stored in *DIFF.  */
 
 bool
-ptr_difference_const (tree e1, tree e2, HOST_WIDE_INT *diff)
+ptr_difference_const (tree e1, tree e2, poly_int64_pod *diff)
 {
   tree core1, core2;
-  HOST_WIDE_INT bitpos1, bitpos2;
+  poly_int64 bitpos1, bitpos2;
   tree toffset1, toffset2, tdiff, type;
 
   core1 = split_address_to_core_and_offset (e1, &bitpos1, &toffset1);
   core2 = split_address_to_core_and_offset (e2, &bitpos2, &toffset2);
 
-  if (bitpos1 % BITS_PER_UNIT != 0
-      || bitpos2 % BITS_PER_UNIT != 0
+  poly_int64 bytepos1, bytepos2;
+  if (!multiple_p (bitpos1, BITS_PER_UNIT, &bytepos1)
+      || !multiple_p (bitpos2, BITS_PER_UNIT, &bytepos2)
       || !operand_equal_p (core1, core2, 0))
     return false;
 
@@ -14532,7 +14545,7 @@ ptr_difference_const (tree e1, tree e2,
   else
     *diff = 0;
 
-  *diff += (bitpos1 - bitpos2) / BITS_PER_UNIT;
+  *diff += bytepos1 - bytepos2;
   return true;
 }
 
Index: gcc/asan.c
===================================================================
--- gcc/asan.c	2017-10-23 17:11:40.240937549 +0100
+++ gcc/asan.c	2017-10-23 17:18:47.654058063 +0100
@@ -2068,7 +2068,7 @@ instrument_derefs (gimple_stmt_iterator
   if (size_in_bytes <= 0)
     return;
 
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   tree offset;
   machine_mode mode;
   int unsignedp, reversep, volatilep = 0;
@@ -2086,19 +2086,19 @@ instrument_derefs (gimple_stmt_iterator
       return;
     }
 
-  if (bitpos % BITS_PER_UNIT
-      || bitsize != size_in_bytes * BITS_PER_UNIT)
+  if (!multiple_p (bitpos, BITS_PER_UNIT)
+      || may_ne (bitsize, size_in_bytes * BITS_PER_UNIT))
     return;
 
   if (VAR_P (inner) && DECL_HARD_REGISTER (inner))
     return;
 
+  poly_int64 decl_size;
   if (VAR_P (inner)
       && offset == NULL_TREE
-      && bitpos >= 0
       && DECL_SIZE (inner)
-      && tree_fits_shwi_p (DECL_SIZE (inner))
-      && bitpos + bitsize <= tree_to_shwi (DECL_SIZE (inner)))
+      && poly_int_tree_p (DECL_SIZE (inner), &decl_size)
+      && known_subrange_p (bitpos, bitsize, 0, decl_size))
     {
       if (DECL_THREAD_LOCAL_P (inner))
 	return;
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2017-10-23 17:11:40.283017967 +0100
+++ gcc/config/mips/mips.c	2017-10-23 17:18:47.656057887 +0100
@@ -17613,7 +17613,7 @@ r10k_safe_address_p (rtx x, rtx_insn *in
 static bool
 r10k_safe_mem_expr_p (tree expr, unsigned HOST_WIDE_INT offset)
 {
-  HOST_WIDE_INT bitoffset, bitsize;
+  poly_int64 bitoffset, bitsize;
   tree inner, var_offset;
   machine_mode mode;
   int unsigned_p, reverse_p, volatile_p;
Index: gcc/dbxout.c
===================================================================
--- gcc/dbxout.c	2017-10-23 17:11:39.968416747 +0100
+++ gcc/dbxout.c	2017-10-23 17:18:47.657057799 +0100
@@ -2469,7 +2469,7 @@ dbxout_expand_expr (tree expr)
     case BIT_FIELD_REF:
       {
 	machine_mode mode;
-	HOST_WIDE_INT bitsize, bitpos;
+	poly_int64 bitsize, bitpos;
 	tree offset, tem;
 	int unsignedp, reversep, volatilep = 0;
 	rtx x;
@@ -2486,8 +2486,8 @@ dbxout_expand_expr (tree expr)
 	      return NULL;
 	    x = adjust_address_nv (x, mode, tree_to_shwi (offset));
 	  }
-	if (bitpos != 0)
-	  x = adjust_address_nv (x, mode, bitpos / BITS_PER_UNIT);
+	if (maybe_nonzero (bitpos))
+	  x = adjust_address_nv (x, mode, bits_to_bytes_round_down (bitpos));
 
 	return x;
       }
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 17:16:59.703267951 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:18:47.659057624 +0100
@@ -16624,7 +16624,7 @@ loc_list_for_address_of_addr_expr_of_ind
 						   loc_descr_context *context)
 {
   tree obj, offset;
-  HOST_WIDE_INT bitsize, bitpos, bytepos;
+  poly_int64 bitsize, bitpos, bytepos;
   machine_mode mode;
   int unsignedp, reversep, volatilep = 0;
   dw_loc_list_ref list_ret = NULL, list_ret1 = NULL;
@@ -16633,7 +16633,7 @@ loc_list_for_address_of_addr_expr_of_ind
 			     &bitsize, &bitpos, &offset, &mode,
 			     &unsignedp, &reversep, &volatilep);
   STRIP_NOPS (obj);
-  if (bitpos % BITS_PER_UNIT)
+  if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos))
     {
       expansion_failed (loc, NULL_RTX, "bitfield access");
       return 0;
@@ -16644,7 +16644,7 @@ loc_list_for_address_of_addr_expr_of_ind
 			NULL_RTX, "no indirect ref in inner refrence");
       return 0;
     }
-  if (!offset && !bitpos)
+  if (!offset && known_zero (bitpos))
     list_ret = loc_list_from_tree (TREE_OPERAND (obj, 0), toplev ? 2 : 1,
 				   context);
   else if (toplev
@@ -16666,12 +16666,11 @@ loc_list_for_address_of_addr_expr_of_ind
 	  add_loc_descr_to_each (list_ret,
 				 new_loc_descr (DW_OP_plus, 0, 0));
 	}
-      bytepos = bitpos / BITS_PER_UNIT;
-      if (bytepos > 0)
+      HOST_WIDE_INT value;
+      if (bytepos.is_constant (&value) && value > 0)
 	add_loc_descr_to_each (list_ret,
-			       new_loc_descr (DW_OP_plus_uconst,
-					      bytepos, 0));
-      else if (bytepos < 0)
+			       new_loc_descr (DW_OP_plus_uconst, value, 0));
+      else if (maybe_nonzero (bytepos))
 	loc_list_plus_const (list_ret, bytepos);
       add_loc_descr_to_each (list_ret,
 			     new_loc_descr (DW_OP_stack_value, 0, 0));
@@ -17641,7 +17640,7 @@ loc_list_from_tree_1 (tree loc, int want
     case IMAGPART_EXPR:
       {
 	tree obj, offset;
-	HOST_WIDE_INT bitsize, bitpos, bytepos;
+	poly_int64 bitsize, bitpos, bytepos;
 	machine_mode mode;
 	int unsignedp, reversep, volatilep = 0;
 
@@ -17652,13 +17651,15 @@ loc_list_from_tree_1 (tree loc, int want
 
 	list_ret = loc_list_from_tree_1 (obj,
 					 want_address == 2
-					 && !bitpos && !offset ? 2 : 1,
+					 && known_zero (bitpos)
+					 && !offset ? 2 : 1,
 					 context);
 	/* TODO: We can extract value of the small expression via shifting even
 	   for nonzero bitpos.  */
 	if (list_ret == 0)
 	  return 0;
-	if (bitpos % BITS_PER_UNIT != 0 || bitsize % BITS_PER_UNIT != 0)
+	if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
+	    || !multiple_p (bitsize, BITS_PER_UNIT))
 	  {
 	    expansion_failed (loc, NULL_RTX,
 			      "bitfield access");
@@ -17677,10 +17678,11 @@ loc_list_from_tree_1 (tree loc, int want
 	    add_loc_descr_to_each (list_ret, new_loc_descr (DW_OP_plus, 0, 0));
 	  }
 
-	bytepos = bitpos / BITS_PER_UNIT;
-	if (bytepos > 0)
-	  add_loc_descr_to_each (list_ret, new_loc_descr (DW_OP_plus_uconst, bytepos, 0));
-	else if (bytepos < 0)
+	HOST_WIDE_INT value;
+	if (bytepos.is_constant (&value) && value > 0)
+	  add_loc_descr_to_each (list_ret, new_loc_descr (DW_OP_plus_uconst,
+							  value, 0));
+	else if (maybe_nonzero (bytepos))
 	  loc_list_plus_const (list_ret, bytepos);
 
 	have_address = 1;
@@ -19212,8 +19214,9 @@ fortran_common (tree decl, HOST_WIDE_INT
 {
   tree val_expr, cvar;
   machine_mode mode;
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   tree offset;
+  HOST_WIDE_INT cbitpos;
   int unsignedp, reversep, volatilep = 0;
 
   /* If the decl isn't a VAR_DECL, or if it isn't static, or if
@@ -19236,7 +19239,10 @@ fortran_common (tree decl, HOST_WIDE_INT
   if (cvar == NULL_TREE
       || !VAR_P (cvar)
       || DECL_ARTIFICIAL (cvar)
-      || !TREE_PUBLIC (cvar))
+      || !TREE_PUBLIC (cvar)
+      /* We don't expect to have to cope with variable offsets,
+	 since at present all static data must have a constant size.  */
+      || !bitpos.is_constant (&cbitpos))
     return NULL_TREE;
 
   *value = 0;
@@ -19246,8 +19252,8 @@ fortran_common (tree decl, HOST_WIDE_INT
 	return NULL_TREE;
       *value = tree_to_shwi (offset);
     }
-  if (bitpos != 0)
-    *value += bitpos / BITS_PER_UNIT;
+  if (cbitpos != 0)
+    *value += cbitpos / BITS_PER_UNIT;
 
   return cvar;
 }
Index: gcc/gimple-laddress.c
===================================================================
--- gcc/gimple-laddress.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/gimple-laddress.c	2017-10-23 17:18:47.662057360 +0100
@@ -100,19 +100,19 @@ pass_laddress::execute (function *fun)
 	  */
 
 	  tree expr = gimple_assign_rhs1 (stmt);
-	  HOST_WIDE_INT bitsize, bitpos;
+	  poly_int64 bitsize, bitpos;
 	  tree base, offset;
 	  machine_mode mode;
 	  int volatilep = 0, reversep, unsignedp = 0;
 	  base = get_inner_reference (TREE_OPERAND (expr, 0), &bitsize,
 				      &bitpos, &offset, &mode, &unsignedp,
 				      &reversep, &volatilep);
-	  gcc_assert (base != NULL_TREE && (bitpos % BITS_PER_UNIT) == 0);
+	  gcc_assert (base != NULL_TREE);
+	  poly_int64 bytepos = exact_div (bitpos, BITS_PER_UNIT);
 	  if (offset != NULL_TREE)
 	    {
-	      if (bitpos != 0)
-		offset = size_binop (PLUS_EXPR, offset,
-				     size_int (bitpos / BITS_PER_UNIT));
+	      if (maybe_nonzero (bytepos))
+		offset = size_binop (PLUS_EXPR, offset, size_int (bytepos));
 	      offset = force_gimple_operand_gsi (&gsi, offset, true, NULL,
 						 true, GSI_SAME_STMT);
 	      base = build_fold_addr_expr (base);
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	2017-10-23 17:11:40.246949037 +0100
+++ gcc/gimplify.c	2017-10-23 17:18:47.663057272 +0100
@@ -7865,7 +7865,7 @@ gimplify_scan_omp_clauses (tree *list_p,
 		    }
 
 		  tree offset;
-		  HOST_WIDE_INT bitsize, bitpos;
+		  poly_int64 bitsize, bitpos;
 		  machine_mode mode;
 		  int unsignedp, reversep, volatilep = 0;
 		  tree base = OMP_CLAUSE_DECL (c);
@@ -7886,7 +7886,7 @@ gimplify_scan_omp_clauses (tree *list_p,
 		    base = TREE_OPERAND (base, 0);
 		  gcc_assert (base == decl
 			      && (offset == NULL_TREE
-				  || TREE_CODE (offset) == INTEGER_CST));
+				  || poly_int_tree_p (offset)));
 
 		  splay_tree_node n
 		    = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
@@ -7965,13 +7965,13 @@ gimplify_scan_omp_clauses (tree *list_p,
 		      tree *sc = NULL, *scp = NULL;
 		      if (GOMP_MAP_ALWAYS_P (OMP_CLAUSE_MAP_KIND (c)) || ptr)
 			n->value |= GOVD_SEEN;
-		      offset_int o1, o2;
+		      poly_offset_int o1, o2;
 		      if (offset)
-			o1 = wi::to_offset (offset);
+			o1 = wi::to_poly_offset (offset);
 		      else
 			o1 = 0;
-		      if (bitpos)
-			o1 = o1 + bitpos / BITS_PER_UNIT;
+		      if (maybe_nonzero (bitpos))
+			o1 += bits_to_bytes_round_down (bitpos);
 		      sc = &OMP_CLAUSE_CHAIN (*osc);
 		      if (*sc != c
 			  && (OMP_CLAUSE_MAP_KIND (*sc)
@@ -7990,7 +7990,7 @@ gimplify_scan_omp_clauses (tree *list_p,
 			else
 			  {
 			    tree offset2;
-			    HOST_WIDE_INT bitsize2, bitpos2;
+			    poly_int64 bitsize2, bitpos2;
 			    base = OMP_CLAUSE_DECL (*sc);
 			    if (TREE_CODE (base) == ARRAY_REF)
 			      {
@@ -8026,7 +8026,7 @@ gimplify_scan_omp_clauses (tree *list_p,
 			    if (scp)
 			      continue;
 			    gcc_assert (offset == NULL_TREE
-					|| TREE_CODE (offset) == INTEGER_CST);
+					|| poly_int_tree_p (offset));
 			    tree d1 = OMP_CLAUSE_DECL (*sc);
 			    tree d2 = OMP_CLAUSE_DECL (c);
 			    while (TREE_CODE (d1) == ARRAY_REF)
@@ -8056,13 +8056,13 @@ gimplify_scan_omp_clauses (tree *list_p,
 				break;
 			      }
 			    if (offset2)
-			      o2 = wi::to_offset (offset2);
+			      o2 = wi::to_poly_offset (offset2);
 			    else
 			      o2 = 0;
-			    if (bitpos2)
-			      o2 = o2 + bitpos2 / BITS_PER_UNIT;
-			    if (wi::ltu_p (o1, o2)
-				|| (wi::eq_p (o1, o2) && bitpos < bitpos2))
+			    o2 += bits_to_bytes_round_down (bitpos2);
+			    if (may_lt (o1, o2)
+				|| (must_eq (o1, o2)
+				    && may_lt (bitpos, bitpos2)))
 			      {
 				if (ptr)
 				  scp = sc;
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 17:16:50.376527466 +0100
+++ gcc/simplify-rtx.c	2017-10-23 17:18:47.665057096 +0100
@@ -308,23 +308,19 @@ delegitimize_mem_from_attrs (rtx x)
 	case IMAGPART_EXPR:
 	case VIEW_CONVERT_EXPR:
 	  {
-	    HOST_WIDE_INT bitsize, bitpos;
+	    poly_int64 bitsize, bitpos, bytepos, toffset_val = 0;
 	    tree toffset;
 	    int unsignedp, reversep, volatilep = 0;
 
 	    decl
 	      = get_inner_reference (decl, &bitsize, &bitpos, &toffset, &mode,
 				     &unsignedp, &reversep, &volatilep);
-	    if (bitsize != GET_MODE_BITSIZE (mode)
-		|| (bitpos % BITS_PER_UNIT)
-		|| (toffset && !tree_fits_shwi_p (toffset)))
+	    if (may_ne (bitsize, GET_MODE_BITSIZE (mode))
+		|| !multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
+		|| (toffset && !poly_int_tree_p (toffset, &toffset_val)))
 	      decl = NULL;
 	    else
-	      {
-		offset += bitpos / BITS_PER_UNIT;
-		if (toffset)
-		  offset += tree_to_shwi (toffset);
-	      }
+	      offset += bytepos + toffset_val;
 	    break;
 	  }
 	}
Index: gcc/tree-affine.c
===================================================================
--- gcc/tree-affine.c	2017-10-23 17:18:30.290584430 +0100
+++ gcc/tree-affine.c	2017-10-23 17:18:47.665057096 +0100
@@ -267,7 +267,7 @@ tree_to_aff_combination (tree expr, tree
   aff_tree tmp;
   enum tree_code code;
   tree cst, core, toffset;
-  HOST_WIDE_INT bitpos, bitsize;
+  poly_int64 bitpos, bitsize, bytepos;
   machine_mode mode;
   int unsignedp, reversep, volatilep;
 
@@ -324,12 +324,13 @@ tree_to_aff_combination (tree expr, tree
       core = get_inner_reference (TREE_OPERAND (expr, 0), &bitsize, &bitpos,
 				  &toffset, &mode, &unsignedp, &reversep,
 				  &volatilep);
-      if (bitpos % BITS_PER_UNIT != 0)
+      if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos))
 	break;
-      aff_combination_const (comb, type, bitpos / BITS_PER_UNIT);
+      aff_combination_const (comb, type, bytepos);
       if (TREE_CODE (core) == MEM_REF)
 	{
-	  aff_combination_add_cst (comb, wi::to_widest (TREE_OPERAND (core, 1)));
+	  tree mem_offset = TREE_OPERAND (core, 1);
+	  aff_combination_add_cst (comb, wi::to_poly_widest (mem_offset));
 	  core = TREE_OPERAND (core, 0);
 	}
       else
@@ -929,7 +930,7 @@ debug_aff (aff_tree *val)
 tree
 get_inner_reference_aff (tree ref, aff_tree *addr, poly_widest_int *size)
 {
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   tree toff;
   machine_mode mode;
   int uns, rev, vol;
@@ -948,10 +949,10 @@ get_inner_reference_aff (tree ref, aff_t
       aff_combination_add (addr, &tmp);
     }
 
-  aff_combination_const (&tmp, sizetype, bitpos / BITS_PER_UNIT);
+  aff_combination_const (&tmp, sizetype, bits_to_bytes_round_down (bitpos));
   aff_combination_add (addr, &tmp);
 
-  *size = (bitsize + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
+  *size = bits_to_bytes_round_up (bitsize);
 
   return base;
 }
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2017-10-23 17:18:30.290584430 +0100
+++ gcc/tree-data-ref.c	2017-10-23 17:18:47.666057008 +0100
@@ -627,7 +627,7 @@ split_constant_offset_1 (tree type, tree
     case ADDR_EXPR:
       {
 	tree base, poffset;
-	HOST_WIDE_INT pbitsize, pbitpos;
+	poly_int64 pbitsize, pbitpos, pbytepos;
 	machine_mode pmode;
 	int punsignedp, preversep, pvolatilep;
 
@@ -636,10 +636,10 @@ split_constant_offset_1 (tree type, tree
 	  = get_inner_reference (op0, &pbitsize, &pbitpos, &poffset, &pmode,
 				 &punsignedp, &preversep, &pvolatilep);
 
-	if (pbitpos % BITS_PER_UNIT != 0)
+	if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos))
 	  return false;
 	base = build_fold_addr_expr (base);
-	off0 = ssize_int (pbitpos / BITS_PER_UNIT);
+	off0 = ssize_int (pbytepos);
 
 	if (poffset)
 	  {
@@ -789,7 +789,7 @@ canonicalize_base_object_address (tree a
 dr_analyze_innermost (innermost_loop_behavior *drb, tree ref,
 		      struct loop *loop)
 {
-  HOST_WIDE_INT pbitsize, pbitpos;
+  poly_int64 pbitsize, pbitpos;
   tree base, poffset;
   machine_mode pmode;
   int punsignedp, preversep, pvolatilep;
@@ -804,7 +804,8 @@ dr_analyze_innermost (innermost_loop_beh
 			      &punsignedp, &preversep, &pvolatilep);
   gcc_assert (base != NULL_TREE);
 
-  if (pbitpos % BITS_PER_UNIT != 0)
+  poly_int64 pbytepos;
+  if (!multiple_p (pbitpos, BITS_PER_UNIT, &pbytepos))
     {
       if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "failed: bit offset alignment.\n");
@@ -885,7 +886,7 @@ dr_analyze_innermost (innermost_loop_beh
         }
     }
 
-  init = ssize_int (pbitpos / BITS_PER_UNIT);
+  init = ssize_int (pbytepos);
 
   /* Subtract any constant component from the base and add it to INIT instead.
      Adjust the misalignment to reflect the amount we subtracted.  */
Index: gcc/tree-scalar-evolution.c
===================================================================
--- gcc/tree-scalar-evolution.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/tree-scalar-evolution.c	2017-10-23 17:18:47.666057008 +0100
@@ -1731,7 +1731,7 @@ interpret_rhs_expr (struct loop *loop, g
 	  || handled_component_p (TREE_OPERAND (rhs1, 0)))
         {
 	  machine_mode mode;
-	  HOST_WIDE_INT bitsize, bitpos;
+	  poly_int64 bitsize, bitpos;
 	  int unsignedp, reversep;
 	  int volatilep = 0;
 	  tree base, offset;
@@ -1770,11 +1770,9 @@ interpret_rhs_expr (struct loop *loop, g
 	      res = chrec_fold_plus (type, res, chrec2);
 	    }
 
-	  if (bitpos != 0)
+	  if (maybe_nonzero (bitpos))
 	    {
-	      gcc_assert ((bitpos % BITS_PER_UNIT) == 0);
-
-	      unitpos = size_int (bitpos / BITS_PER_UNIT);
+	      unitpos = size_int (exact_div (bitpos, BITS_PER_UNIT));
 	      chrec3 = analyze_scalar_evolution (loop, unitpos);
 	      chrec3 = chrec_convert (TREE_TYPE (unitpos), chrec3, at_stmt);
 	      chrec3 = instantiate_parameters (loop, chrec3);
Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c	2017-10-23 17:17:01.433034358 +0100
+++ gcc/tree-sra.c	2017-10-23 17:18:47.667056920 +0100
@@ -5389,12 +5389,12 @@ ipa_sra_check_caller (struct cgraph_node
 	      continue;
 
 	  tree offset;
-	  HOST_WIDE_INT bitsize, bitpos;
+	  poly_int64 bitsize, bitpos;
 	  machine_mode mode;
 	  int unsignedp, reversep, volatilep = 0;
 	  get_inner_reference (arg, &bitsize, &bitpos, &offset, &mode,
 			       &unsignedp, &reversep, &volatilep);
-	  if (bitpos % BITS_PER_UNIT)
+	  if (!multiple_p (bitpos, BITS_PER_UNIT))
 	    {
 	      iscc->bad_arg_alignment = true;
 	      return true;
Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	2017-10-23 17:17:04.541614564 +0100
+++ gcc/tree-ssa-math-opts.c	2017-10-23 17:18:47.667056920 +0100
@@ -2105,7 +2105,7 @@ find_bswap_or_nop_load (gimple *stmt, tr
 {
   /* Leaf node is an array or component ref. Memorize its base and
      offset from base to compare to other such leaf node.  */
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos, bytepos;
   machine_mode mode;
   int unsignedp, reversep, volatilep;
   tree offset, base_addr;
@@ -2153,9 +2153,9 @@ find_bswap_or_nop_load (gimple *stmt, tr
       bitpos += bit_offset.to_shwi ();
     }
 
-  if (bitpos % BITS_PER_UNIT)
+  if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos))
     return false;
-  if (bitsize % BITS_PER_UNIT)
+  if (!multiple_p (bitsize, BITS_PER_UNIT))
     return false;
   if (reversep)
     return false;
@@ -2164,7 +2164,7 @@ find_bswap_or_nop_load (gimple *stmt, tr
     return false;
   n->base_addr = base_addr;
   n->offset = offset;
-  n->bytepos = bitpos / BITS_PER_UNIT;
+  n->bytepos = bytepos;
   n->alias_set = reference_alias_ptr_type (ref);
   n->vuse = gimple_vuse (stmt);
   return true;
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:11:40.250956696 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:18:47.668056833 +0100
@@ -3215,7 +3215,8 @@ vect_prune_runtime_alias_test_list (loop
 vect_check_gather_scatter (gimple *stmt, loop_vec_info loop_vinfo,
 			   gather_scatter_info *info)
 {
-  HOST_WIDE_INT scale = 1, pbitpos, pbitsize;
+  HOST_WIDE_INT scale = 1;
+  poly_int64 pbitpos, pbitsize;
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
@@ -3256,7 +3257,8 @@ vect_check_gather_scatter (gimple *stmt,
      that can be gimplified before the loop.  */
   base = get_inner_reference (base, &pbitsize, &pbitpos, &off, &pmode,
 			      &punsignedp, &reversep, &pvolatilep);
-  gcc_assert (base && (pbitpos % BITS_PER_UNIT) == 0 && !reversep);
+  gcc_assert (base && !reversep);
+  poly_int64 pbytepos = exact_div (pbitpos, BITS_PER_UNIT);
 
   if (TREE_CODE (base) == MEM_REF)
     {
@@ -3289,14 +3291,14 @@ vect_check_gather_scatter (gimple *stmt,
       if (!integer_zerop (off))
 	return false;
       off = base;
-      base = size_int (pbitpos / BITS_PER_UNIT);
+      base = size_int (pbytepos);
     }
   /* Otherwise put base + constant offset into the loop invariant BASE
      and continue with OFF.  */
   else
     {
       base = fold_convert (sizetype, base);
-      base = size_binop (PLUS_EXPR, base, size_int (pbitpos / BITS_PER_UNIT));
+      base = size_binop (PLUS_EXPR, base, size_int (pbytepos));
     }
 
   /* OFF at this point may be either a SSA_NAME or some tree expression
Index: gcc/ubsan.c
===================================================================
--- gcc/ubsan.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/ubsan.c	2017-10-23 17:18:47.669056745 +0100
@@ -1429,7 +1429,7 @@ maybe_instrument_pointer_overflow (gimpl
   if (!handled_component_p (t) && TREE_CODE (t) != MEM_REF)
     return;
 
-  HOST_WIDE_INT bitsize, bitpos, bytepos;
+  poly_int64 bitsize, bitpos, bytepos;
   tree offset;
   machine_mode mode;
   int volatilep = 0, reversep, unsignedp = 0;
@@ -1447,14 +1447,14 @@ maybe_instrument_pointer_overflow (gimpl
       /* If BASE is a fixed size automatic variable or
 	 global variable defined in the current TU and bitpos
 	 fits, don't instrument anything.  */
+      poly_int64 base_size;
       if (offset == NULL_TREE
-	  && bitpos > 0
+	  && maybe_nonzero (bitpos)
 	  && (VAR_P (base)
 	      || TREE_CODE (base) == PARM_DECL
 	      || TREE_CODE (base) == RESULT_DECL)
-	  && DECL_SIZE (base)
-	  && TREE_CODE (DECL_SIZE (base)) == INTEGER_CST
-	  && compare_tree_int (DECL_SIZE (base), bitpos) >= 0
+	  && poly_int_tree_p (DECL_SIZE (base), &base_size)
+	  && must_ge (base_size, bitpos)
 	  && (!is_global_var (base) || decl_binds_to_current_def_p (base)))
 	return;
     }
@@ -1475,8 +1475,8 @@ maybe_instrument_pointer_overflow (gimpl
 
   if (!POINTER_TYPE_P (TREE_TYPE (base)) && !DECL_P (base))
     return;
-  bytepos = bitpos / BITS_PER_UNIT;
-  if (offset == NULL_TREE && bytepos == 0 && moff == NULL_TREE)
+  bytepos = bits_to_bytes_round_down (bitpos);
+  if (offset == NULL_TREE && known_zero (bytepos) && moff == NULL_TREE)
     return;
 
   tree base_addr = base;
@@ -1484,7 +1484,7 @@ maybe_instrument_pointer_overflow (gimpl
     base_addr = build1 (ADDR_EXPR,
 			build_pointer_type (TREE_TYPE (base)), base);
   t = offset;
-  if (bytepos)
+  if (maybe_nonzero (bytepos))
     {
       if (t)
 	t = fold_build2 (PLUS_EXPR, TREE_TYPE (t), t,
@@ -1667,7 +1667,7 @@ instrument_bool_enum_load (gimple_stmt_i
     return;
 
   int modebitsize = GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (type));
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   tree offset;
   machine_mode mode;
   int volatilep = 0, reversep, unsignedp = 0;
@@ -1676,8 +1676,8 @@ instrument_bool_enum_load (gimple_stmt_i
   tree utype = build_nonstandard_integer_type (modebitsize, 1);
 
   if ((VAR_P (base) && DECL_HARD_REGISTER (base))
-      || (bitpos % modebitsize) != 0
-      || bitsize != modebitsize
+      || !multiple_p (bitpos, modebitsize)
+      || may_ne (bitsize, modebitsize)
       || GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (utype)) != modebitsize
       || TREE_CODE (gimple_assign_lhs (stmt)) != SSA_NAME)
     return;
@@ -2086,15 +2086,15 @@ instrument_object_size (gimple_stmt_iter
   if (size_in_bytes <= 0)
     return;
 
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   tree offset;
   machine_mode mode;
   int volatilep = 0, reversep, unsignedp = 0;
   tree inner = get_inner_reference (t, &bitsize, &bitpos, &offset, &mode,
 				    &unsignedp, &reversep, &volatilep);
 
-  if (bitpos % BITS_PER_UNIT != 0
-      || bitsize != size_in_bytes * BITS_PER_UNIT)
+  if (!multiple_p (bitpos, BITS_PER_UNIT)
+      || may_ne (bitsize, size_in_bytes * BITS_PER_UNIT))
     return;
 
   bool decl_p = DECL_P (inner);
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	2017-10-23 17:11:40.244945208 +0100
+++ gcc/gimple-ssa-strength-reduction.c	2017-10-23 17:18:47.663057272 +0100
@@ -1031,7 +1031,7 @@ restructure_reference (tree *pbase, tree
 slsr_process_ref (gimple *gs)
 {
   tree ref_expr, base, offset, type;
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   machine_mode mode;
   int unsignedp, reversep, volatilep;
   slsr_cand_t c;
@@ -1049,9 +1049,10 @@ slsr_process_ref (gimple *gs)
 
   base = get_inner_reference (ref_expr, &bitsize, &bitpos, &offset, &mode,
 			      &unsignedp, &reversep, &volatilep);
-  if (reversep)
+  HOST_WIDE_INT cbitpos;
+  if (reversep || !bitpos.is_constant (&cbitpos))
     return;
-  widest_int index = bitpos;
+  widest_int index = cbitpos;
 
   if (!restructure_reference (&base, &offset, &index, &type))
     return;
Index: gcc/hsa-gen.c
===================================================================
--- gcc/hsa-gen.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/hsa-gen.c	2017-10-23 17:18:47.664057184 +0100
@@ -1972,12 +1972,22 @@ gen_hsa_addr (tree ref, hsa_bb *hbb, HOS
     {
       machine_mode mode;
       int unsignedp, volatilep, preversep;
+      poly_int64 pbitsize, pbitpos;
+      tree new_ref;
 
-      ref = get_inner_reference (ref, &bitsize, &bitpos, &varoffset, &mode,
-				 &unsignedp, &preversep, &volatilep);
-
-      offset = bitpos;
-      offset = wi::rshift (offset, LOG2_BITS_PER_UNIT, SIGNED);
+      new_ref = get_inner_reference (ref, &pbitsize, &pbitpos, &varoffset,
+				     &mode, &unsignedp, &preversep,
+				     &volatilep);
+      /* When this isn't true, the switch below will report an
+	 appropriate error.  */
+      if (pbitsize.is_constant () && pbitpos.is_constant ())
+	{
+	  bitsize = pbitsize.to_constant ();
+	  bitpos = pbitpos.to_constant ();
+	  ref = new_ref;
+	  offset = bitpos;
+	  offset = wi::rshift (offset, LOG2_BITS_PER_UNIT, SIGNED);
+	}
     }
 
   switch (TREE_CODE (ref))
Index: gcc/sanopt.c
===================================================================
--- gcc/sanopt.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/sanopt.c	2017-10-23 17:18:47.665057096 +0100
@@ -459,7 +459,7 @@ record_ubsan_ptr_check_stmt (sanopt_ctx
 static bool
 maybe_optimize_ubsan_ptr_ifn (sanopt_ctx *ctx, gimple *stmt)
 {
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, pbitpos;
   machine_mode mode;
   int volatilep = 0, reversep, unsignedp = 0;
   tree offset;
@@ -483,9 +483,12 @@ maybe_optimize_ubsan_ptr_ifn (sanopt_ctx
     {
       base = TREE_OPERAND (base, 0);
 
-      base = get_inner_reference (base, &bitsize, &bitpos, &offset, &mode,
+      HOST_WIDE_INT bitpos;
+      base = get_inner_reference (base, &bitsize, &pbitpos, &offset, &mode,
 				  &unsignedp, &reversep, &volatilep);
-      if (offset == NULL_TREE && DECL_P (base))
+      if (offset == NULL_TREE
+	  && DECL_P (base)
+	  && pbitpos.is_constant (&bitpos))
 	{
 	  gcc_assert (!DECL_REGISTER (base));
 	  offset_int expr_offset = bitpos / BITS_PER_UNIT;
Index: gcc/tsan.c
===================================================================
--- gcc/tsan.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/tsan.c	2017-10-23 17:18:47.669056745 +0100
@@ -110,12 +110,12 @@ instrument_expr (gimple_stmt_iterator gs
   if (size <= 0)
     return false;
 
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 unused_bitsize, unused_bitpos;
   tree offset;
   machine_mode mode;
   int unsignedp, reversep, volatilep = 0;
-  base = get_inner_reference (expr, &bitsize, &bitpos, &offset, &mode,
-			      &unsignedp, &reversep, &volatilep);
+  base = get_inner_reference (expr, &unused_bitsize, &unused_bitpos, &offset,
+			      &mode, &unsignedp, &reversep, &volatilep);
 
   /* No need to instrument accesses to decls that don't escape,
      they can't escape to other threads then.  */
@@ -142,6 +142,7 @@ instrument_expr (gimple_stmt_iterator gs
        && DECL_BIT_FIELD_TYPE (TREE_OPERAND (expr, 1)))
       || TREE_CODE (expr) == BIT_FIELD_REF)
     {
+      HOST_WIDE_INT bitpos, bitsize;
       base = TREE_OPERAND (expr, 0);
       if (TREE_CODE (expr) == COMPONENT_REF)
 	{
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2017-10-23 17:17:01.431034628 +0100
+++ gcc/match.pd	2017-10-23 17:18:47.664057184 +0100
@@ -1454,13 +1454,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (minus (convert ADDR_EXPR@0) (convert @1))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
-  (with { HOST_WIDE_INT diff; }
+  (with { poly_int64 diff; }
    (if (ptr_difference_const (@0, @1, &diff))
     { build_int_cst_type (type, diff); }))))
 (simplify
  (minus (convert @0) (convert ADDR_EXPR@1))
  (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
-  (with { HOST_WIDE_INT diff; }
+  (with { poly_int64 diff; }
    (if (ptr_difference_const (@0, @1, &diff))
     { build_int_cst_type (type, diff); }))))
 
Index: gcc/ada/gcc-interface/trans.c
===================================================================
--- gcc/ada/gcc-interface/trans.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/ada/gcc-interface/trans.c	2017-10-23 17:18:47.653058151 +0100
@@ -2186,8 +2186,8 @@ Attribute_to_gnu (Node_Id gnat_node, tre
     case Attr_Last_Bit:
     case Attr_Bit:
       {
-	HOST_WIDE_INT bitsize;
-	HOST_WIDE_INT bitpos;
+	poly_int64 bitsize;
+	poly_int64 bitpos;
 	tree gnu_offset;
 	tree gnu_field_bitpos;
 	tree gnu_field_offset;
@@ -2254,11 +2254,11 @@ Attribute_to_gnu (Node_Id gnat_node, tre
 
 	  case Attr_First_Bit:
 	  case Attr_Bit:
-	    gnu_result = size_int (bitpos % BITS_PER_UNIT);
+	    gnu_result = size_int (num_trailing_bits (bitpos));
 	    break;
 
 	  case Attr_Last_Bit:
-	    gnu_result = bitsize_int (bitpos % BITS_PER_UNIT);
+	    gnu_result = bitsize_int (num_trailing_bits (bitpos));
 	    gnu_result = size_binop (PLUS_EXPR, gnu_result,
 				     TYPE_SIZE (TREE_TYPE (gnu_prefix)));
 	    /* ??? Avoid a large unsigned result that will overflow when
Index: gcc/ada/gcc-interface/utils2.c
===================================================================
--- gcc/ada/gcc-interface/utils2.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/ada/gcc-interface/utils2.c	2017-10-23 17:18:47.654058063 +0100
@@ -1439,8 +1439,8 @@ build_unary_op (enum tree_code op_code,
 	       the offset to the field.  Otherwise, do this the normal way.  */
 	  if (op_code == ATTR_ADDR_EXPR)
 	    {
-	      HOST_WIDE_INT bitsize;
-	      HOST_WIDE_INT bitpos;
+	      poly_int64 bitsize;
+	      poly_int64 bitpos;
 	      tree offset, inner;
 	      machine_mode mode;
 	      int unsignedp, reversep, volatilep;
@@ -1460,8 +1460,9 @@ build_unary_op (enum tree_code op_code,
 	      if (!offset)
 		offset = size_zero_node;
 
-	      offset = size_binop (PLUS_EXPR, offset,
-				   size_int (bitpos / BITS_PER_UNIT));
+	      offset
+		= size_binop (PLUS_EXPR, offset,
+			      size_int (bits_to_bytes_round_down (bitpos)));
 
 	      /* Take the address of INNER, convert it to a pointer to our type
 		 and add the offset.  */
Index: gcc/cp/constexpr.c
===================================================================
--- gcc/cp/constexpr.c	2017-10-23 17:07:40.354337067 +0100
+++ gcc/cp/constexpr.c	2017-10-23 17:18:47.657057799 +0100
@@ -5037,7 +5037,7 @@ enum { ck_ok, ck_bad, ck_unknown };
 check_automatic_or_tls (tree ref)
 {
   machine_mode mode;
-  HOST_WIDE_INT bitsize, bitpos;
+  poly_int64 bitsize, bitpos;
   tree offset;
   int volatilep = 0, unsignedp = 0;
   tree decl = get_inner_reference (ref, &bitsize, &bitpos, &offset,

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [044/nnn] poly_int: push_block/emit_push_insn
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (42 preceding siblings ...)
  2017-10-23 17:19 ` [045/nnn] poly_int: REG_ARGS_SIZE Richard Sandiford
@ 2017-10-23 17:19 ` Richard Sandiford
  2017-11-28 22:18   ` Jeff Law
  2017-10-23 17:19 ` [043/nnn] poly_int: frame allocations Richard Sandiford
                   ` (63 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:19 UTC (permalink / raw)
  To: gcc-patches

This patch changes the "extra" parameters to push_block and
emit_push_insn from int to poly_int64.
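
For context, the recurring change in this patch is sketched below.  This
is an illustration only, not part of the patch; "extra" stands for the
converted parameter:

  /* Before: EXTRA is an int, so a plain comparison and GEN_INT suffice.  */
  if (extra != 0)
    anti_adjust_stack (GEN_INT (extra));

  /* After: EXTRA is a poly_int64 whose value might only be known at
     runtime, so the test becomes a "maybe" query and the rtx is built
     with gen_int_mode, which handles polynomial values.  */
  if (maybe_nonzero (extra))
    anti_adjust_stack (gen_int_mode (extra, Pmode));

Callers that pass a compile-time constant are unaffected, since the
constant converts implicitly to poly_int64.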


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.h (push_block, emit_push_insn): Change the "extra" parameter
	from HOST_WIDE_INT to poly_int64.
	* expr.c (push_block, emit_push_insn): Likewise.

Index: gcc/expr.h
===================================================================
--- gcc/expr.h	2017-10-23 17:18:43.842393134 +0100
+++ gcc/expr.h	2017-10-23 17:18:56.434286222 +0100
@@ -233,11 +233,11 @@ extern rtx emit_move_resolve_push (machi
 
 /* Push a block of length SIZE (perhaps variable)
    and return an rtx to address the beginning of the block.  */
-extern rtx push_block (rtx, int, int);
+extern rtx push_block (rtx, poly_int64, int);
 
 /* Generate code to push something onto the stack, given its mode and type.  */
 extern bool emit_push_insn (rtx, machine_mode, tree, rtx, unsigned int,
-			    int, rtx, int, rtx, rtx, int, rtx, bool);
+			    int, rtx, poly_int64, rtx, rtx, int, rtx, bool);
 
 /* Extract the accessible bit-range from a COMPONENT_REF.  */
 extern void get_bit_range (poly_uint64_pod *, poly_uint64_pod *, tree,
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:18:47.661057448 +0100
+++ gcc/expr.c	2017-10-23 17:18:56.434286222 +0100
@@ -3865,19 +3865,19 @@ compress_float_constant (rtx x, rtx y)
    otherwise, the padding comes at high addresses.  */
 
 rtx
-push_block (rtx size, int extra, int below)
+push_block (rtx size, poly_int64 extra, int below)
 {
   rtx temp;
 
   size = convert_modes (Pmode, ptr_mode, size, 1);
   if (CONSTANT_P (size))
     anti_adjust_stack (plus_constant (Pmode, size, extra));
-  else if (REG_P (size) && extra == 0)
+  else if (REG_P (size) && known_zero (extra))
     anti_adjust_stack (size);
   else
     {
       temp = copy_to_mode_reg (Pmode, size);
-      if (extra != 0)
+      if (maybe_nonzero (extra))
 	temp = expand_binop (Pmode, add_optab, temp,
 			     gen_int_mode (extra, Pmode),
 			     temp, 0, OPTAB_LIB_WIDEN);
@@ -3887,7 +3887,7 @@ push_block (rtx size, int extra, int bel
   if (STACK_GROWS_DOWNWARD)
     {
       temp = virtual_outgoing_args_rtx;
-      if (extra != 0 && below)
+      if (maybe_nonzero (extra) && below)
 	temp = plus_constant (Pmode, temp, extra);
     }
   else
@@ -3895,7 +3895,7 @@ push_block (rtx size, int extra, int bel
       if (CONST_INT_P (size))
 	temp = plus_constant (Pmode, virtual_outgoing_args_rtx,
 			      -INTVAL (size) - (below ? 0 : extra));
-      else if (extra != 0 && !below)
+      else if (maybe_nonzero (extra) && !below)
 	temp = gen_rtx_PLUS (Pmode, virtual_outgoing_args_rtx,
 			     negate_rtx (Pmode, plus_constant (Pmode, size,
 							       extra)));
@@ -4269,7 +4269,7 @@ memory_load_overlap (rtx x, rtx y, HOST_
 
 bool
 emit_push_insn (rtx x, machine_mode mode, tree type, rtx size,
-		unsigned int align, int partial, rtx reg, int extra,
+		unsigned int align, int partial, rtx reg, poly_int64 extra,
 		rtx args_addr, rtx args_so_far, int reg_parm_stack_space,
 		rtx alignment_pad, bool sibcall_p)
 {
@@ -4357,9 +4357,11 @@ emit_push_insn (rtx x, machine_mode mode
 	  /* Push padding now if padding above and stack grows down,
 	     or if padding below and stack grows up.
 	     But if space already allocated, this has already been done.  */
-	  if (extra && args_addr == 0
-	      && where_pad != PAD_NONE && where_pad != stack_direction)
-	    anti_adjust_stack (GEN_INT (extra));
+	  if (maybe_nonzero (extra)
+	      && args_addr == 0
+	      && where_pad != PAD_NONE
+	      && where_pad != stack_direction)
+	    anti_adjust_stack (gen_int_mode (extra, Pmode));
 
 	  move_by_pieces (NULL, xinner, INTVAL (size) - used, align, 0);
 	}
@@ -4480,9 +4482,11 @@ emit_push_insn (rtx x, machine_mode mode
       /* Push padding now if padding above and stack grows down,
 	 or if padding below and stack grows up.
 	 But if space already allocated, this has already been done.  */
-      if (extra && args_addr == 0
-	  && where_pad != PAD_NONE && where_pad != stack_direction)
-	anti_adjust_stack (GEN_INT (extra));
+      if (maybe_nonzero (extra)
+	  && args_addr == 0
+	  && where_pad != PAD_NONE
+	  && where_pad != stack_direction)
+	anti_adjust_stack (gen_int_mode (extra, Pmode));
 
       /* If we make space by pushing it, we might as well push
 	 the real data.  Otherwise, we can leave OFFSET nonzero
@@ -4531,9 +4535,11 @@ emit_push_insn (rtx x, machine_mode mode
       /* Push padding now if padding above and stack grows down,
 	 or if padding below and stack grows up.
 	 But if space already allocated, this has already been done.  */
-      if (extra && args_addr == 0
-	  && where_pad != PAD_NONE && where_pad != stack_direction)
-	anti_adjust_stack (GEN_INT (extra));
+      if (maybe_nonzero (extra)
+	  && args_addr == 0
+	  && where_pad != PAD_NONE
+	  && where_pad != stack_direction)
+	anti_adjust_stack (gen_int_mode (extra, Pmode));
 
 #ifdef PUSH_ROUNDING
       if (args_addr == 0 && PUSH_ARGS)
@@ -4578,8 +4584,8 @@ emit_push_insn (rtx x, machine_mode mode
 	}
     }
 
-  if (extra && args_addr == 0 && where_pad == stack_direction)
-    anti_adjust_stack (GEN_INT (extra));
+  if (maybe_nonzero (extra) && args_addr == 0 && where_pad == stack_direction)
+    anti_adjust_stack (gen_int_mode (extra, Pmode));
 
   if (alignment_pad && args_addr == 0)
     anti_adjust_stack (alignment_pad);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [045/nnn] poly_int: REG_ARGS_SIZE
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (41 preceding siblings ...)
  2017-10-23 17:18 ` [041/nnn] poly_int: reload.c Richard Sandiford
@ 2017-10-23 17:19 ` Richard Sandiford
  2017-12-06  0:10   ` Jeff Law
  2017-12-22 21:56   ` Andreas Schwab
  2017-10-23 17:19 ` [044/nnn] poly_int: push_block/emit_push_insn Richard Sandiford
                   ` (64 subsequent siblings)
  107 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:19 UTC (permalink / raw)
  To: gcc-patches

This patch adds new utility functions for manipulating REG_ARGS_SIZE
notes and allows the notes to carry polynomial as well as constant sizes.

The code was inconsistent about whether INT_MIN or HOST_WIDE_INT_MIN
should be used to represent an unknown size.  The patch uses
HOST_WIDE_INT_MIN throughout.
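
As a rough illustration (a sketch, not taken from the patch; INSN and
DELTA here are placeholder variables), the new helpers pair up as
follows:

  /* Attach the outgoing-argument size to INSN.  The note can now carry
     a polynomial value rather than just a compile-time constant.  */
  add_args_size_note (insn, delta);

  /* Read it back later.  get_args_size expects a REG_ARGS_SIZE note
     (it has a checking assert) and returns its operand as a poly_int64.  */
  rtx note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
  poly_int64 args_size = note ? get_args_size (note) : 0;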


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (get_args_size, add_args_size_note): New functions.
	(find_args_size_adjust): Return a poly_int64 rather than a
	HOST_WIDE_INT.
	(fixup_args_size_notes): Likewise.  Make the same change to the
	end_args_size parameter.
	* rtlanal.c (get_args_size, add_args_size_note): New functions.
	* builtins.c (expand_builtin_trap): Use add_args_size_note.
	* calls.c (emit_call_1): Likewise.
	* explow.c (adjust_stack_1): Likewise.
	* cfgcleanup.c (old_insns_match_p): Update use of
	find_args_size_adjust.
	* combine.c (distribute_notes): Track polynomial arg sizes.
	* dwarf2cfi.c (dw_trace_info): Change beg_true_args_size,
	end_true_args_size, beg_delay_args_size and end_delay_args_size
	from HOST_WIDE_INT to poly_int64.
	(add_cfi_args_size): Take the args_size as a poly_int64 rather
	than a HOST_WIDE_INT.
	(notice_args_size, notice_eh_throw, maybe_record_trace_start)
	(maybe_record_trace_start_abnormal, scan_trace, connect_traces): Track
	polynomial arg sizes.
	* emit-rtl.c (try_split): Use get_args_size.
	* recog.c (peep2_attempt): Likewise.
	* reload1.c (reload_as_needed): Likewise.
	* expr.c (find_args_size_adjust): Return the adjustment as a
	poly_int64 rather than a HOST_WIDE_INT.
	(fixup_args_size_notes): Change end_args_size from a HOST_WIDE_INT
	to a poly_int64 and change the return type in the same way.
	(emit_single_push_insn): Track polynomial arg sizes.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 17:16:55.754801166 +0100
+++ gcc/rtl.h	2017-10-23 17:18:57.862160702 +0100
@@ -3329,6 +3329,7 @@ extern rtx get_related_value (const_rtx)
 extern bool offset_within_block_p (const_rtx, HOST_WIDE_INT);
 extern void split_const (rtx, rtx *, rtx *);
 extern rtx strip_offset (rtx, poly_int64_pod *);
+extern poly_int64 get_args_size (const_rtx);
 extern bool unsigned_reg_p (rtx);
 extern int reg_mentioned_p (const_rtx, const_rtx);
 extern int count_occurrences (const_rtx, const_rtx, int);
@@ -3364,6 +3365,7 @@ extern int find_regno_fusage (const_rtx,
 extern rtx alloc_reg_note (enum reg_note, rtx, rtx);
 extern void add_reg_note (rtx, enum reg_note, rtx);
 extern void add_int_reg_note (rtx_insn *, enum reg_note, int);
+extern void add_args_size_note (rtx_insn *, poly_int64);
 extern void add_shallow_copy_of_reg_note (rtx_insn *, rtx);
 extern rtx duplicate_reg_note (rtx);
 extern void remove_note (rtx_insn *, const_rtx);
@@ -3954,8 +3956,8 @@ extern void emit_jump (rtx);
 /* In expr.c */
 extern rtx move_by_pieces (rtx, rtx, unsigned HOST_WIDE_INT,
 			   unsigned int, int);
-extern HOST_WIDE_INT find_args_size_adjust (rtx_insn *);
-extern int fixup_args_size_notes (rtx_insn *, rtx_insn *, int);
+extern poly_int64 find_args_size_adjust (rtx_insn *);
+extern poly_int64 fixup_args_size_notes (rtx_insn *, rtx_insn *, poly_int64);
 
 /* In expmed.c */
 extern void init_expmed (void);
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:18:53.836514583 +0100
+++ gcc/rtlanal.c	2017-10-23 17:18:57.862160702 +0100
@@ -937,6 +937,15 @@ strip_offset (rtx x, poly_int64_pod *off
   *offset_out = 0;
   return x;
 }
+
+/* Return the argument size in REG_ARGS_SIZE note X.  */
+
+poly_int64
+get_args_size (const_rtx x)
+{
+  gcc_checking_assert (REG_NOTE_KIND (x) == REG_ARGS_SIZE);
+  return rtx_to_poly_int64 (XEXP (x, 0));
+}
 \f
 /* Return the number of places FIND appears within X.  If COUNT_DEST is
    zero, we do not count occurrences inside the destination of a SET.  */
@@ -2362,6 +2371,15 @@ add_int_reg_note (rtx_insn *insn, enum r
 				       datum, REG_NOTES (insn));
 }
 
+/* Add a REG_ARGS_SIZE note to INSN with value VALUE.  */
+
+void
+add_args_size_note (rtx_insn *insn, poly_int64 value)
+{
+  gcc_checking_assert (!find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX));
+  add_reg_note (insn, REG_ARGS_SIZE, gen_int_mode (value, Pmode));
+}
+
 /* Add a register note like NOTE to INSN.  */
 
 void
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	2017-10-23 17:18:42.394520412 +0100
+++ gcc/builtins.c	2017-10-23 17:18:57.855161317 +0100
@@ -5027,7 +5027,7 @@ expand_builtin_trap (void)
 	 REG_ARGS_SIZE note to prevent crossjumping of calls with
 	 different args sizes.  */
       if (!ACCUMULATE_OUTGOING_ARGS)
-	add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+	add_args_size_note (insn, stack_pointer_delta);
     }
   else
     {
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 17:16:50.357530032 +0100
+++ gcc/calls.c	2017-10-23 17:18:57.856161229 +0100
@@ -497,7 +497,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
       rounded_stack_size_rtx = GEN_INT (rounded_stack_size);
       stack_pointer_delta -= n_popped;
 
-      add_reg_note (call_insn, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+      add_args_size_note (call_insn, stack_pointer_delta);
 
       /* If popup is needed, stack realign must use DRAP  */
       if (SUPPORTS_STACK_ALIGNMENT)
@@ -507,7 +507,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
      REG_ARGS_SIZE note to prevent crossjumping of calls with different
      args sizes.  */
   else if (!ACCUMULATE_OUTGOING_ARGS && (ecf_flags & ECF_NORETURN) != 0)
-    add_reg_note (call_insn, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+    add_args_size_note (call_insn, stack_pointer_delta);
 
   if (!ACCUMULATE_OUTGOING_ARGS)
     {
Index: gcc/explow.c
===================================================================
--- gcc/explow.c	2017-10-23 17:18:53.832514935 +0100
+++ gcc/explow.c	2017-10-23 17:18:57.859160965 +0100
@@ -941,7 +941,7 @@ adjust_stack_1 (rtx adjust, bool anti_p)
     }
 
   if (!suppress_reg_args_size)
-    add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+    add_args_size_note (insn, stack_pointer_delta);
 }
 
 /* Adjust the stack pointer by ADJUST (an rtx for a number of bytes).
Index: gcc/cfgcleanup.c
===================================================================
--- gcc/cfgcleanup.c	2017-10-23 17:11:40.377197950 +0100
+++ gcc/cfgcleanup.c	2017-10-23 17:18:57.856161229 +0100
@@ -1182,7 +1182,7 @@ old_insns_match_p (int mode ATTRIBUTE_UN
       /* ??? Worse, this adjustment had better be constant lest we
          have differing incoming stack levels.  */
       if (!frame_pointer_needed
-          && find_args_size_adjust (i1) == HOST_WIDE_INT_MIN)
+	  && must_eq (find_args_size_adjust (i1), HOST_WIDE_INT_MIN))
 	return dir_none;
     }
   else if (p1 || p2)
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-10-23 17:16:50.358529897 +0100
+++ gcc/combine.c	2017-10-23 17:18:57.858161053 +0100
@@ -14140,7 +14140,7 @@ distribute_notes (rtx notes, rtx_insn *f
 	     entire adjustment.  Assert i3 contains at least some adjust.  */
 	  if (!noop_move_p (i3))
 	    {
-	      int old_size, args_size = INTVAL (XEXP (note, 0));
+	      poly_int64 old_size, args_size = get_args_size (note);
 	      /* fixup_args_size_notes looks at REG_NORETURN note,
 		 so ensure the note is placed there first.  */
 	      if (CALL_P (i3))
@@ -14159,7 +14159,7 @@ distribute_notes (rtx notes, rtx_insn *f
 	      old_size = fixup_args_size_notes (PREV_INSN (i3), i3, args_size);
 	      /* emit_call_1 adds for !ACCUMULATE_OUTGOING_ARGS
 		 REG_ARGS_SIZE note to all noreturn calls, allow that here.  */
-	      gcc_assert (old_size != args_size
+	      gcc_assert (may_ne (old_size, args_size)
 			  || (CALL_P (i3)
 			      && !ACCUMULATE_OUTGOING_ARGS
 			      && find_reg_note (i3, REG_NORETURN, NULL_RTX)));
Index: gcc/dwarf2cfi.c
===================================================================
--- gcc/dwarf2cfi.c	2017-10-23 17:16:57.208604839 +0100
+++ gcc/dwarf2cfi.c	2017-10-23 17:18:57.858161053 +0100
@@ -102,8 +102,8 @@ struct dw_trace_info
      while scanning insns.  However, the args_size value is irrelevant at
      any point except can_throw_internal_p insns.  Therefore the "delay"
      sizes the values that must actually be emitted for this trace.  */
-  HOST_WIDE_INT beg_true_args_size, end_true_args_size;
-  HOST_WIDE_INT beg_delay_args_size, end_delay_args_size;
+  poly_int64_pod beg_true_args_size, end_true_args_size;
+  poly_int64_pod beg_delay_args_size, end_delay_args_size;
 
   /* The first EH insn in the trace, where beg_delay_args_size must be set.  */
   rtx_insn *eh_head;
@@ -475,16 +475,19 @@ add_cfi (dw_cfi_ref cfi)
 }
 
 static void
-add_cfi_args_size (HOST_WIDE_INT size)
+add_cfi_args_size (poly_int64 size)
 {
+  /* We don't yet have a representation for polynomial sizes.  */
+  HOST_WIDE_INT const_size = size.to_constant ();
+
   dw_cfi_ref cfi = new_cfi ();
 
   /* While we can occasionally have args_size < 0 internally, this state
      should not persist at a point we actually need an opcode.  */
-  gcc_assert (size >= 0);
+  gcc_assert (const_size >= 0);
 
   cfi->dw_cfi_opc = DW_CFA_GNU_args_size;
-  cfi->dw_cfi_oprnd1.dw_cfi_offset = size;
+  cfi->dw_cfi_oprnd1.dw_cfi_offset = const_size;
 
   add_cfi (cfi);
 }
@@ -924,16 +927,16 @@ reg_save (unsigned int reg, unsigned int
 static void
 notice_args_size (rtx_insn *insn)
 {
-  HOST_WIDE_INT args_size, delta;
+  poly_int64 args_size, delta;
   rtx note;
 
   note = find_reg_note (insn, REG_ARGS_SIZE, NULL);
   if (note == NULL)
     return;
 
-  args_size = INTVAL (XEXP (note, 0));
+  args_size = get_args_size (note);
   delta = args_size - cur_trace->end_true_args_size;
-  if (delta == 0)
+  if (known_zero (delta))
     return;
 
   cur_trace->end_true_args_size = args_size;
@@ -959,16 +962,14 @@ notice_args_size (rtx_insn *insn)
 static void
 notice_eh_throw (rtx_insn *insn)
 {
-  HOST_WIDE_INT args_size;
-
-  args_size = cur_trace->end_true_args_size;
+  poly_int64 args_size = cur_trace->end_true_args_size;
   if (cur_trace->eh_head == NULL)
     {
       cur_trace->eh_head = insn;
       cur_trace->beg_delay_args_size = args_size;
       cur_trace->end_delay_args_size = args_size;
     }
-  else if (cur_trace->end_delay_args_size != args_size)
+  else if (may_ne (cur_trace->end_delay_args_size, args_size))
     {
       cur_trace->end_delay_args_size = args_size;
 
@@ -2289,7 +2290,6 @@ static void dump_cfi_row (FILE *f, dw_cf
 maybe_record_trace_start (rtx_insn *start, rtx_insn *origin)
 {
   dw_trace_info *ti;
-  HOST_WIDE_INT args_size;
 
   ti = get_trace_info (start);
   gcc_assert (ti != NULL);
@@ -2302,7 +2302,7 @@ maybe_record_trace_start (rtx_insn *star
 	       (origin ? INSN_UID (origin) : 0));
     }
 
-  args_size = cur_trace->end_true_args_size;
+  poly_int64 args_size = cur_trace->end_true_args_size;
   if (ti->beg_row == NULL)
     {
       /* This is the first time we've encountered this trace.  Propagate
@@ -2342,7 +2342,7 @@ maybe_record_trace_start (rtx_insn *star
 #endif
 
       /* The args_size is allowed to conflict if it isn't actually used.  */
-      if (ti->beg_true_args_size != args_size)
+      if (may_ne (ti->beg_true_args_size, args_size))
 	ti->args_size_undefined = true;
     }
 }
@@ -2353,11 +2353,11 @@ maybe_record_trace_start (rtx_insn *star
 static void
 maybe_record_trace_start_abnormal (rtx_insn *start, rtx_insn *origin)
 {
-  HOST_WIDE_INT save_args_size, delta;
+  poly_int64 save_args_size, delta;
   dw_cfa_location save_cfa;
 
   save_args_size = cur_trace->end_true_args_size;
-  if (save_args_size == 0)
+  if (known_zero (save_args_size))
     {
       maybe_record_trace_start (start, origin);
       return;
@@ -2549,7 +2549,6 @@ scan_trace (dw_trace_info *trace)
 
 	      if (INSN_FROM_TARGET_P (elt))
 		{
-		  HOST_WIDE_INT restore_args_size;
 		  cfi_vec save_row_reg_save;
 
 		  /* If ELT is an instruction from target of an annulled
@@ -2557,7 +2556,7 @@ scan_trace (dw_trace_info *trace)
 		     the args_size and CFA along the current path
 		     shouldn't change.  */
 		  add_cfi_insn = NULL;
-		  restore_args_size = cur_trace->end_true_args_size;
+		  poly_int64 restore_args_size = cur_trace->end_true_args_size;
 		  cur_cfa = &cur_row->cfa;
 		  save_row_reg_save = vec_safe_copy (cur_row->reg_save);
 
@@ -2799,7 +2798,7 @@ connect_traces (void)
   /* Connect args_size between traces that have can_throw_internal insns.  */
   if (cfun->eh->lp_array)
     {
-      HOST_WIDE_INT prev_args_size = 0;
+      poly_int64 prev_args_size = 0;
 
       for (i = 0; i < n; ++i)
 	{
@@ -2811,7 +2810,7 @@ connect_traces (void)
 	    continue;
 	  gcc_assert (!ti->args_size_undefined);
 
-	  if (ti->beg_delay_args_size != prev_args_size)
+	  if (may_ne (ti->beg_delay_args_size, prev_args_size))
 	    {
 	      /* ??? Search back to previous CFI note.  */
 	      add_cfi_insn = PREV_INSN (ti->eh_head);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 17:16:55.754801166 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:18:57.859160965 +0100
@@ -3947,7 +3947,7 @@ try_split (rtx pat, rtx_insn *trial, int
 	  break;
 
 	case REG_ARGS_SIZE:
-	  fixup_args_size_notes (NULL, insn_last, INTVAL (XEXP (note, 0)));
+	  fixup_args_size_notes (NULL, insn_last, get_args_size (note));
 	  break;
 
 	case REG_CALL_DECL:
Index: gcc/recog.c
===================================================================
--- gcc/recog.c	2017-10-23 17:16:50.372528007 +0100
+++ gcc/recog.c	2017-10-23 17:18:57.860160878 +0100
@@ -3464,7 +3464,7 @@ peep2_attempt (basic_block bb, rtx_insn
 
   /* Re-insert the ARGS_SIZE notes.  */
   if (as_note)
-    fixup_args_size_notes (before_try, last, INTVAL (XEXP (as_note, 0)));
+    fixup_args_size_notes (before_try, last, get_args_size (as_note));
 
   /* If we generated a jump instruction, it won't have
      JUMP_LABEL set.  Recompute after we're done.  */
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:18:53.835514671 +0100
+++ gcc/reload1.c	2017-10-23 17:18:57.861160790 +0100
@@ -4649,7 +4649,7 @@ reload_as_needed (int live_known)
 		{
 		  remove_note (insn, p);
 		  fixup_args_size_notes (prev, PREV_INSN (next),
-					 INTVAL (XEXP (p, 0)));
+					 get_args_size (p));
 		}
 
 	      /* If this was an ASM, make sure that all the reload insns
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:18:56.434286222 +0100
+++ gcc/expr.c	2017-10-23 17:18:57.860160878 +0100
@@ -3939,9 +3939,9 @@ mem_autoinc_base (rtx mem)
 
    The return value is the amount of adjustment that can be trivially
    verified, via immediate operand or auto-inc.  If the adjustment
-   cannot be trivially extracted, the return value is INT_MIN.  */
+   cannot be trivially extracted, the return value is HOST_WIDE_INT_MIN.  */
 
-HOST_WIDE_INT
+poly_int64
 find_args_size_adjust (rtx_insn *insn)
 {
   rtx dest, set, pat;
@@ -4064,22 +4064,21 @@ find_args_size_adjust (rtx_insn *insn)
     }
 }
 
-int
-fixup_args_size_notes (rtx_insn *prev, rtx_insn *last, int end_args_size)
+poly_int64
+fixup_args_size_notes (rtx_insn *prev, rtx_insn *last,
+		       poly_int64 end_args_size)
 {
-  int args_size = end_args_size;
+  poly_int64 args_size = end_args_size;
   bool saw_unknown = false;
   rtx_insn *insn;
 
   for (insn = last; insn != prev; insn = PREV_INSN (insn))
     {
-      HOST_WIDE_INT this_delta;
-
       if (!NONDEBUG_INSN_P (insn))
 	continue;
 
-      this_delta = find_args_size_adjust (insn);
-      if (this_delta == 0)
+      poly_int64 this_delta = find_args_size_adjust (insn);
+      if (known_zero (this_delta))
 	{
 	  if (!CALL_P (insn)
 	      || ACCUMULATE_OUTGOING_ARGS
@@ -4088,17 +4087,17 @@ fixup_args_size_notes (rtx_insn *prev, r
 	}
 
       gcc_assert (!saw_unknown);
-      if (this_delta == HOST_WIDE_INT_MIN)
+      if (must_eq (this_delta, HOST_WIDE_INT_MIN))
 	saw_unknown = true;
 
-      add_reg_note (insn, REG_ARGS_SIZE, GEN_INT (args_size));
+      add_args_size_note (insn, args_size);
       if (STACK_GROWS_DOWNWARD)
-	this_delta = -(unsigned HOST_WIDE_INT) this_delta;
+	this_delta = -poly_uint64 (this_delta);
 
       args_size -= this_delta;
     }
 
-  return saw_unknown ? INT_MIN : args_size;
+  return saw_unknown ? poly_int64 (HOST_WIDE_INT_MIN) : args_size;
 }
 
 #ifdef PUSH_ROUNDING
@@ -4193,7 +4192,7 @@ emit_single_push_insn_1 (machine_mode mo
 static void
 emit_single_push_insn (machine_mode mode, rtx x, tree type)
 {
-  int delta, old_delta = stack_pointer_delta;
+  poly_int64 delta, old_delta = stack_pointer_delta;
   rtx_insn *prev = get_last_insn ();
   rtx_insn *last;
 
@@ -4204,12 +4203,13 @@ emit_single_push_insn (machine_mode mode
   /* Notice the common case where we emitted exactly one insn.  */
   if (PREV_INSN (last) == prev)
     {
-      add_reg_note (last, REG_ARGS_SIZE, GEN_INT (stack_pointer_delta));
+      add_args_size_note (last, stack_pointer_delta);
       return;
     }
 
   delta = fixup_args_size_notes (prev, last, stack_pointer_delta);
-  gcc_assert (delta == INT_MIN || delta == old_delta);
+  gcc_assert (must_eq (delta, HOST_WIDE_INT_MIN)
+	      || must_eq (delta, old_delta));
 }
 #endif
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [043/nnn] poly_int: frame allocations
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (43 preceding siblings ...)
  2017-10-23 17:19 ` [044/nnn] poly_int: push_block/emit_push_insn Richard Sandiford
@ 2017-10-23 17:19 ` Richard Sandiford
  2017-12-06  3:15   ` Jeff Law
  2017-10-23 17:20 ` [047/nnn] poly_int: argument sizes Richard Sandiford
                   ` (62 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:19 UTC (permalink / raw)
  To: gcc-patches

This patch converts the frame allocation code (mostly in function.c)
to use poly_int64 rather than HOST_WIDE_INT for frame offsets and
sizes.
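
A representative change (taken from the lra.c and reload1.c hunks below)
shows how frame-size tests become explicit may/must queries once
get_frame_size () returns a poly_int64:

  /* Before: a plain integer comparison.  */
  if (get_frame_size () != 0 && crtl->stack_alignment_needed)

  /* After: the test states that the size is merely possibly nonzero.  */
  if (maybe_nonzero (get_frame_size ()) && crtl->stack_alignment_needed)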


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* function.h (frame_space): Change start and length from HOST_WIDE_INT
	to poly_int64.
	(get_frame_size): Return the size as a poly_int64 rather than a
	HOST_WIDE_INT.
	(frame_offset_overflow): Take the offset as a poly_int64 rather
	than a HOST_WIDE_INT.
	(assign_stack_local_1, assign_stack_local, assign_stack_temp_for_type)
	(assign_stack_temp): Likewise for the size.
	* function.c (get_frame_size): Return a poly_int64 rather than
	a HOST_WIDE_INT.
	(frame_offset_overflow): Take the offset as a poly_int64 rather
	than a HOST_WIDE_INT.
	(try_fit_stack_local): Take the start, length and size as poly_int64s
	rather than HOST_WIDE_INTs.  Return the offset as a poly_int64_pod
	rather than a HOST_WIDE_INT.
	(add_frame_space): Take the start and end as poly_int64s rather than
	HOST_WIDE_INTs.
	(assign_stack_local_1, assign_stack_local, assign_stack_temp_for_type)
	(assign_stack_temp): Likewise for the size.
	(temp_slot): Change size, base_offset and full_size from HOST_WIDE_INT
	to poly_int64.
	(find_temp_slot_from_address): Handle polynomial offsets.
	(combine_temp_slots): Likewise.
	* emit-rtl.h (rtl_data::x_frame_offset): Change from HOST_WIDE_INT
	to poly_int64.
	* cfgexpand.c (alloc_stack_frame_space): Return the offset as a
	poly_int64 rather than a HOST_WIDE_INT.
	(expand_one_stack_var_at): Take the offset as a poly_int64 rather
	than a HOST_WIDE_INT.
	(expand_stack_vars, expand_one_stack_var_1, expand_used_vars): Handle
	polynomial frame offsets.
	* config/m32r/m32r-protos.h (m32r_compute_frame_size): Take the size
	as a poly_int64 rather than an int.
	* config/m32r/m32r.c (m32r_compute_frame_size): Likewise.
	* config/v850/v850-protos.h (compute_frame_size): Likewise.
	* config/v850/v850.c (compute_frame_size): Likewise.
	* config/xtensa/xtensa-protos.h (compute_frame_size): Likewise.
	* config/xtensa/xtensa.c (compute_frame_size): Likewise.
	* config/pa/pa-protos.h (pa_compute_frame_size): Likewise.
	* config/pa/pa.c (pa_compute_frame_size): Likewise.
	* explow.h (get_dynamic_stack_base): Take the offset as a poly_int64
	rather than a HOST_WIDE_INT.
	* explow.c (get_dynamic_stack_base): Likewise.
	* final.c (final_start_function): Use the constant lower bound
	of the frame size for -Wframe-larger-than.
	* ira.c (do_reload): Adjust for new get_frame_size return type.
	* lra.c (lra): Likewise.
	* reload1.c (reload): Likewise.
	* config/avr/avr.c (avr_asm_function_end_prologue): Likewise.
	* config/pa/pa.h (EXIT_IGNORE_STACK): Likewise.
	* rtlanal.c (get_initial_register_offset): Return the offset as
	a poly_int64 rather than a HOST_WIDE_INT.

Index: gcc/function.h
===================================================================
--- gcc/function.h	2017-10-23 17:07:40.163546918 +0100
+++ gcc/function.h	2017-10-23 17:18:53.834514759 +0100
@@ -187,8 +187,8 @@ struct GTY(()) frame_space
 {
   struct frame_space *next;
 
-  HOST_WIDE_INT start;
-  HOST_WIDE_INT length;
+  poly_int64 start;
+  poly_int64 length;
 };
 
 struct GTY(()) stack_usage
@@ -571,19 +571,19 @@ extern void free_after_compilation (stru
 /* Return size needed for stack frame based on slots so far allocated.
    This size counts from zero.  It is not rounded to STACK_BOUNDARY;
    the caller may have to do that.  */
-extern HOST_WIDE_INT get_frame_size (void);
+extern poly_int64 get_frame_size (void);
 
 /* Issue an error message and return TRUE if frame OFFSET overflows in
    the signed target pointer arithmetics for function FUNC.  Otherwise
    return FALSE.  */
-extern bool frame_offset_overflow (HOST_WIDE_INT, tree);
+extern bool frame_offset_overflow (poly_int64, tree);
 
 extern unsigned int spill_slot_alignment (machine_mode);
 
-extern rtx assign_stack_local_1 (machine_mode, HOST_WIDE_INT, int, int);
-extern rtx assign_stack_local (machine_mode, HOST_WIDE_INT, int);
-extern rtx assign_stack_temp_for_type (machine_mode, HOST_WIDE_INT, tree);
-extern rtx assign_stack_temp (machine_mode, HOST_WIDE_INT);
+extern rtx assign_stack_local_1 (machine_mode, poly_int64, int, int);
+extern rtx assign_stack_local (machine_mode, poly_int64, int);
+extern rtx assign_stack_temp_for_type (machine_mode, poly_int64, tree);
+extern rtx assign_stack_temp (machine_mode, poly_int64);
 extern rtx assign_temp (tree, int, int);
 extern void update_temp_slot_address (rtx, rtx);
 extern void preserve_temp_slots (rtx);
Index: gcc/function.c
===================================================================
--- gcc/function.c	2017-10-23 17:16:50.365528952 +0100
+++ gcc/function.c	2017-10-23 17:18:53.834514759 +0100
@@ -218,7 +218,7 @@ free_after_compilation (struct function
    This size counts from zero.  It is not rounded to PREFERRED_STACK_BOUNDARY;
    the caller may have to do that.  */
 
-HOST_WIDE_INT
+poly_int64
 get_frame_size (void)
 {
   if (FRAME_GROWS_DOWNWARD)
@@ -232,20 +232,22 @@ get_frame_size (void)
    return FALSE.  */
 
 bool
-frame_offset_overflow (HOST_WIDE_INT offset, tree func)
+frame_offset_overflow (poly_int64 offset, tree func)
 {
-  unsigned HOST_WIDE_INT size = FRAME_GROWS_DOWNWARD ? -offset : offset;
+  poly_uint64 size = FRAME_GROWS_DOWNWARD ? -offset : offset;
+  unsigned HOST_WIDE_INT limit
+    = ((HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (Pmode) - 1))
+       /* Leave room for the fixed part of the frame.  */
+       - 64 * UNITS_PER_WORD);
 
-  if (size > (HOST_WIDE_INT_1U << (GET_MODE_BITSIZE (Pmode) - 1))
-	       /* Leave room for the fixed part of the frame.  */
-	       - 64 * UNITS_PER_WORD)
+  if (!coeffs_in_range_p (size, 0U, limit))
     {
       error_at (DECL_SOURCE_LOCATION (func),
 		"total size of local objects too large");
-      return TRUE;
+      return true;
     }
 
-  return FALSE;
+  return false;
 }
 
 /* Return the minimum spill slot alignment for a register of mode MODE.  */
@@ -284,11 +286,11 @@ get_stack_local_alignment (tree type, ma
    given a start/length pair that lies at the end of the frame.  */
 
 static bool
-try_fit_stack_local (HOST_WIDE_INT start, HOST_WIDE_INT length,
-		     HOST_WIDE_INT size, unsigned int alignment,
-		     HOST_WIDE_INT *poffset)
+try_fit_stack_local (poly_int64 start, poly_int64 length,
+		     poly_int64 size, unsigned int alignment,
+		     poly_int64_pod *poffset)
 {
-  HOST_WIDE_INT this_frame_offset;
+  poly_int64 this_frame_offset;
   int frame_off, frame_alignment, frame_phase;
 
   /* Calculate how many bytes the start of local variables is off from
@@ -299,33 +301,31 @@ try_fit_stack_local (HOST_WIDE_INT start
 
   /* Round the frame offset to the specified alignment.  */
 
-  /*  We must be careful here, since FRAME_OFFSET might be negative and
-      division with a negative dividend isn't as well defined as we might
-      like.  So we instead assume that ALIGNMENT is a power of two and
-      use logical operations which are unambiguous.  */
   if (FRAME_GROWS_DOWNWARD)
     this_frame_offset
-      = (FLOOR_ROUND (start + length - size - frame_phase,
-		      (unsigned HOST_WIDE_INT) alignment)
+      = (aligned_lower_bound (start + length - size - frame_phase, alignment)
 	 + frame_phase);
   else
     this_frame_offset
-      = (CEIL_ROUND (start - frame_phase,
-		     (unsigned HOST_WIDE_INT) alignment)
-	 + frame_phase);
+      = aligned_upper_bound (start - frame_phase, alignment) + frame_phase;
 
   /* See if it fits.  If this space is at the edge of the frame,
      consider extending the frame to make it fit.  Our caller relies on
      this when allocating a new slot.  */
-  if (frame_offset == start && this_frame_offset < frame_offset)
-    frame_offset = this_frame_offset;
-  else if (this_frame_offset < start)
-    return false;
-  else if (start + length == frame_offset
-	   && this_frame_offset + size > start + length)
-    frame_offset = this_frame_offset + size;
-  else if (this_frame_offset + size > start + length)
-    return false;
+  if (may_lt (this_frame_offset, start))
+    {
+      if (must_eq (frame_offset, start))
+	frame_offset = this_frame_offset;
+      else
+	return false;
+    }
+  else if (may_gt (this_frame_offset + size, start + length))
+    {
+      if (must_eq (frame_offset, start + length))
+	frame_offset = this_frame_offset + size;
+      else
+	return false;
+    }
 
   *poffset = this_frame_offset;
   return true;
@@ -336,7 +336,7 @@ try_fit_stack_local (HOST_WIDE_INT start
    function's frame_space_list.  */
 
 static void
-add_frame_space (HOST_WIDE_INT start, HOST_WIDE_INT end)
+add_frame_space (poly_int64 start, poly_int64 end)
 {
   struct frame_space *space = ggc_alloc<frame_space> ();
   space->next = crtl->frame_space_list;
@@ -363,12 +363,12 @@ add_frame_space (HOST_WIDE_INT start, HO
    We do not round to stack_boundary here.  */
 
 rtx
-assign_stack_local_1 (machine_mode mode, HOST_WIDE_INT size,
+assign_stack_local_1 (machine_mode mode, poly_int64 size,
 		      int align, int kind)
 {
   rtx x, addr;
-  int bigend_correction = 0;
-  HOST_WIDE_INT slot_offset = 0, old_frame_offset;
+  poly_int64 bigend_correction = 0;
+  poly_int64 slot_offset = 0, old_frame_offset;
   unsigned int alignment, alignment_in_bits;
 
   if (align == 0)
@@ -379,7 +379,7 @@ assign_stack_local_1 (machine_mode mode,
   else if (align == -1)
     {
       alignment = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
-      size = CEIL_ROUND (size, alignment);
+      size = aligned_upper_bound (size, alignment);
     }
   else if (align == -2)
     alignment = 1; /* BITS_PER_UNIT / BITS_PER_UNIT */
@@ -415,7 +415,7 @@ assign_stack_local_1 (machine_mode mode,
 		     requested size is 0 or the estimated stack
 		     alignment >= mode alignment.  */
 		  gcc_assert ((kind & ASLK_REDUCE_ALIGN)
-		              || size == 0
+			      || known_zero (size)
 			      || (crtl->stack_alignment_estimated
 				  >= GET_MODE_ALIGNMENT (mode)));
 		  alignment_in_bits = crtl->stack_alignment_estimated;
@@ -430,7 +430,7 @@ assign_stack_local_1 (machine_mode mode,
   if (crtl->max_used_stack_slot_alignment < alignment_in_bits)
     crtl->max_used_stack_slot_alignment = alignment_in_bits;
 
-  if (mode != BLKmode || size != 0)
+  if (mode != BLKmode || maybe_nonzero (size))
     {
       if (kind & ASLK_RECORD_PAD)
 	{
@@ -443,9 +443,9 @@ assign_stack_local_1 (machine_mode mode,
 					alignment, &slot_offset))
 		continue;
 	      *psp = space->next;
-	      if (slot_offset > space->start)
+	      if (must_gt (slot_offset, space->start))
 		add_frame_space (space->start, slot_offset);
-	      if (slot_offset + size < space->start + space->length)
+	      if (must_lt (slot_offset + size, space->start + space->length))
 		add_frame_space (slot_offset + size,
 				 space->start + space->length);
 	      goto found_space;
@@ -467,9 +467,9 @@ assign_stack_local_1 (machine_mode mode,
 
       if (kind & ASLK_RECORD_PAD)
 	{
-	  if (slot_offset > frame_offset)
+	  if (must_gt (slot_offset, frame_offset))
 	    add_frame_space (frame_offset, slot_offset);
-	  if (slot_offset + size < old_frame_offset)
+	  if (must_lt (slot_offset + size, old_frame_offset))
 	    add_frame_space (slot_offset + size, old_frame_offset);
 	}
     }
@@ -480,9 +480,9 @@ assign_stack_local_1 (machine_mode mode,
 
       if (kind & ASLK_RECORD_PAD)
 	{
-	  if (slot_offset > old_frame_offset)
+	  if (must_gt (slot_offset, old_frame_offset))
 	    add_frame_space (old_frame_offset, slot_offset);
-	  if (slot_offset + size < frame_offset)
+	  if (must_lt (slot_offset + size, frame_offset))
 	    add_frame_space (slot_offset + size, frame_offset);
 	}
     }
@@ -490,8 +490,17 @@ assign_stack_local_1 (machine_mode mode,
  found_space:
   /* On a big-endian machine, if we are allocating more space than we will use,
      use the least significant bytes of those that are allocated.  */
-  if (BYTES_BIG_ENDIAN && mode != BLKmode && GET_MODE_SIZE (mode) < size)
-    bigend_correction = size - GET_MODE_SIZE (mode);
+  if (mode != BLKmode)
+    {
+      /* The slot size can sometimes be smaller than the mode size;
+	 e.g. the rs6000 port allocates slots with a vector mode
+	 that have the size of only one element.  However, the slot
+	 size must always be ordered wrt to the mode size, in the
+	 same way as for a subreg.  */
+      gcc_checking_assert (ordered_p (GET_MODE_SIZE (mode), size));
+      if (BYTES_BIG_ENDIAN && may_lt (GET_MODE_SIZE (mode), size))
+	bigend_correction = size - GET_MODE_SIZE (mode);
+    }
 
   /* If we have already instantiated virtual registers, return the actual
      address relative to the frame pointer.  */
@@ -521,7 +530,7 @@ assign_stack_local_1 (machine_mode mode,
 /* Wrap up assign_stack_local_1 with last parameter as false.  */
 
 rtx
-assign_stack_local (machine_mode mode, HOST_WIDE_INT size, int align)
+assign_stack_local (machine_mode mode, poly_int64 size, int align)
 {
   return assign_stack_local_1 (mode, size, align, ASLK_RECORD_PAD);
 }
@@ -548,7 +557,7 @@ struct GTY(()) temp_slot {
   /* The rtx to used to reference the slot.  */
   rtx slot;
   /* The size, in units, of the slot.  */
-  HOST_WIDE_INT size;
+  poly_int64 size;
   /* The type of the object in the slot, or zero if it doesn't correspond
      to a type.  We use this to determine whether a slot can be reused.
      It can be reused if objects of the type of the new slot will always
@@ -562,10 +571,10 @@ struct GTY(()) temp_slot {
   int level;
   /* The offset of the slot from the frame_pointer, including extra space
      for alignment.  This info is for combine_temp_slots.  */
-  HOST_WIDE_INT base_offset;
+  poly_int64 base_offset;
   /* The size of the slot, including extra space for alignment.  This
      info is for combine_temp_slots.  */
-  HOST_WIDE_INT full_size;
+  poly_int64 full_size;
 };
 
 /* Entry for the below hash table.  */
@@ -743,18 +752,14 @@ find_temp_slot_from_address (rtx x)
     return p;
 
   /* Last resort: Address is a virtual stack var address.  */
-  if (GET_CODE (x) == PLUS
-      && XEXP (x, 0) == virtual_stack_vars_rtx
-      && CONST_INT_P (XEXP (x, 1)))
+  poly_int64 offset;
+  if (strip_offset (x, &offset) == virtual_stack_vars_rtx)
     {
       int i;
       for (i = max_slot_level (); i >= 0; i--)
 	for (p = *temp_slots_at_level (i); p; p = p->next)
-	  {
-	    if (INTVAL (XEXP (x, 1)) >= p->base_offset
-		&& INTVAL (XEXP (x, 1)) < p->base_offset + p->full_size)
-	      return p;
-	  }
+	  if (known_in_range_p (offset, p->base_offset, p->full_size))
+	    return p;
     }
 
   return NULL;
@@ -771,16 +776,13 @@ find_temp_slot_from_address (rtx x)
    TYPE is the type that will be used for the stack slot.  */
 
 rtx
-assign_stack_temp_for_type (machine_mode mode, HOST_WIDE_INT size,
-			    tree type)
+assign_stack_temp_for_type (machine_mode mode, poly_int64 size, tree type)
 {
   unsigned int align;
   struct temp_slot *p, *best_p = 0, *selected = NULL, **pp;
   rtx slot;
 
-  /* If SIZE is -1 it means that somebody tried to allocate a temporary
-     of a variable size.  */
-  gcc_assert (size != -1);
+  gcc_assert (known_size_p (size));
 
   align = get_stack_local_alignment (type, mode);
 
@@ -795,13 +797,16 @@ assign_stack_temp_for_type (machine_mode
     {
       for (p = avail_temp_slots; p; p = p->next)
 	{
-	  if (p->align >= align && p->size >= size
+	  if (p->align >= align
+	      && must_ge (p->size, size)
 	      && GET_MODE (p->slot) == mode
 	      && objects_must_conflict_p (p->type, type)
-	      && (best_p == 0 || best_p->size > p->size
-		  || (best_p->size == p->size && best_p->align > p->align)))
+	      && (best_p == 0
+		  || (must_eq (best_p->size, p->size)
+		      ? best_p->align > p->align
+		      : must_ge (best_p->size, p->size))))
 	    {
-	      if (p->align == align && p->size == size)
+	      if (p->align == align && must_eq (p->size, size))
 		{
 		  selected = p;
 		  cut_slot_from_list (selected, &avail_temp_slots);
@@ -825,9 +830,9 @@ assign_stack_temp_for_type (machine_mode
       if (GET_MODE (best_p->slot) == BLKmode)
 	{
 	  int alignment = best_p->align / BITS_PER_UNIT;
-	  HOST_WIDE_INT rounded_size = CEIL_ROUND (size, alignment);
+	  poly_int64 rounded_size = aligned_upper_bound (size, alignment);
 
-	  if (best_p->size - rounded_size >= alignment)
+	  if (must_ge (best_p->size - rounded_size, alignment))
 	    {
 	      p = ggc_alloc<temp_slot> ();
 	      p->in_use = 0;
@@ -850,7 +855,7 @@ assign_stack_temp_for_type (machine_mode
   /* If we still didn't find one, make a new temporary.  */
   if (selected == 0)
     {
-      HOST_WIDE_INT frame_offset_old = frame_offset;
+      poly_int64 frame_offset_old = frame_offset;
 
       p = ggc_alloc<temp_slot> ();
 
@@ -864,9 +869,9 @@ assign_stack_temp_for_type (machine_mode
       gcc_assert (mode != BLKmode || align == BIGGEST_ALIGNMENT);
       p->slot = assign_stack_local_1 (mode,
 				      (mode == BLKmode
-				       ? CEIL_ROUND (size,
-						     (int) align
-						     / BITS_PER_UNIT)
+				       ? aligned_upper_bound (size,
+							      (int) align
+							      / BITS_PER_UNIT)
 				       : size),
 				      align, 0);
 
@@ -931,7 +936,7 @@ assign_stack_temp_for_type (machine_mode
    reuse.  First two arguments are same as in preceding function.  */
 
 rtx
-assign_stack_temp (machine_mode mode, HOST_WIDE_INT size)
+assign_stack_temp (machine_mode mode, poly_int64 size)
 {
   return assign_stack_temp_for_type (mode, size, NULL_TREE);
 }
@@ -1050,14 +1055,14 @@ combine_temp_slots (void)
 	  if (GET_MODE (q->slot) != BLKmode)
 	    continue;
 
-	  if (p->base_offset + p->full_size == q->base_offset)
+	  if (must_eq (p->base_offset + p->full_size, q->base_offset))
 	    {
 	      /* Q comes after P; combine Q into P.  */
 	      p->size += q->size;
 	      p->full_size += q->full_size;
 	      delete_q = 1;
 	    }
-	  else if (q->base_offset + q->full_size == p->base_offset)
+	  else if (must_eq (q->base_offset + q->full_size, p->base_offset))
 	    {
 	      /* P comes after Q; combine P into Q.  */
 	      q->size += p->size;
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-10-23 17:11:40.381205609 +0100
+++ gcc/emit-rtl.h	2017-10-23 17:18:53.832514935 +0100
@@ -126,7 +126,7 @@ struct GTY(()) rtl_data {
   /* Offset to end of allocated area of stack frame.
      If stack grows down, this is the address of the last stack slot allocated.
      If stack grows up, this is the address for the next slot.  */
-  HOST_WIDE_INT x_frame_offset;
+  poly_int64_pod x_frame_offset;
 
   /* Insn after which register parms and SAVE_EXPRs are born, if nonopt.  */
   rtx_insn *x_parm_birth_insn;
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-10-23 17:18:40.711668346 +0100
+++ gcc/cfgexpand.c	2017-10-23 17:18:53.827515374 +0100
@@ -389,22 +389,23 @@ align_base (HOST_WIDE_INT base, unsigned
 /* Allocate SIZE bytes at byte alignment ALIGN from the stack frame.
    Return the frame offset.  */
 
-static HOST_WIDE_INT
+static poly_int64
 alloc_stack_frame_space (HOST_WIDE_INT size, unsigned HOST_WIDE_INT align)
 {
-  HOST_WIDE_INT offset, new_frame_offset;
+  poly_int64 offset, new_frame_offset;
 
   if (FRAME_GROWS_DOWNWARD)
     {
       new_frame_offset
-	= align_base (frame_offset - frame_phase - size,
-		      align, false) + frame_phase;
+	= aligned_lower_bound (frame_offset - frame_phase - size,
+			       align) + frame_phase;
       offset = new_frame_offset;
     }
   else
     {
       new_frame_offset
-	= align_base (frame_offset - frame_phase, align, true) + frame_phase;
+	= aligned_upper_bound (frame_offset - frame_phase,
+			       align) + frame_phase;
       offset = new_frame_offset;
       new_frame_offset += size;
     }
@@ -980,13 +981,13 @@ dump_stack_var_partition (void)
 
 static void
 expand_one_stack_var_at (tree decl, rtx base, unsigned base_align,
-			 HOST_WIDE_INT offset)
+			 poly_int64 offset)
 {
   unsigned align;
   rtx x;
 
   /* If this fails, we've overflowed the stack frame.  Error nicely?  */
-  gcc_assert (offset == trunc_int_for_mode (offset, Pmode));
+  gcc_assert (must_eq (offset, trunc_int_for_mode (offset, Pmode)));
 
   x = plus_constant (Pmode, base, offset);
   x = gen_rtx_MEM (TREE_CODE (decl) == SSA_NAME
@@ -1000,7 +1001,7 @@ expand_one_stack_var_at (tree decl, rtx
 	 important, we'll simply use the alignment that is already set.  */
       if (base == virtual_stack_vars_rtx)
 	offset -= frame_phase;
-      align = least_bit_hwi (offset);
+      align = known_alignment (offset);
       align *= BITS_PER_UNIT;
       if (align == 0 || align > base_align)
 	align = base_align;
@@ -1094,7 +1095,7 @@ expand_stack_vars (bool (*pred) (size_t)
     {
       rtx base;
       unsigned base_align, alignb;
-      HOST_WIDE_INT offset;
+      poly_int64 offset;
 
       i = stack_vars_sorted[si];
 
@@ -1119,13 +1120,16 @@ expand_stack_vars (bool (*pred) (size_t)
       if (alignb * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
 	{
 	  base = virtual_stack_vars_rtx;
-	  if ((asan_sanitize_stack_p ())
-	      && pred)
+	  /* ASAN description strings don't yet have a syntax for expressing
+	     polynomial offsets.  */
+	  HOST_WIDE_INT prev_offset;
+	  if (asan_sanitize_stack_p ()
+	      && pred
+	      && frame_offset.is_constant (&prev_offset))
 	    {
-	      HOST_WIDE_INT prev_offset
-		= align_base (frame_offset,
-			      MAX (alignb, ASAN_RED_ZONE_SIZE),
-			      !FRAME_GROWS_DOWNWARD);
+	      prev_offset = align_base (prev_offset,
+					MAX (alignb, ASAN_RED_ZONE_SIZE),
+					!FRAME_GROWS_DOWNWARD);
 	      tree repr_decl = NULL_TREE;
 	      offset
 		= alloc_stack_frame_space (stack_vars[i].size
@@ -1133,7 +1137,10 @@ expand_stack_vars (bool (*pred) (size_t)
 					   MAX (alignb, ASAN_RED_ZONE_SIZE));
 
 	      data->asan_vec.safe_push (prev_offset);
-	      data->asan_vec.safe_push (offset + stack_vars[i].size);
+	      /* Allocating a constant amount of space from a constant
+		 starting offset must give a constant result.  */
+	      data->asan_vec.safe_push ((offset + stack_vars[i].size)
+					.to_constant ());
 	      /* Find best representative of the partition.
 		 Prefer those with DECL_NAME, even better
 		 satisfying asan_protect_stack_decl predicate.  */
@@ -1179,7 +1186,7 @@ expand_stack_vars (bool (*pred) (size_t)
 	     space.  */
 	  if (large_size > 0 && ! large_allocation_done)
 	    {
-	      HOST_WIDE_INT loffset;
+	      poly_int64 loffset;
 	      rtx large_allocsize;
 
 	      large_allocsize = GEN_INT (large_size);
@@ -1282,7 +1289,8 @@ set_parm_rtl (tree parm, rtx x)
 static void
 expand_one_stack_var_1 (tree var)
 {
-  HOST_WIDE_INT size, offset;
+  HOST_WIDE_INT size;
+  poly_int64 offset;
   unsigned byte_align;
 
   if (TREE_CODE (var) == SSA_NAME)
@@ -2210,9 +2218,12 @@ expand_used_vars (void)
 	   in addition to phase 1 and 2.  */
 	expand_stack_vars (asan_decl_phase_3, &data);
 
-      if (!data.asan_vec.is_empty ())
+      /* ASAN description strings don't yet have a syntax for expressing
+	 polynomial offsets.  */
+      HOST_WIDE_INT prev_offset;
+      if (!data.asan_vec.is_empty ()
+	  && frame_offset.is_constant (&prev_offset))
 	{
-	  HOST_WIDE_INT prev_offset = frame_offset;
 	  HOST_WIDE_INT offset, sz, redzonesz;
 	  redzonesz = ASAN_RED_ZONE_SIZE;
 	  sz = data.asan_vec[0] - prev_offset;
@@ -2221,8 +2232,10 @@ expand_used_vars (void)
 	      && sz + ASAN_RED_ZONE_SIZE >= (int) data.asan_alignb)
 	    redzonesz = ((sz + ASAN_RED_ZONE_SIZE + data.asan_alignb - 1)
 			 & ~(data.asan_alignb - HOST_WIDE_INT_1)) - sz;
-	  offset
-	    = alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE);
+	  /* Allocating a constant amount of space from a constant
+	     starting offset must give a constant result.  */
+	  offset = (alloc_stack_frame_space (redzonesz, ASAN_RED_ZONE_SIZE)
+		    .to_constant ());
 	  data.asan_vec.safe_push (prev_offset);
 	  data.asan_vec.safe_push (offset);
 	  /* Leave space for alignment if STRICT_ALIGNMENT.  */
@@ -2267,9 +2280,10 @@ expand_used_vars (void)
   if (STACK_ALIGNMENT_NEEDED)
     {
       HOST_WIDE_INT align = PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT;
-      if (!FRAME_GROWS_DOWNWARD)
-	frame_offset += align - 1;
-      frame_offset &= -align;
+      if (FRAME_GROWS_DOWNWARD)
+	frame_offset = aligned_lower_bound (frame_offset, align);
+      else
+	frame_offset = aligned_upper_bound (frame_offset, align);
     }
 
   return var_end_seq;
Index: gcc/config/m32r/m32r-protos.h
===================================================================
--- gcc/config/m32r/m32r-protos.h	2017-10-23 17:07:40.163546918 +0100
+++ gcc/config/m32r/m32r-protos.h	2017-10-23 17:18:53.829515199 +0100
@@ -22,7 +22,7 @@
 
 extern void   m32r_init (void);
 extern void   m32r_init_expanders (void);
-extern unsigned m32r_compute_frame_size (int);
+extern unsigned m32r_compute_frame_size (poly_int64);
 extern void   m32r_expand_prologue (void);
 extern void   m32r_expand_epilogue (void);
 extern int    direct_return (void);
Index: gcc/config/m32r/m32r.c
===================================================================
--- gcc/config/m32r/m32r.c	2017-10-23 17:11:40.159782457 +0100
+++ gcc/config/m32r/m32r.c	2017-10-23 17:18:53.829515199 +0100
@@ -1551,7 +1551,7 @@ #define LONG_INSN_SIZE 4	/* Size of long
    SIZE is the size needed for local variables.  */
 
 unsigned int
-m32r_compute_frame_size (int size)	/* # of var. bytes allocated.  */
+m32r_compute_frame_size (poly_int64 size)   /* # of var. bytes allocated.  */
 {
   unsigned int regno;
   unsigned int total_size, var_size, args_size, pretend_size, extra_size;
Index: gcc/config/v850/v850-protos.h
===================================================================
--- gcc/config/v850/v850-protos.h	2017-10-23 17:07:40.163546918 +0100
+++ gcc/config/v850/v850-protos.h	2017-10-23 17:18:53.831515023 +0100
@@ -26,7 +26,7 @@ extern void   expand_prologue
 extern void   expand_epilogue               (void);
 extern int    v850_handle_pragma            (int (*)(void), void (*)(int), char *);
 extern int    compute_register_save_size    (long *);
-extern int    compute_frame_size            (int, long *);
+extern int    compute_frame_size            (poly_int64, long *);
 extern void   v850_init_expanders           (void);
 
 #ifdef RTX_CODE
Index: gcc/config/v850/v850.c
===================================================================
--- gcc/config/v850/v850.c	2017-10-23 17:11:40.188837984 +0100
+++ gcc/config/v850/v850.c	2017-10-23 17:18:53.831515023 +0100
@@ -1574,7 +1574,7 @@ compute_register_save_size (long * p_reg
   -------------------------- ---- ------------------   V */
 
 int
-compute_frame_size (int size, long * p_reg_saved)
+compute_frame_size (poly_int64 size, long * p_reg_saved)
 {
   return (size
 	  + compute_register_save_size (p_reg_saved)
Index: gcc/config/xtensa/xtensa-protos.h
===================================================================
--- gcc/config/xtensa/xtensa-protos.h	2017-10-23 17:07:40.163546918 +0100
+++ gcc/config/xtensa/xtensa-protos.h	2017-10-23 17:18:53.831515023 +0100
@@ -67,7 +67,7 @@ extern rtx xtensa_return_addr (int, rtx)
 
 extern void xtensa_setup_frame_addresses (void);
 extern int xtensa_dbx_register_number (int);
-extern long compute_frame_size (int);
+extern long compute_frame_size (poly_int64);
 extern bool xtensa_use_return_instruction_p (void);
 extern void xtensa_expand_prologue (void);
 extern void xtensa_expand_epilogue (void);
Index: gcc/config/xtensa/xtensa.c
===================================================================
--- gcc/config/xtensa/xtensa.c	2017-10-23 17:11:40.190841813 +0100
+++ gcc/config/xtensa/xtensa.c	2017-10-23 17:18:53.832514935 +0100
@@ -2690,7 +2690,7 @@ #define STACK_BYTES (STACK_BOUNDARY / BI
 #define XTENSA_STACK_ALIGN(LOC) (((LOC) + STACK_BYTES-1) & ~(STACK_BYTES-1))
 
 long
-compute_frame_size (int size)
+compute_frame_size (poly_int64 size)
 {
   int regno;
 
Index: gcc/config/pa/pa-protos.h
===================================================================
--- gcc/config/pa/pa-protos.h	2017-10-23 17:07:40.163546918 +0100
+++ gcc/config/pa/pa-protos.h	2017-10-23 17:18:53.829515199 +0100
@@ -85,7 +85,7 @@ extern int pa_shadd_constant_p (int);
 extern int pa_zdepi_cint_p (unsigned HOST_WIDE_INT);
 
 extern void pa_output_ascii (FILE *, const char *, int);
-extern HOST_WIDE_INT pa_compute_frame_size (HOST_WIDE_INT, int *);
+extern HOST_WIDE_INT pa_compute_frame_size (poly_int64, int *);
 extern void pa_expand_prologue (void);
 extern void pa_expand_epilogue (void);
 extern bool pa_can_use_return_insn (void);
Index: gcc/config/pa/pa.c
===================================================================
--- gcc/config/pa/pa.c	2017-10-23 17:11:40.168799690 +0100
+++ gcc/config/pa/pa.c	2017-10-23 17:18:53.830515111 +0100
@@ -3767,7 +3767,7 @@ set_reg_plus_d (int reg, int base, HOST_
 }
 
 HOST_WIDE_INT
-pa_compute_frame_size (HOST_WIDE_INT size, int *fregs_live)
+pa_compute_frame_size (poly_int64 size, int *fregs_live)
 {
   int freg_saved = 0;
   int i, j;
Index: gcc/explow.h
===================================================================
--- gcc/explow.h	2017-10-23 17:07:40.163546918 +0100
+++ gcc/explow.h	2017-10-23 17:18:53.832514935 +0100
@@ -101,8 +101,7 @@ extern rtx allocate_dynamic_stack_space
 extern void get_dynamic_stack_size (rtx *, unsigned, unsigned, HOST_WIDE_INT *);
 
 /* Returns the address of the dynamic stack space without allocating it.  */
-extern rtx get_dynamic_stack_base (HOST_WIDE_INT offset,
-				   unsigned required_align);
+extern rtx get_dynamic_stack_base (poly_int64, unsigned);
 
 /* Emit one stack probe at ADDRESS, an address within the stack.  */
 extern void emit_stack_probe (rtx);
Index: gcc/explow.c
===================================================================
--- gcc/explow.c	2017-10-23 17:11:40.226910743 +0100
+++ gcc/explow.c	2017-10-23 17:18:53.832514935 +0100
@@ -1579,7 +1579,7 @@ allocate_dynamic_stack_space (rtx size,
    of memory.  */
 
 rtx
-get_dynamic_stack_base (HOST_WIDE_INT offset, unsigned required_align)
+get_dynamic_stack_base (poly_int64 offset, unsigned required_align)
 {
   rtx target;
 
Index: gcc/final.c
===================================================================
--- gcc/final.c	2017-10-23 17:16:50.365528952 +0100
+++ gcc/final.c	2017-10-23 17:18:53.833514847 +0100
@@ -1828,14 +1828,15 @@ final_start_function (rtx_insn *first, F
       TREE_ASM_WRITTEN (DECL_INITIAL (current_function_decl)) = 1;
     }
 
+  HOST_WIDE_INT min_frame_size = constant_lower_bound (get_frame_size ());
   if (warn_frame_larger_than
-    && get_frame_size () > frame_larger_than_size)
-  {
+      && min_frame_size > frame_larger_than_size)
+    {
       /* Issue a warning */
       warning (OPT_Wframe_larger_than_,
-               "the frame size of %wd bytes is larger than %wd bytes",
-               get_frame_size (), frame_larger_than_size);
-  }
+	       "the frame size of %wd bytes is larger than %wd bytes",
+	       min_frame_size, frame_larger_than_size);
+    }
 
   /* First output the function prologue: code to set up the stack frame.  */
   targetm.asm_out.function_prologue (file);
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	2017-10-23 17:16:50.369528412 +0100
+++ gcc/ira.c	2017-10-23 17:18:53.834514759 +0100
@@ -5550,13 +5550,13 @@ do_reload (void)
      function's frame size is larger than we expect.  */
   if (flag_stack_check == GENERIC_STACK_CHECK)
     {
-      HOST_WIDE_INT size = get_frame_size () + STACK_CHECK_FIXED_FRAME_SIZE;
+      poly_int64 size = get_frame_size () + STACK_CHECK_FIXED_FRAME_SIZE;
 
       for (int i = 0; i < FIRST_PSEUDO_REGISTER; i++)
 	if (df_regs_ever_live_p (i) && !fixed_regs[i] && call_used_regs[i])
 	  size += UNITS_PER_WORD;
 
-      if (size > STACK_CHECK_MAX_FRAME_SIZE)
+      if (constant_lower_bound (size) > STACK_CHECK_MAX_FRAME_SIZE)
 	warning (0, "frame size too large for reliable stack checking");
     }
 
Index: gcc/lra.c
===================================================================
--- gcc/lra.c	2017-10-23 17:11:40.394230500 +0100
+++ gcc/lra.c	2017-10-23 17:18:53.834514759 +0100
@@ -2371,7 +2371,7 @@ lra (FILE *f)
   bitmap_initialize (&lra_optional_reload_pseudos, &reg_obstack);
   bitmap_initialize (&lra_subreg_reload_pseudos, &reg_obstack);
   live_p = false;
-  if (get_frame_size () != 0 && crtl->stack_alignment_needed)
+  if (maybe_nonzero (get_frame_size ()) && crtl->stack_alignment_needed)
     /* If we have a stack frame, we must align it now.  The stack size
        may be a part of the offset computation for register
        elimination.  */
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:18:52.641619623 +0100
+++ gcc/reload1.c	2017-10-23 17:18:53.835514671 +0100
@@ -887,7 +887,7 @@ reload (rtx_insn *first, int global)
   for (;;)
     {
       int something_changed;
-      HOST_WIDE_INT starting_frame_size;
+      poly_int64 starting_frame_size;
 
       starting_frame_size = get_frame_size ();
       something_was_spilled = false;
@@ -955,7 +955,7 @@ reload (rtx_insn *first, int global)
       if (caller_save_needed)
 	setup_save_areas ();
 
-      if (starting_frame_size && crtl->stack_alignment_needed)
+      if (maybe_nonzero (starting_frame_size) && crtl->stack_alignment_needed)
 	{
 	  /* If we have a stack frame, we must align it now.  The
 	     stack size may be a part of the offset computation for
@@ -968,7 +968,8 @@ reload (rtx_insn *first, int global)
 	  assign_stack_local (BLKmode, 0, crtl->stack_alignment_needed);
 	}
       /* If we allocated another stack slot, redo elimination bookkeeping.  */
-      if (something_was_spilled || starting_frame_size != get_frame_size ())
+      if (something_was_spilled
+	  || may_ne (starting_frame_size, get_frame_size ()))
 	{
 	  if (update_eliminables_and_spill ())
 	    finish_spills (0);
@@ -994,7 +995,8 @@ reload (rtx_insn *first, int global)
 
       /* If we allocated any new memory locations, make another pass
 	 since it might have changed elimination offsets.  */
-      if (something_was_spilled || starting_frame_size != get_frame_size ())
+      if (something_was_spilled
+	  || may_ne (starting_frame_size, get_frame_size ()))
 	something_changed = 1;
 
       /* Even if the frame size remained the same, we might still have
@@ -1043,11 +1045,11 @@ reload (rtx_insn *first, int global)
   if (insns_need_reload != 0 || something_needs_elimination
       || something_needs_operands_changed)
     {
-      HOST_WIDE_INT old_frame_size = get_frame_size ();
+      poly_int64 old_frame_size = get_frame_size ();
 
       reload_as_needed (global);
 
-      gcc_assert (old_frame_size == get_frame_size ());
+      gcc_assert (must_eq (old_frame_size, get_frame_size ()));
 
       gcc_assert (verify_initial_elim_offsets ());
     }
Index: gcc/config/avr/avr.c
===================================================================
--- gcc/config/avr/avr.c	2017-10-23 17:11:40.146757566 +0100
+++ gcc/config/avr/avr.c	2017-10-23 17:18:53.829515199 +0100
@@ -2044,7 +2044,7 @@ avr_asm_function_end_prologue (FILE *fil
              avr_outgoing_args_size());
 
   fprintf (file, "/* frame size = " HOST_WIDE_INT_PRINT_DEC " */\n",
-           get_frame_size());
+           (HOST_WIDE_INT) get_frame_size());
 
   if (!cfun->machine->gasisr.yes)
     {
Index: gcc/config/pa/pa.h
===================================================================
--- gcc/config/pa/pa.h	2017-10-23 17:07:40.163546918 +0100
+++ gcc/config/pa/pa.h	2017-10-23 17:18:53.830515111 +0100
@@ -702,7 +702,7 @@ #define NO_PROFILE_COUNTERS 1
 extern int may_call_alloca;
 
 #define EXIT_IGNORE_STACK	\
- (get_frame_size () != 0	\
+ (maybe_nonzero (get_frame_size ())	\
   || cfun->calls_alloca || crtl->outgoing_args_size)
 
 /* Length in units of the trampoline for entering a nested function.  */
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:16:50.375527601 +0100
+++ gcc/rtlanal.c	2017-10-23 17:18:53.836514583 +0100
@@ -344,7 +344,7 @@ rtx_varies_p (const_rtx x, bool for_alia
    FROM and TO for the current function, as it was at the start
    of the routine.  */
 
-static HOST_WIDE_INT
+static poly_int64
 get_initial_register_offset (int from, int to)
 {
   static const struct elim_table_t
@@ -352,7 +352,7 @@ get_initial_register_offset (int from, i
     const int from;
     const int to;
   } table[] = ELIMINABLE_REGS;
-  HOST_WIDE_INT offset1, offset2;
+  poly_int64 offset1, offset2;
   unsigned int i, j;
 
   if (to == from)

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [047/nnn] poly_int: argument sizes
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (44 preceding siblings ...)
  2017-10-23 17:19 ` [043/nnn] poly_int: frame allocations Richard Sandiford
@ 2017-10-23 17:20 ` Richard Sandiford
  2017-12-06 20:57   ` Jeff Law
  2017-10-23 17:20 ` [046/nnn] poly_int: instantiate_virtual_regs Richard Sandiford
                   ` (61 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:20 UTC (permalink / raw)
  To: gcc-patches

This patch changes various bits of state related to argument sizes so
that they have type poly_int64 rather than HOST_WIDE_INT.  This includes:

- incoming_args::pops_args and incoming_args::size
- rtl_data::outgoing_args_size
- pending_stack_adjust
- stack_pointer_delta
- stack_usage::pushed_stack_size
- args_size::constant

It also changes TARGET_RETURN_POPS_ARGS so that the size of the
arguments passed in and the size returned by the hook are both
poly_int64s.
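
For illustration only (a hypothetical port, not part of the patch), a
hook implementation under the new interface would look like this;
default_return_pops_args in targhooks.c is the real reference:

  /* Hypothetical target hook: the argument size passed in and the pop
     amount returned are both poly_int64s.  */
  static poly_int64
  example_return_pops_args (tree /*fundecl*/, tree /*funtype*/,
			    poly_int64 /*size*/)
  {
    /* A pure caller-pops convention pops nothing.  */
    return 0;
  }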


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.def (return_pops_args): Treat both the input and output
	sizes as poly_int64s rather than HOST_WIDE_INTS.
	* targhooks.h (default_return_pops_args): Update accordingly.
	* targhooks.c (default_return_pops_args): Likewise.
	* doc/tm.texi: Regenerate.
	* emit-rtl.h (incoming_args): Change pops_args, size and
	outgoing_args_size from int to poly_int64_pod.
	* function.h (expr_status): Change x_pending_stack_adjust and
	x_stack_pointer_delta from int to poly_int64.
	(args_size::constant): Change from HOST_WIDE_INT to poly_int64.
	(ARGS_SIZE_RTX): Update accordingly.
	* calls.c (highest_outgoing_arg_in_use): Change from int to
	unsigned int.
	(stack_usage_watermark, stored_args_watermark): New variables.
	(stack_region_maybe_used_p, mark_stack_region_used): New functions.
	(emit_call_1): Change the stack_size and rounded_stack_size
	parameters from HOST_WIDE_INT to poly_int64.  Track n_popped
	as a poly_int64.
	(save_fixed_argument_area): Check stack_usage_watermark.
	(initialize_argument_information): Change old_pending_adj from
	a HOST_WIDE_INT * to a poly_int64_pod *.
	(compute_argument_block_size): Return the size as a poly_int64
	rather than an int.
	(finalize_must_preallocate): Track polynomial argument sizes.
	(compute_argument_addresses): Likewise.
	(internal_arg_pointer_based_exp): Track polynomial offsets.
	(mem_overlaps_already_clobbered_arg_p): Rename to...
	(mem_might_overlap_already_clobbered_arg_p): ...this and take the
	size as a poly_uint64 rather than an unsigned HOST_WIDE_INT.
	Check stored_args_used_watermark.
	(load_register_parameters): Update accordingly.
	(check_sibcall_argument_overlap_1): Likewise.
	(combine_pending_stack_adjustment_and_call): Take the unadjusted
	args size as a poly_int64 rather than an int.  Return a bool
	indicating whether the optimization was possible and return
	the new adjustment by reference.
	(check_sibcall_argument_overlap): Track polynomial argument sizes.
	Update stored_args_watermark.
	(can_implement_as_sibling_call_p): Handle polynomial argument sizes.
	(expand_call): Likewise.  Maintain stack_usage_watermark and
	stored_args_watermark.  Update calls to
	combine_pending_stack_adjustment_and_call.
	(emit_library_call_value_1): Handle polynomial argument sizes.
	Call stack_region_maybe_used_p and mark_stack_region_used.
	Maintain stack_usage_watermark.
	(store_one_arg): Likewise.  Update call to
	mem_overlaps_already_clobbered_arg_p.
	* config/arm/arm.c (arm_output_function_prologue): Add a cast to
	HOST_WIDE_INT.
	* config/avr/avr.c (avr_outgoing_args_size): Likewise.
	* config/microblaze/microblaze.c (microblaze_function_prologue):
	Likewise.
	* config/cr16/cr16.c (cr16_return_pops_args): Update for new
	TARGET_RETURN_POPS_ARGS interface.
	(cr16_compute_frame, cr16_initial_elimination_offset): Add casts
	to HOST_WIDE_INT.
	* config/ft32/ft32.c (ft32_compute_frame): Likewise.
	* config/i386/i386.c (ix86_return_pops_args): Update for new
	TARGET_RETURN_POPS_ARGS interface.
	(ix86_expand_split_stack_prologue): Add a cast to HOST_WIDE_INT.
	* config/moxie/moxie.c (moxie_compute_frame): Likewise.
	* config/m68k/m68k.c (m68k_return_pops_args): Update for new
	TARGET_RETURN_POPS_ARGS interface.
	* config/vax/vax.c (vax_return_pops_args): Likewise.
	* config/pa/pa.h (STACK_POINTER_OFFSET): Add a cast to poly_int64.
	(EXIT_IGNORE_STACK): Update reference to crtl->outgoing_args_size.
	* config/arm/arm.h (CALLER_INTERWORKING_SLOT_SIZE): Likewise.
	* config/powerpcspe/aix.h (STACK_DYNAMIC_OFFSET): Likewise.
	* config/powerpcspe/darwin.h (STACK_DYNAMIC_OFFSET): Likewise.
	* config/powerpcspe/powerpcspe.h (STACK_DYNAMIC_OFFSET): Likewise.
	* config/rs6000/aix.h (STACK_DYNAMIC_OFFSET): Likewise.
	* config/rs6000/darwin.h (STACK_DYNAMIC_OFFSET): Likewise.
	* config/rs6000/rs6000.h (STACK_DYNAMIC_OFFSET): Likewise.
	* dojump.h (saved_pending_stack_adjust): Change x_pending_stack_adjust
	and x_stack_pointer_delta from int to poly_int64.
	* dojump.c (do_pending_stack_adjust): Update accordingly.
	* explow.c (allocate_dynamic_stack_space): Handle polynomial
	stack_pointer_deltas.
	* function.c (STACK_DYNAMIC_OFFSET): Add a cast to poly_int64.
	(pad_to_arg_alignment): Track polynomial offsets.
	(assign_parm_find_stack_rtl): Likewise.
	(assign_parms, locate_and_pad_parm): Handle polynomial argument sizes.
	* toplev.c (output_stack_usage): Update reference to
	current_function_pushed_stack_size.

Index: gcc/target.def
===================================================================
--- gcc/target.def	2017-10-23 17:11:40.311071579 +0100
+++ gcc/target.def	2017-10-23 17:19:01.411170305 +0100
@@ -5043,7 +5043,7 @@ arguments pop them but other functions (
 nothing (the caller pops all).  When this convention is in use,\n\
 @var{funtype} is examined to determine whether a function takes a fixed\n\
 number of arguments.",
- int, (tree fundecl, tree funtype, int size),
+ poly_int64, (tree fundecl, tree funtype, poly_int64 size),
  default_return_pops_args)
 
 /* Return a mode wide enough to copy any function value that might be
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	2017-10-23 17:11:40.312073494 +0100
+++ gcc/targhooks.h	2017-10-23 17:19:01.411170305 +0100
@@ -154,7 +154,7 @@ extern bool default_function_value_regno
 extern rtx default_internal_arg_pointer (void);
 extern rtx default_static_chain (const_tree, bool);
 extern void default_trampoline_init (rtx, tree, rtx);
-extern int default_return_pops_args (tree, tree, int);
+extern poly_int64 default_return_pops_args (tree, tree, poly_int64);
 extern reg_class_t default_branch_target_register_class (void);
 extern reg_class_t default_ira_change_pseudo_allocno_class (int, reg_class_t,
 							    reg_class_t);
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 17:11:40.312073494 +0100
+++ gcc/targhooks.c	2017-10-23 17:19:01.411170305 +0100
@@ -1009,10 +1009,8 @@ default_trampoline_init (rtx ARG_UNUSED
   sorry ("nested function trampolines not supported on this target");
 }
 
-int
-default_return_pops_args (tree fundecl ATTRIBUTE_UNUSED,
-			  tree funtype ATTRIBUTE_UNUSED,
-			  int size ATTRIBUTE_UNUSED)
+poly_int64
+default_return_pops_args (tree, tree, poly_int64)
 {
   return 0;
 }
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2017-10-23 17:11:40.308065835 +0100
+++ gcc/doc/tm.texi	2017-10-23 17:19:01.408170265 +0100
@@ -3822,7 +3822,7 @@ suppresses this behavior and causes the
 stack in its natural location.
 @end defmac
 
-@deftypefn {Target Hook} int TARGET_RETURN_POPS_ARGS (tree @var{fundecl}, tree @var{funtype}, int @var{size})
+@deftypefn {Target Hook} poly_int64 TARGET_RETURN_POPS_ARGS (tree @var{fundecl}, tree @var{funtype}, poly_int64 @var{size})
 This target hook returns the number of bytes of its own arguments that
 a function pops on returning, or 0 if the function pops no arguments
 and the caller must therefore pop them all after the function returns.
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-10-23 17:18:53.832514935 +0100
+++ gcc/emit-rtl.h	2017-10-23 17:19:01.409170278 +0100
@@ -28,12 +28,12 @@ struct GTY(()) incoming_args {
   /* Number of bytes of args popped by function being compiled on its return.
      Zero if no bytes are to be popped.
      May affect compilation of return insn or of function epilogue.  */
-  int pops_args;
+  poly_int64_pod pops_args;
 
   /* If function's args have a fixed size, this is that size, in bytes.
      Otherwise, it is -1.
      May affect compilation of return insn or of function epilogue.  */
-  int size;
+  poly_int64_pod size;
 
   /* # bytes the prologue should push and pretend that the caller pushed them.
      The prologue must do this, but only if parms can be passed in
@@ -68,7 +68,7 @@ struct GTY(()) rtl_data {
 
   /* # of bytes of outgoing arguments.  If ACCUMULATE_OUTGOING_ARGS is
      defined, the needed space is pushed by the prologue.  */
-  int outgoing_args_size;
+  poly_int64_pod outgoing_args_size;
 
   /* If nonzero, an RTL expression for the location at which the current
      function returns its result.  If the current function returns its
Index: gcc/function.h
===================================================================
--- gcc/function.h	2017-10-23 17:18:53.834514759 +0100
+++ gcc/function.h	2017-10-23 17:19:01.410170292 +0100
@@ -94,7 +94,7 @@ #define REGNO_POINTER_ALIGN(REGNO) (crtl
 struct GTY(()) expr_status {
   /* Number of units that we should eventually pop off the stack.
      These are the arguments to function calls that have already returned.  */
-  int x_pending_stack_adjust;
+  poly_int64_pod x_pending_stack_adjust;
 
   /* Under some ABIs, it is the caller's responsibility to pop arguments
      pushed for function calls.  A naive implementation would simply pop
@@ -117,7 +117,7 @@ struct GTY(()) expr_status {
      boundary can be momentarily unaligned while pushing the arguments.
      Record the delta since last aligned boundary here in order to get
      stack alignment in the nested function calls working right.  */
-  int x_stack_pointer_delta;
+  poly_int64_pod x_stack_pointer_delta;
 
   /* Nonzero means __builtin_saveregs has already been done in this function.
      The value is the pseudoreg containing the value __builtin_saveregs
@@ -200,9 +200,10 @@ struct GTY(()) stack_usage
      meaningful only if has_unbounded_dynamic_stack_size is zero.  */
   HOST_WIDE_INT dynamic_stack_size;
 
-  /* # of bytes of space pushed onto the stack after the prologue.  If
-     !ACCUMULATE_OUTGOING_ARGS, it contains the outgoing arguments.  */
-  int pushed_stack_size;
+  /* Upper bound on the number of bytes pushed onto the stack after the
+     prologue.  If !ACCUMULATE_OUTGOING_ARGS, it contains the outgoing
+     arguments.  */
+  poly_int64 pushed_stack_size;
 
   /* Nonzero if the amount of stack space allocated dynamically cannot
      be bounded at compile-time.  */
@@ -476,7 +477,7 @@ extern struct machine_function * (*init_
 
 struct args_size
 {
-  HOST_WIDE_INT constant;
+  poly_int64_pod constant;
   tree var;
 };
 
@@ -538,7 +539,7 @@ #define ARGS_SIZE_TREE(SIZE)					\
 
 /* Convert the implicit sum in a `struct args_size' into an rtx.  */
 #define ARGS_SIZE_RTX(SIZE)					\
-((SIZE).var == 0 ? GEN_INT ((SIZE).constant)			\
+((SIZE).var == 0 ? gen_int_mode ((SIZE).constant, Pmode)	\
  : expand_normal (ARGS_SIZE_TREE (SIZE)))
 
 #define ASLK_REDUCE_ALIGN 1
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 17:18:57.856161229 +0100
+++ gcc/calls.c	2017-10-23 17:19:01.395170091 +0100
@@ -127,7 +127,11 @@ struct arg_data
 static char *stack_usage_map;
 
 /* Size of STACK_USAGE_MAP.  */
-static int highest_outgoing_arg_in_use;
+static unsigned int highest_outgoing_arg_in_use;
+
+/* Assume that any stack location at this byte index is used,
+   without checking the contents of stack_usage_map.  */
+static unsigned HOST_WIDE_INT stack_usage_watermark = HOST_WIDE_INT_M1U;
 
 /* A bitmap of virtual-incoming stack space.  Bit is set if the corresponding
    stack location's tail call argument has been already stored into the stack.
@@ -136,6 +140,10 @@ struct arg_data
    overwritten with tail call arguments.  */
 static sbitmap stored_args_map;
 
+/* Assume that any virtual-incoming location at this byte index has been
+   stored, without checking the contents of stored_args_map.  */
+static unsigned HOST_WIDE_INT stored_args_watermark;
+
 /* stack_arg_under_construction is nonzero when an argument may be
    initialized with a constructor call (including a C function that
    returns a BLKmode struct) and expand_call must take special action
@@ -143,9 +151,6 @@ struct arg_data
    argument list for the constructor call.  */
 static int stack_arg_under_construction;
 
-static void emit_call_1 (rtx, tree, tree, tree, HOST_WIDE_INT, HOST_WIDE_INT,
-			 HOST_WIDE_INT, rtx, rtx, int, rtx, int,
-			 cumulative_args_t);
 static void precompute_register_parameters (int, struct arg_data *, int *);
 static void store_bounds (struct arg_data *, struct arg_data *);
 static int store_one_arg (struct arg_data *, rtx, int, int, int);
@@ -153,13 +158,6 @@ static void store_unaligned_arguments_in
 static int finalize_must_preallocate (int, int, struct arg_data *,
 				      struct args_size *);
 static void precompute_arguments (int, struct arg_data *);
-static int compute_argument_block_size (int, struct args_size *, tree, tree, int);
-static void initialize_argument_information (int, struct arg_data *,
-					     struct args_size *, int,
-					     tree, tree,
-					     tree, tree, cumulative_args_t, int,
-					     rtx *, int *, int *, int *,
-					     bool *, bool);
 static void compute_argument_addresses (struct arg_data *, rtx, int);
 static rtx rtx_for_function_call (tree, tree);
 static void load_register_parameters (struct arg_data *, int, rtx *, int,
@@ -168,8 +166,6 @@ static int special_function_p (const_tre
 static int check_sibcall_argument_overlap_1 (rtx);
 static int check_sibcall_argument_overlap (rtx_insn *, struct arg_data *, int);
 
-static int combine_pending_stack_adjustment_and_call (int, struct args_size *,
-						      unsigned int);
 static tree split_complex_types (tree);
 
 #ifdef REG_PARM_STACK_SPACE
@@ -177,6 +173,46 @@ static rtx save_fixed_argument_area (int
 static void restore_fixed_argument_area (rtx, rtx, int, int);
 #endif
 \f
+/* Return true if bytes [LOWER_BOUND, UPPER_BOUND) of the outgoing
+   stack region might already be in use.  */
+
+static bool
+stack_region_maybe_used_p (poly_uint64 lower_bound, poly_uint64 upper_bound,
+			   unsigned int reg_parm_stack_space)
+{
+  unsigned HOST_WIDE_INT const_lower, const_upper;
+  const_lower = constant_lower_bound (lower_bound);
+  if (!upper_bound.is_constant (&const_upper))
+    const_upper = HOST_WIDE_INT_M1U;
+
+  if (const_upper > stack_usage_watermark)
+    return true;
+
+  /* Don't worry about things in the fixed argument area;
+     it has already been saved.  */
+  const_lower = MAX (const_lower, reg_parm_stack_space);
+  const_upper = MIN (const_upper, highest_outgoing_arg_in_use);
+  for (unsigned HOST_WIDE_INT i = const_lower; i < const_upper; ++i)
+    if (stack_usage_map[i])
+      return true;
+  return false;
+}
+
+/* Record that bytes [LOWER_BOUND, UPPER_BOUND) of the outgoing
+   stack region are now in use.  */
+
+static void
+mark_stack_region_used (poly_uint64 lower_bound, poly_uint64 upper_bound)
+{
+  unsigned HOST_WIDE_INT const_lower, const_upper;
+  const_lower = constant_lower_bound (lower_bound);
+  if (upper_bound.is_constant (&const_upper))
+    for (unsigned HOST_WIDE_INT i = const_lower; i < const_upper; ++i)
+      stack_usage_map[i] = 1;
+  else
+    stack_usage_watermark = MIN (stack_usage_watermark, const_lower);
+}
+
 /* Force FUNEXP into a form suitable for the address of a CALL,
    and return that as an rtx.  Also load the static chain register
    if FNDECL is a nested function.
@@ -339,17 +375,17 @@ prepare_call_address (tree fndecl_or_typ
 static void
 emit_call_1 (rtx funexp, tree fntree ATTRIBUTE_UNUSED, tree fndecl ATTRIBUTE_UNUSED,
 	     tree funtype ATTRIBUTE_UNUSED,
-	     HOST_WIDE_INT stack_size ATTRIBUTE_UNUSED,
-	     HOST_WIDE_INT rounded_stack_size,
+	     poly_int64 stack_size ATTRIBUTE_UNUSED,
+	     poly_int64 rounded_stack_size,
 	     HOST_WIDE_INT struct_value_size ATTRIBUTE_UNUSED,
 	     rtx next_arg_reg ATTRIBUTE_UNUSED, rtx valreg,
 	     int old_inhibit_defer_pop, rtx call_fusage, int ecf_flags,
 	     cumulative_args_t args_so_far ATTRIBUTE_UNUSED)
 {
-  rtx rounded_stack_size_rtx = GEN_INT (rounded_stack_size);
+  rtx rounded_stack_size_rtx = gen_int_mode (rounded_stack_size, Pmode);
   rtx call, funmem, pat;
   int already_popped = 0;
-  HOST_WIDE_INT n_popped = 0;
+  poly_int64 n_popped = 0;
 
   /* Sibling call patterns never pop arguments (no sibcall(_value)_pop
      patterns exist).  Any popping that the callee does on return will
@@ -407,12 +443,12 @@ emit_call_1 (rtx funexp, tree fntree ATT
      if no arguments are actually popped.  If the target does not have
      "call" or "call_value" insns, then we must use the popping versions
      even if the call has no arguments to pop.  */
-  else if (n_popped > 0
+  else if (maybe_nonzero (n_popped)
 	   || !(valreg
 		? targetm.have_call_value ()
 		: targetm.have_call ()))
     {
-      rtx n_pop = GEN_INT (n_popped);
+      rtx n_pop = gen_int_mode (n_popped, Pmode);
 
       /* If this subroutine pops its own args, record that in the call insn
 	 if possible, for the sake of frame pointer elimination.  */
@@ -486,7 +522,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
      if the context of the call as a whole permits.  */
   inhibit_defer_pop = old_inhibit_defer_pop;
 
-  if (n_popped > 0)
+  if (maybe_nonzero (n_popped))
     {
       if (!already_popped)
 	CALL_INSN_FUNCTION_USAGE (call_insn)
@@ -494,7 +530,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
 			       gen_rtx_CLOBBER (VOIDmode, stack_pointer_rtx),
 			       CALL_INSN_FUNCTION_USAGE (call_insn));
       rounded_stack_size -= n_popped;
-      rounded_stack_size_rtx = GEN_INT (rounded_stack_size);
+      rounded_stack_size_rtx = gen_int_mode (rounded_stack_size, Pmode);
       stack_pointer_delta -= n_popped;
 
       add_args_size_note (call_insn, stack_pointer_delta);
@@ -518,7 +554,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
 	 If returning from the subroutine does pop the args, indicate that the
 	 stack pointer will be changed.  */
 
-      if (rounded_stack_size != 0)
+      if (maybe_nonzero (rounded_stack_size))
 	{
 	  if (ecf_flags & ECF_NORETURN)
 	    /* Just pretend we did the pop.  */
@@ -541,8 +577,8 @@ emit_call_1 (rtx funexp, tree fntree ATT
 
      ??? It will be worthwhile to enable combine_stack_adjustments even for
      such machines.  */
-  else if (n_popped)
-    anti_adjust_stack (GEN_INT (n_popped));
+  else if (maybe_nonzero (n_popped))
+    anti_adjust_stack (gen_int_mode (n_popped, Pmode));
 }
 
 /* Determine if the function identified by FNDECL is one with
@@ -1017,8 +1053,8 @@ precompute_register_parameters (int num_
 static rtx
 save_fixed_argument_area (int reg_parm_stack_space, rtx argblock, int *low_to_save, int *high_to_save)
 {
-  int low;
-  int high;
+  unsigned int low;
+  unsigned int high;
 
   /* Compute the boundary of the area that needs to be saved, if any.  */
   high = reg_parm_stack_space;
@@ -1029,7 +1065,7 @@ save_fixed_argument_area (int reg_parm_s
     high = highest_outgoing_arg_in_use;
 
   for (low = 0; low < high; low++)
-    if (stack_usage_map[low] != 0)
+    if (stack_usage_map[low] != 0 || low >= stack_usage_watermark)
       {
 	int num_to_save;
 	machine_mode save_mode;
@@ -1555,7 +1591,8 @@ initialize_argument_information (int num
 				 tree fndecl, tree fntype,
 				 cumulative_args_t args_so_far,
 				 int reg_parm_stack_space,
-				 rtx *old_stack_level, int *old_pending_adj,
+				 rtx *old_stack_level,
+				 poly_int64_pod *old_pending_adj,
 				 int *must_preallocate, int *ecf_flags,
 				 bool *may_tailcall, bool call_from_thunk_p)
 {
@@ -1958,14 +1995,14 @@ initialize_argument_information (int num
    REG_PARM_STACK_SPACE holds the number of bytes of stack space reserved
    for arguments passed in registers.  */
 
-static int
+static poly_int64
 compute_argument_block_size (int reg_parm_stack_space,
 			     struct args_size *args_size,
 			     tree fndecl ATTRIBUTE_UNUSED,
 			     tree fntype ATTRIBUTE_UNUSED,
 			     int preferred_stack_boundary ATTRIBUTE_UNUSED)
 {
-  int unadjusted_args_size = args_size->constant;
+  poly_int64 unadjusted_args_size = args_size->constant;
 
   /* For accumulate outgoing args mode we don't need to align, since the frame
      will be already aligned.  Align to STACK_BOUNDARY in order to prevent
@@ -1988,7 +2025,8 @@ compute_argument_block_size (int reg_par
 	  /* We don't handle this case yet.  To handle it correctly we have
 	     to add the delta, round and subtract the delta.
 	     Currently no machine description requires this support.  */
-	  gcc_assert (!(stack_pointer_delta & (preferred_stack_boundary - 1)));
+	  gcc_assert (multiple_p (stack_pointer_delta,
+				  preferred_stack_boundary));
 	  args_size->var = round_up (args_size->var, preferred_stack_boundary);
 	}
 
@@ -2011,15 +2049,13 @@ compute_argument_block_size (int reg_par
       preferred_stack_boundary /= BITS_PER_UNIT;
       if (preferred_stack_boundary < 1)
 	preferred_stack_boundary = 1;
-      args_size->constant = (((args_size->constant
-			       + stack_pointer_delta
-			       + preferred_stack_boundary - 1)
-			      / preferred_stack_boundary
-			      * preferred_stack_boundary)
+      args_size->constant = (aligned_upper_bound (args_size->constant
+						  + stack_pointer_delta,
+						  preferred_stack_boundary)
 			     - stack_pointer_delta);
 
-      args_size->constant = MAX (args_size->constant,
-				 reg_parm_stack_space);
+      args_size->constant = upper_bound (args_size->constant,
+					 reg_parm_stack_space);
 
       if (! OUTGOING_REG_PARM_STACK_SPACE ((!fndecl ? fntype : TREE_TYPE (fndecl))))
 	args_size->constant -= reg_parm_stack_space;
@@ -2124,7 +2160,7 @@ finalize_must_preallocate (int must_prea
   if (! must_preallocate)
     {
       int partial_seen = 0;
-      int copy_to_evaluate_size = 0;
+      poly_int64 copy_to_evaluate_size = 0;
       int i;
 
       for (i = 0; i < num_actuals && ! must_preallocate; i++)
@@ -2149,8 +2185,8 @@ finalize_must_preallocate (int must_prea
 	      += int_size_in_bytes (TREE_TYPE (args[i].tree_value));
 	}
 
-      if (copy_to_evaluate_size * 2 >= args_size->constant
-	  && args_size->constant > 0)
+      if (maybe_nonzero (args_size->constant)
+	  && may_ge (copy_to_evaluate_size * 2, args_size->constant))
 	must_preallocate = 1;
     }
   return must_preallocate;
@@ -2170,10 +2206,14 @@ compute_argument_addresses (struct arg_d
   if (argblock)
     {
       rtx arg_reg = argblock;
-      int i, arg_offset = 0;
+      int i;
+      poly_int64 arg_offset = 0;
 
       if (GET_CODE (argblock) == PLUS)
-	arg_reg = XEXP (argblock, 0), arg_offset = INTVAL (XEXP (argblock, 1));
+	{
+	  arg_reg = XEXP (argblock, 0);
+	  arg_offset = rtx_to_poly_int64 (XEXP (argblock, 1));
+	}
 
       for (i = 0; i < num_actuals; i++)
 	{
@@ -2181,7 +2221,7 @@ compute_argument_addresses (struct arg_d
 	  rtx slot_offset = ARGS_SIZE_RTX (args[i].locate.slot_offset);
 	  rtx addr;
 	  unsigned int align, boundary;
-	  unsigned int units_on_stack = 0;
+	  poly_uint64 units_on_stack = 0;
 	  machine_mode partial_mode = VOIDmode;
 
 	  /* Skip this parm if it will not be passed on the stack.  */
@@ -2202,7 +2242,7 @@ compute_argument_addresses (struct arg_d
 	      /* Only part of the parameter is being passed on the stack.
 		 Generate a simple memory reference of the correct size.  */
 	      units_on_stack = args[i].locate.size.constant;
-	      unsigned int bits_on_stack = units_on_stack * BITS_PER_UNIT;
+	      poly_uint64 bits_on_stack = units_on_stack * BITS_PER_UNIT;
 	      partial_mode = int_mode_for_size (bits_on_stack, 1).else_blk ();
 	      args[i].stack = gen_rtx_MEM (partial_mode, addr);
 	      set_mem_size (args[i].stack, units_on_stack);
@@ -2215,12 +2255,16 @@ compute_argument_addresses (struct arg_d
 	    }
 	  align = BITS_PER_UNIT;
 	  boundary = args[i].locate.boundary;
+	  poly_int64 offset_val;
 	  if (args[i].locate.where_pad != PAD_DOWNWARD)
 	    align = boundary;
-	  else if (CONST_INT_P (offset))
+	  else if (poly_int_rtx_p (offset, &offset_val))
 	    {
-	      align = INTVAL (offset) * BITS_PER_UNIT | boundary;
-	      align = least_bit_hwi (align);
+	      align = least_bit_hwi (boundary);
+	      unsigned int offset_align
+		= known_alignment (offset_val) * BITS_PER_UNIT;
+	      if (offset_align != 0)
+		align = MIN (align, offset_align);
 	    }
 	  set_mem_align (args[i].stack, align);
 
@@ -2363,12 +2407,13 @@ internal_arg_pointer_based_exp (const_rt
   if (REG_P (rtl) && HARD_REGISTER_P (rtl))
     return NULL_RTX;
 
-  if (GET_CODE (rtl) == PLUS && CONST_INT_P (XEXP (rtl, 1)))
+  poly_int64 offset;
+  if (GET_CODE (rtl) == PLUS && poly_int_rtx_p (XEXP (rtl, 1), &offset))
     {
       rtx val = internal_arg_pointer_based_exp (XEXP (rtl, 0), toplevel);
       if (val == NULL_RTX || val == pc_rtx)
 	return val;
-      return plus_constant (Pmode, val, INTVAL (XEXP (rtl, 1)));
+      return plus_constant (Pmode, val, offset);
     }
 
   /* When called at the topmost level, scan pseudo assignments in between the
@@ -2399,45 +2444,53 @@ internal_arg_pointer_based_exp (const_rt
   return NULL_RTX;
 }
 
-/* Return true if and only if SIZE storage units (usually bytes)
-   starting from address ADDR overlap with already clobbered argument
-   area.  This function is used to determine if we should give up a
-   sibcall.  */
+/* Return true if SIZE bytes starting from address ADDR might overlap an
+   already-clobbered argument area.  This function is used to determine
+   if we should give up a sibcall.  */
 
 static bool
-mem_overlaps_already_clobbered_arg_p (rtx addr, unsigned HOST_WIDE_INT size)
+mem_might_overlap_already_clobbered_arg_p (rtx addr, poly_uint64 size)
 {
-  HOST_WIDE_INT i;
+  poly_int64 i;
+  unsigned HOST_WIDE_INT start, end;
   rtx val;
 
-  if (bitmap_empty_p (stored_args_map))
+  if (bitmap_empty_p (stored_args_map)
+      && stored_args_watermark == HOST_WIDE_INT_M1U)
     return false;
   val = internal_arg_pointer_based_exp (addr, true);
   if (val == NULL_RTX)
     return false;
-  else if (val == pc_rtx)
+  else if (!poly_int_rtx_p (val, &i))
     return true;
-  else
-    i = INTVAL (val);
+
+  if (known_zero (size))
+    return false;
 
   if (STACK_GROWS_DOWNWARD)
     i -= crtl->args.pretend_args_size;
   else
     i += crtl->args.pretend_args_size;
 
-
   if (ARGS_GROW_DOWNWARD)
     i = -i - size;
 
-  if (size > 0)
-    {
-      unsigned HOST_WIDE_INT k;
+  /* We can ignore any references to the function's pretend args,
+     which at this point would manifest as negative values of I.  */
+  if (must_le (i, 0) && must_le (size, poly_uint64 (-i)))
+    return false;
 
-      for (k = 0; k < size; k++)
-	if (i + k < SBITMAP_SIZE (stored_args_map)
-	    && bitmap_bit_p (stored_args_map, i + k))
-	  return true;
-    }
+  start = may_lt (i, 0) ? 0 : constant_lower_bound (i);
+  if (!(i + size).is_constant (&end))
+    end = HOST_WIDE_INT_M1U;
+
+  if (end > stored_args_watermark)
+    return true;
+
+  end = MIN (end, SBITMAP_SIZE (stored_args_map));
+  for (unsigned HOST_WIDE_INT k = start; k < end; ++k)
+    if (bitmap_bit_p (stored_args_map, k))
+      return true;
 
   return false;
 }
@@ -2541,7 +2594,7 @@ load_register_parameters (struct arg_dat
 	         providing that this has non-zero size.  */
 	      if (is_sibcall
 		  && size != 0
-		  && (mem_overlaps_already_clobbered_arg_p
+		  && (mem_might_overlap_already_clobbered_arg_p
 		      (XEXP (args[i].value, 0), size)))
 		*sibcall_failure = 1;
 
@@ -2610,27 +2663,32 @@ load_register_parameters (struct arg_dat
 /* We need to pop PENDING_STACK_ADJUST bytes.  But, if the arguments
    wouldn't fill up an even multiple of PREFERRED_UNIT_STACK_BOUNDARY
    bytes, then we would need to push some additional bytes to pad the
-   arguments.  So, we compute an adjust to the stack pointer for an
+   arguments.  So, we try to compute an adjust to the stack pointer for an
    amount that will leave the stack under-aligned by UNADJUSTED_ARGS_SIZE
    bytes.  Then, when the arguments are pushed the stack will be perfectly
-   aligned.  ARGS_SIZE->CONSTANT is set to the number of bytes that should
-   be popped after the call.  Returns the adjustment.  */
+   aligned.
 
-static int
-combine_pending_stack_adjustment_and_call (int unadjusted_args_size,
+   Return true if this optimization is possible, storing the adjustment
+   in ADJUSTMENT_OUT and setting ARGS_SIZE->CONSTANT to the number of
+   bytes that should be popped after the call.  */
+
+static bool
+combine_pending_stack_adjustment_and_call (poly_int64_pod *adjustment_out,
+					   poly_int64 unadjusted_args_size,
 					   struct args_size *args_size,
 					   unsigned int preferred_unit_stack_boundary)
 {
   /* The number of bytes to pop so that the stack will be
      under-aligned by UNADJUSTED_ARGS_SIZE bytes.  */
-  HOST_WIDE_INT adjustment;
+  poly_int64 adjustment;
   /* The alignment of the stack after the arguments are pushed, if we
      just pushed the arguments without adjust the stack here.  */
   unsigned HOST_WIDE_INT unadjusted_alignment;
 
-  unadjusted_alignment
-    = ((stack_pointer_delta + unadjusted_args_size)
-       % preferred_unit_stack_boundary);
+  if (!known_misalignment (stack_pointer_delta + unadjusted_args_size,
+			   preferred_unit_stack_boundary,
+			   &unadjusted_alignment))
+    return false;
 
   /* We want to get rid of as many of the PENDING_STACK_ADJUST bytes
      as possible -- leaving just enough left to cancel out the
@@ -2639,15 +2697,24 @@ combine_pending_stack_adjustment_and_cal
      -UNADJUSTED_ALIGNMENT modulo the PREFERRED_UNIT_STACK_BOUNDARY.  */
 
   /* Begin by trying to pop all the bytes.  */
-  unadjusted_alignment
-    = (unadjusted_alignment
-       - (pending_stack_adjust % preferred_unit_stack_boundary));
+  unsigned HOST_WIDE_INT tmp_misalignment;
+  if (!known_misalignment (pending_stack_adjust,
+			   preferred_unit_stack_boundary,
+			   &tmp_misalignment))
+    return false;
+  unadjusted_alignment -= tmp_misalignment;
   adjustment = pending_stack_adjust;
   /* Push enough additional bytes that the stack will be aligned
      after the arguments are pushed.  */
   if (preferred_unit_stack_boundary > 1 && unadjusted_alignment)
     adjustment -= preferred_unit_stack_boundary - unadjusted_alignment;
 
+  /* We need to know whether the adjusted argument size
+     (UNADJUSTED_ARGS_SIZE - ADJUSTMENT) constitutes an allocation
+     or a deallocation.  */
+  if (!ordered_p (adjustment, unadjusted_args_size))
+    return false;
+
   /* Now, sets ARGS_SIZE->CONSTANT so that we pop the right number of
      bytes after the call.  The right number is the entire
      PENDING_STACK_ADJUST less our ADJUSTMENT plus the amount required
@@ -2655,7 +2722,8 @@ combine_pending_stack_adjustment_and_cal
   args_size->constant
     = pending_stack_adjust - adjustment + unadjusted_args_size;
 
-  return adjustment;
+  *adjustment_out = adjustment;
+  return true;
 }
 
 /* Scan X expression if it does not dereference any argument slots
@@ -2681,8 +2749,8 @@ check_sibcall_argument_overlap_1 (rtx x)
     return 0;
 
   if (code == MEM)
-    return mem_overlaps_already_clobbered_arg_p (XEXP (x, 0),
-						 GET_MODE_SIZE (GET_MODE (x)));
+    return (mem_might_overlap_already_clobbered_arg_p
+	    (XEXP (x, 0), GET_MODE_SIZE (GET_MODE (x))));
 
   /* Scan all subexpressions.  */
   fmt = GET_RTX_FORMAT (code);
@@ -2714,7 +2782,8 @@ check_sibcall_argument_overlap_1 (rtx x)
 check_sibcall_argument_overlap (rtx_insn *insn, struct arg_data *arg,
 				int mark_stored_args_map)
 {
-  int low, high;
+  poly_uint64 low, high;
+  unsigned HOST_WIDE_INT const_low, const_high;
 
   if (insn == NULL_RTX)
     insn = get_insns ();
@@ -2732,9 +2801,14 @@ check_sibcall_argument_overlap (rtx_insn
 	low = -arg->locate.slot_offset.constant - arg->locate.size.constant;
       else
 	low = arg->locate.slot_offset.constant;
+      high = low + arg->locate.size.constant;
 
-      for (high = low + arg->locate.size.constant; low < high; low++)
-	bitmap_set_bit (stored_args_map, low);
+      const_low = constant_lower_bound (low);
+      if (high.is_constant (&const_high))
+	for (unsigned HOST_WIDE_INT i = const_low; i < const_high; ++i)
+	  bitmap_set_bit (stored_args_map, i);
+      else
+	stored_args_watermark = MIN (stored_args_watermark, const_low);
     }
   return insn != NULL_RTX;
 }
@@ -2877,7 +2951,8 @@ can_implement_as_sibling_call_p (tree ex
      function, we cannot change it into a sibling call.
      crtl->args.pretend_args_size is not part of the
      stack allocated by our caller.  */
-  if (args_size.constant > (crtl->args.size - crtl->args.pretend_args_size))
+  if (may_gt (args_size.constant,
+	      crtl->args.size - crtl->args.pretend_args_size))
     {
       maybe_complain_about_tail_call (exp,
 				      "callee required more stack slots"
@@ -2887,10 +2962,12 @@ can_implement_as_sibling_call_p (tree ex
 
   /* If the callee pops its own arguments, then it must pop exactly
      the same number of arguments as the current function.  */
-  if (targetm.calls.return_pops_args (fndecl, funtype, args_size.constant)
-      != targetm.calls.return_pops_args (current_function_decl,
-					 TREE_TYPE (current_function_decl),
-					 crtl->args.size))
+  if (may_ne (targetm.calls.return_pops_args (fndecl, funtype,
+					      args_size.constant),
+	      targetm.calls.return_pops_args (current_function_decl,
+					      TREE_TYPE
+					      (current_function_decl),
+					      crtl->args.size)))
     {
       maybe_complain_about_tail_call (exp,
 				      "inconsistent number of"
@@ -2980,7 +3057,7 @@ expand_call (tree exp, rtx target, int i
   struct args_size args_size;
   struct args_size adjusted_args_size;
   /* Size of arguments before any adjustments (such as rounding).  */
-  int unadjusted_args_size;
+  poly_int64 unadjusted_args_size;
   /* Data on reg parms scanned so far.  */
   CUMULATIVE_ARGS args_so_far_v;
   cumulative_args_t args_so_far;
@@ -3013,22 +3090,23 @@ expand_call (tree exp, rtx target, int i
   rtx save_area = 0;		/* Place that it is saved */
 #endif
 
-  int initial_highest_arg_in_use = highest_outgoing_arg_in_use;
+  unsigned int initial_highest_arg_in_use = highest_outgoing_arg_in_use;
   char *initial_stack_usage_map = stack_usage_map;
+  unsigned HOST_WIDE_INT initial_stack_usage_watermark = stack_usage_watermark;
   char *stack_usage_map_buf = NULL;
 
-  int old_stack_allocated;
+  poly_int64 old_stack_allocated;
 
   /* State variables to track stack modifications.  */
   rtx old_stack_level = 0;
   int old_stack_arg_under_construction = 0;
-  int old_pending_adj = 0;
+  poly_int64 old_pending_adj = 0;
   int old_inhibit_defer_pop = inhibit_defer_pop;
 
   /* Some stack pointer alterations we make are performed via
      allocate_dynamic_stack_space. This modifies the stack_pointer_delta,
      which we then also need to save/restore along the way.  */
-  int old_stack_pointer_delta = 0;
+  poly_int64 old_stack_pointer_delta = 0;
 
   rtx call_fusage;
   tree addr = CALL_EXPR_FN (exp);
@@ -3292,7 +3370,8 @@ expand_call (tree exp, rtx target, int i
 	  || reg_mentioned_p (virtual_outgoing_args_rtx,
 			      structure_value_addr))
       && (args_size.var
-	  || (!ACCUMULATE_OUTGOING_ARGS && args_size.constant)))
+	  || (!ACCUMULATE_OUTGOING_ARGS
+	      && maybe_nonzero (args_size.constant))))
     structure_value_addr = copy_to_reg (structure_value_addr);
 
   /* Tail calls can make things harder to debug, and we've traditionally
@@ -3408,10 +3487,10 @@ expand_call (tree exp, rtx target, int i
 	 call sequence.
 	 Also do the adjustments before a throwing call, otherwise
 	 exception handling can fail; PR 19225. */
-      if (pending_stack_adjust >= 32
-	  || (pending_stack_adjust > 0
+      if (may_ge (pending_stack_adjust, 32)
+	  || (maybe_nonzero (pending_stack_adjust)
 	      && (flags & ECF_MAY_BE_ALLOCA))
-	  || (pending_stack_adjust > 0
+	  || (maybe_nonzero (pending_stack_adjust)
 	      && flag_exceptions && !(flags & ECF_NOTHROW))
 	  || pass == 0)
 	do_pending_stack_adjust ();
@@ -3457,8 +3536,10 @@ expand_call (tree exp, rtx target, int i
 	    argblock
 	      = plus_constant (Pmode, argblock, -crtl->args.pretend_args_size);
 
-	  stored_args_map = sbitmap_alloc (args_size.constant);
+	  HOST_WIDE_INT map_size = constant_lower_bound (args_size.constant);
+	  stored_args_map = sbitmap_alloc (map_size);
 	  bitmap_clear (stored_args_map);
+	  stored_args_watermark = HOST_WIDE_INT_M1U;
 	}
 
       /* If we have no actual push instructions, or shouldn't use them,
@@ -3488,14 +3569,14 @@ expand_call (tree exp, rtx target, int i
 	     in the area reserved for register arguments, which may be part of
 	     the stack frame.  */
 
-	  int needed = adjusted_args_size.constant;
+	  poly_int64 needed = adjusted_args_size.constant;
 
 	  /* Store the maximum argument space used.  It will be pushed by
 	     the prologue (if ACCUMULATE_OUTGOING_ARGS, or stack overflow
 	     checking).  */
 
-	  if (needed > crtl->outgoing_args_size)
-	    crtl->outgoing_args_size = needed;
+	  crtl->outgoing_args_size = upper_bound (crtl->outgoing_args_size,
+						  needed);
 
 	  if (must_preallocate)
 	    {
@@ -3521,12 +3602,16 @@ expand_call (tree exp, rtx target, int i
 		  if (! OUTGOING_REG_PARM_STACK_SPACE ((!fndecl ? fntype : TREE_TYPE (fndecl))))
 		    needed += reg_parm_stack_space;
 
+		  poly_int64 limit = needed;
 		  if (ARGS_GROW_DOWNWARD)
-		    highest_outgoing_arg_in_use
-		      = MAX (initial_highest_arg_in_use, needed + 1);
-		  else
-		    highest_outgoing_arg_in_use
-		      = MAX (initial_highest_arg_in_use, needed);
+		    limit += 1;
+
+		  /* For polynomial sizes, this is the maximum possible
+		     size needed for arguments with a constant size
+		     and offset.  */
+		  HOST_WIDE_INT const_limit = constant_lower_bound (limit);
+		  highest_outgoing_arg_in_use
+		    = MAX (initial_highest_arg_in_use, const_limit);
 
 		  free (stack_usage_map_buf);
 		  stack_usage_map_buf = XNEWVEC (char, highest_outgoing_arg_in_use);
@@ -3551,23 +3636,25 @@ expand_call (tree exp, rtx target, int i
 		}
 	      else
 		{
-		  if (inhibit_defer_pop == 0)
+		  /* Try to reuse some or all of the pending_stack_adjust
+		     to get this space.  */
+		  if (inhibit_defer_pop == 0
+		      && (combine_pending_stack_adjustment_and_call
+			  (&needed,
+			   unadjusted_args_size,
+			   &adjusted_args_size,
+			   preferred_unit_stack_boundary)))
 		    {
-		      /* Try to reuse some or all of the pending_stack_adjust
-			 to get this space.  */
-		      needed
-			= (combine_pending_stack_adjustment_and_call
-			   (unadjusted_args_size,
-			    &adjusted_args_size,
-			    preferred_unit_stack_boundary));
-
 		      /* combine_pending_stack_adjustment_and_call computes
 			 an adjustment before the arguments are allocated.
 			 Account for them and see whether or not the stack
 			 needs to go up or down.  */
 		      needed = unadjusted_args_size - needed;
 
-		      if (needed < 0)
+		      /* Checked by
+			 combine_pending_stack_adjustment_and_call.  */
+		      gcc_checking_assert (ordered_p (needed, 0));
+		      if (may_lt (needed, 0))
 			{
 			  /* We're releasing stack space.  */
 			  /* ??? We can avoid any adjustment at all if we're
@@ -3584,11 +3671,12 @@ expand_call (tree exp, rtx target, int i
 
 		  /* Special case this because overhead of `push_block' in
 		     this case is non-trivial.  */
-		  if (needed == 0)
+		  if (known_zero (needed))
 		    argblock = virtual_outgoing_args_rtx;
 		  else
 		    {
-		      argblock = push_block (GEN_INT (needed), 0, 0);
+		      rtx needed_rtx = gen_int_mode (needed, Pmode);
+		      argblock = push_block (needed_rtx, 0, 0);
 		      if (ARGS_GROW_DOWNWARD)
 			argblock = plus_constant (Pmode, argblock, needed);
 		    }
@@ -3614,10 +3702,11 @@ expand_call (tree exp, rtx target, int i
 	  if (stack_arg_under_construction)
 	    {
 	      rtx push_size
-		= GEN_INT (adjusted_args_size.constant
-			   + (OUTGOING_REG_PARM_STACK_SPACE ((!fndecl ? fntype
-			   					      : TREE_TYPE (fndecl))) ? 0
-			      : reg_parm_stack_space));
+		= (gen_int_mode
+		   (adjusted_args_size.constant
+		    + (OUTGOING_REG_PARM_STACK_SPACE (!fndecl ? fntype
+						      : TREE_TYPE (fndecl))
+		       ? 0 : reg_parm_stack_space), Pmode));
 	      if (old_stack_level == 0)
 		{
 		  emit_stack_save (SAVE_BLOCK, &old_stack_level);
@@ -3636,6 +3725,7 @@ expand_call (tree exp, rtx target, int i
 		  stack_usage_map_buf = XCNEWVEC (char, highest_outgoing_arg_in_use);
 		  stack_usage_map = stack_usage_map_buf;
 		  highest_outgoing_arg_in_use = 0;
+		  stack_usage_watermark = HOST_WIDE_INT_M1U;
 		}
 	      /* We can pass TRUE as the 4th argument because we just
 		 saved the stack pointer and will restore it right after
@@ -3671,24 +3761,23 @@ expand_call (tree exp, rtx target, int i
 
       /* Perform stack alignment before the first push (the last arg).  */
       if (argblock == 0
-          && adjusted_args_size.constant > reg_parm_stack_space
-	  && adjusted_args_size.constant != unadjusted_args_size)
+	  && may_gt (adjusted_args_size.constant, reg_parm_stack_space)
+	  && may_ne (adjusted_args_size.constant, unadjusted_args_size))
 	{
 	  /* When the stack adjustment is pending, we get better code
 	     by combining the adjustments.  */
-	  if (pending_stack_adjust
-	      && ! inhibit_defer_pop)
-	    {
-	      pending_stack_adjust
-		= (combine_pending_stack_adjustment_and_call
-		   (unadjusted_args_size,
-		    &adjusted_args_size,
-		    preferred_unit_stack_boundary));
-	      do_pending_stack_adjust ();
-	    }
+	  if (maybe_nonzero (pending_stack_adjust)
+	      && ! inhibit_defer_pop
+	      && (combine_pending_stack_adjustment_and_call
+		  (&pending_stack_adjust,
+		   unadjusted_args_size,
+		   &adjusted_args_size,
+		   preferred_unit_stack_boundary)))
+	    do_pending_stack_adjust ();
 	  else if (argblock == 0)
-	    anti_adjust_stack (GEN_INT (adjusted_args_size.constant
-					- unadjusted_args_size));
+	    anti_adjust_stack (gen_int_mode (adjusted_args_size.constant
+					     - unadjusted_args_size,
+					     Pmode));
 	}
       /* Now that the stack is properly aligned, pops can't safely
 	 be deferred during the evaluation of the arguments.  */
@@ -3702,9 +3791,10 @@ expand_call (tree exp, rtx target, int i
 	  && pass
 	  && adjusted_args_size.var == 0)
 	{
-	  int pushed = adjusted_args_size.constant + pending_stack_adjust;
-	  if (pushed > current_function_pushed_stack_size)
-	    current_function_pushed_stack_size = pushed;
+	  poly_int64 pushed = (adjusted_args_size.constant
+			       + pending_stack_adjust);
+	  current_function_pushed_stack_size
+	    = upper_bound (current_function_pushed_stack_size, pushed);
 	}
 
       funexp = rtx_for_function_call (fndecl, addr);
@@ -3739,7 +3829,7 @@ expand_call (tree exp, rtx target, int i
 
 	      /* We don't allow passing huge (> 2^30 B) arguments
 	         by value.  It would cause an overflow later on.  */
-	      if (adjusted_args_size.constant
+	      if (constant_lower_bound (adjusted_args_size.constant)
 		  >= (1 << (HOST_BITS_PER_INT - 2)))
 	        {
 	          sorry ("passing too large argument on stack");
@@ -3922,7 +4012,8 @@ expand_call (tree exp, rtx target, int i
 
       /* Stack must be properly aligned now.  */
       gcc_assert (!pass
-		  || !(stack_pointer_delta % preferred_unit_stack_boundary));
+		  || multiple_p (stack_pointer_delta,
+				 preferred_unit_stack_boundary));
 
       /* Generate the actual call instruction.  */
       emit_call_1 (funexp, exp, fndecl, funtype, unadjusted_args_size,
@@ -4150,6 +4241,7 @@ expand_call (tree exp, rtx target, int i
 	  stack_arg_under_construction = old_stack_arg_under_construction;
 	  highest_outgoing_arg_in_use = initial_highest_arg_in_use;
 	  stack_usage_map = initial_stack_usage_map;
+	  stack_usage_watermark = initial_stack_usage_watermark;
 	  sibcall_failure = 1;
 	}
       else if (ACCUMULATE_OUTGOING_ARGS && pass)
@@ -4174,12 +4266,14 @@ expand_call (tree exp, rtx target, int i
 		  emit_move_insn (stack_area, args[i].save_area);
 		else
 		  emit_block_move (stack_area, args[i].save_area,
-				   GEN_INT (args[i].locate.size.constant),
+				   (gen_int_mode
+				    (args[i].locate.size.constant, Pmode)),
 				   BLOCK_OP_CALL_PARM);
 	      }
 
 	  highest_outgoing_arg_in_use = initial_highest_arg_in_use;
 	  stack_usage_map = initial_stack_usage_map;
+	  stack_usage_watermark = initial_stack_usage_watermark;
 	}
 
       /* If this was alloca, record the new stack level.  */
@@ -4222,8 +4316,8 @@ expand_call (tree exp, rtx target, int i
 
 	  /* Verify that we've deallocated all the stack we used.  */
 	  gcc_assert ((flags & ECF_NORETURN)
-		      || (old_stack_allocated
-			  == stack_pointer_delta - pending_stack_adjust));
+		      || must_eq (old_stack_allocated,
+				  stack_pointer_delta - pending_stack_adjust));
 	}
 
       /* If something prevents making this a sibling call,
@@ -4390,7 +4484,7 @@ emit_library_call_value_1 (int retval, r
   int struct_value_size = 0;
   int flags;
   int reg_parm_stack_space = 0;
-  int needed;
+  poly_int64 needed;
   rtx_insn *before_call;
   bool have_push_fusage;
   tree tfom;			/* type_for_mode (outmode, 0) */
@@ -4403,8 +4497,9 @@ emit_library_call_value_1 (int retval, r
 #endif
 
   /* Size of the stack reserved for parameter registers.  */
-  int initial_highest_arg_in_use = highest_outgoing_arg_in_use;
+  unsigned int initial_highest_arg_in_use = highest_outgoing_arg_in_use;
   char *initial_stack_usage_map = stack_usage_map;
+  unsigned HOST_WIDE_INT initial_stack_usage_watermark = stack_usage_watermark;
   char *stack_usage_map_buf = NULL;
 
   rtx struct_value = targetm.calls.struct_value_rtx (0, 0);
@@ -4636,27 +4731,25 @@ emit_library_call_value_1 (int retval, r
   assemble_external_libcall (fun);
 
   original_args_size = args_size;
-  args_size.constant = (((args_size.constant
-			  + stack_pointer_delta
-			  + STACK_BYTES - 1)
-			  / STACK_BYTES
-			  * STACK_BYTES)
-			 - stack_pointer_delta);
+  args_size.constant = (aligned_upper_bound (args_size.constant
+					     + stack_pointer_delta,
+					     STACK_BYTES)
+			- stack_pointer_delta);
 
-  args_size.constant = MAX (args_size.constant,
-			    reg_parm_stack_space);
+  args_size.constant = upper_bound (args_size.constant,
+				    reg_parm_stack_space);
 
   if (! OUTGOING_REG_PARM_STACK_SPACE ((!fndecl ? fntype : TREE_TYPE (fndecl))))
     args_size.constant -= reg_parm_stack_space;
 
-  if (args_size.constant > crtl->outgoing_args_size)
-    crtl->outgoing_args_size = args_size.constant;
+  crtl->outgoing_args_size = upper_bound (crtl->outgoing_args_size,
+					  args_size.constant);
 
   if (flag_stack_usage_info && !ACCUMULATE_OUTGOING_ARGS)
     {
-      int pushed = args_size.constant + pending_stack_adjust;
-      if (pushed > current_function_pushed_stack_size)
-	current_function_pushed_stack_size = pushed;
+      poly_int64 pushed = args_size.constant + pending_stack_adjust;
+      current_function_pushed_stack_size
+	= upper_bound (current_function_pushed_stack_size, pushed);
     }
 
   if (ACCUMULATE_OUTGOING_ARGS)
@@ -4681,11 +4774,15 @@ emit_library_call_value_1 (int retval, r
       if (! OUTGOING_REG_PARM_STACK_SPACE ((!fndecl ? fntype : TREE_TYPE (fndecl))))
 	needed += reg_parm_stack_space;
 
+      poly_int64 limit = needed;
       if (ARGS_GROW_DOWNWARD)
-	highest_outgoing_arg_in_use = MAX (initial_highest_arg_in_use,
-					   needed + 1);
-      else
-	highest_outgoing_arg_in_use = MAX (initial_highest_arg_in_use, needed);
+	limit += 1;
+
+      /* For polynomial sizes, this is the maximum possible size needed
+	 for arguments with a constant size and offset.  */
+      HOST_WIDE_INT const_limit = constant_lower_bound (limit);
+      highest_outgoing_arg_in_use = MAX (initial_highest_arg_in_use,
+					 const_limit);
 
       stack_usage_map_buf = XNEWVEC (char, highest_outgoing_arg_in_use);
       stack_usage_map = stack_usage_map_buf;
@@ -4713,14 +4810,15 @@ emit_library_call_value_1 (int retval, r
   else
     {
       if (!PUSH_ARGS)
-	argblock = push_block (GEN_INT (args_size.constant), 0, 0);
+	argblock = push_block (gen_int_mode (args_size.constant, Pmode), 0, 0);
     }
 
   /* We push args individually in reverse order, perform stack alignment
      before the first push (the last arg).  */
   if (argblock == 0)
-    anti_adjust_stack (GEN_INT (args_size.constant
-				- original_args_size.constant));
+    anti_adjust_stack (gen_int_mode (args_size.constant
+				     - original_args_size.constant,
+				     Pmode));
 
   argnum = nargs - 1;
 
@@ -4760,7 +4858,7 @@ emit_library_call_value_1 (int retval, r
       rtx reg = argvec[argnum].reg;
       int partial = argvec[argnum].partial;
       unsigned int parm_align = argvec[argnum].locate.boundary;
-      int lower_bound = 0, upper_bound = 0, i;
+      poly_int64 lower_bound = 0, upper_bound = 0;
 
       if (! (reg != 0 && partial == 0))
 	{
@@ -4784,18 +4882,11 @@ emit_library_call_value_1 (int retval, r
 		  upper_bound = lower_bound + argvec[argnum].locate.size.constant;
 		}
 
-	      i = lower_bound;
-	      /* Don't worry about things in the fixed argument area;
-		 it has already been saved.  */
-	      if (i < reg_parm_stack_space)
-		i = reg_parm_stack_space;
-	      while (i < upper_bound && stack_usage_map[i] == 0)
-		i++;
-
-	      if (i < upper_bound)
+	      if (stack_region_maybe_used_p (lower_bound, upper_bound,
+					     reg_parm_stack_space))
 		{
 		  /* We need to make a save area.  */
-		  unsigned int size
+		  poly_uint64 size
 		    = argvec[argnum].locate.size.constant * BITS_PER_UNIT;
 		  machine_mode save_mode
 		    = int_mode_for_size (size, 1).else_blk ();
@@ -4815,7 +4906,9 @@ emit_library_call_value_1 (int retval, r
 		      emit_block_move (validize_mem
 				         (copy_rtx (argvec[argnum].save_area)),
 				       stack_area,
-				       GEN_INT (argvec[argnum].locate.size.constant),
+				       (gen_int_mode
+					(argvec[argnum].locate.size.constant,
+					 Pmode)),
 				       BLOCK_OP_CALL_PARM);
 		    }
 		  else
@@ -4829,14 +4922,14 @@ emit_library_call_value_1 (int retval, r
 
 	  emit_push_insn (val, mode, NULL_TREE, NULL_RTX, parm_align,
 			  partial, reg, 0, argblock,
-			  GEN_INT (argvec[argnum].locate.offset.constant),
+			  (gen_int_mode
+			   (argvec[argnum].locate.offset.constant, Pmode)),
 			  reg_parm_stack_space,
 			  ARGS_SIZE_RTX (argvec[argnum].locate.alignment_pad), false);
 
 	  /* Now mark the segment we just used.  */
 	  if (ACCUMULATE_OUTGOING_ARGS)
-	    for (i = lower_bound; i < upper_bound; i++)
-	      stack_usage_map[i] = 1;
+	    mark_stack_region_used (lower_bound, upper_bound);
 
 	  NO_DEFER_POP;
 
@@ -4958,8 +5051,8 @@ emit_library_call_value_1 (int retval, r
 	    ? hard_libcall_value (outmode, orgfun) : NULL_RTX);
 
   /* Stack must be properly aligned now.  */
-  gcc_assert (!(stack_pointer_delta
-		& (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT - 1)));
+  gcc_assert (multiple_p (stack_pointer_delta,
+			  PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT));
 
   before_call = get_last_insn ();
 
@@ -5097,7 +5190,8 @@ emit_library_call_value_1 (int retval, r
 	      emit_block_move (stack_area,
 			       validize_mem
 			         (copy_rtx (argvec[count].save_area)),
-			       GEN_INT (argvec[count].locate.size.constant),
+			       (gen_int_mode
+				(argvec[count].locate.size.constant, Pmode)),
 			       BLOCK_OP_CALL_PARM);
 	    else
 	      emit_move_insn (stack_area, argvec[count].save_area);
@@ -5105,6 +5199,7 @@ emit_library_call_value_1 (int retval, r
 
       highest_outgoing_arg_in_use = initial_highest_arg_in_use;
       stack_usage_map = initial_stack_usage_map;
+      stack_usage_watermark = initial_stack_usage_watermark;
     }
 
   free (stack_usage_map_buf);
@@ -5201,8 +5296,8 @@ store_one_arg (struct arg_data *arg, rtx
   tree pval = arg->tree_value;
   rtx reg = 0;
   int partial = 0;
-  int used = 0;
-  int i, lower_bound = 0, upper_bound = 0;
+  poly_int64 used = 0;
+  poly_int64 lower_bound = 0, upper_bound = 0;
   int sibcall_failure = 0;
 
   if (TREE_CODE (pval) == ERROR_MARK)
@@ -5223,7 +5318,10 @@ store_one_arg (struct arg_data *arg, rtx
 	      /* stack_slot is negative, but we want to index stack_usage_map
 		 with positive values.  */
 	      if (GET_CODE (XEXP (arg->stack_slot, 0)) == PLUS)
-		upper_bound = -INTVAL (XEXP (XEXP (arg->stack_slot, 0), 1)) + 1;
+		{
+		  rtx offset = XEXP (XEXP (arg->stack_slot, 0), 1);
+		  upper_bound = -rtx_to_poly_int64 (offset) + 1;
+		}
 	      else
 		upper_bound = 0;
 
@@ -5232,25 +5330,21 @@ store_one_arg (struct arg_data *arg, rtx
 	  else
 	    {
 	      if (GET_CODE (XEXP (arg->stack_slot, 0)) == PLUS)
-		lower_bound = INTVAL (XEXP (XEXP (arg->stack_slot, 0), 1));
+		{
+		  rtx offset = XEXP (XEXP (arg->stack_slot, 0), 1);
+		  lower_bound = rtx_to_poly_int64 (offset);
+		}
 	      else
 		lower_bound = 0;
 
 	      upper_bound = lower_bound + arg->locate.size.constant;
 	    }
 
-	  i = lower_bound;
-	  /* Don't worry about things in the fixed argument area;
-	     it has already been saved.  */
-	  if (i < reg_parm_stack_space)
-	    i = reg_parm_stack_space;
-	  while (i < upper_bound && stack_usage_map[i] == 0)
-	    i++;
-
-	  if (i < upper_bound)
+	  if (stack_region_maybe_used_p (lower_bound, upper_bound,
+					 reg_parm_stack_space))
 	    {
 	      /* We need to make a save area.  */
-	      unsigned int size = arg->locate.size.constant * BITS_PER_UNIT;
+	      poly_uint64 size = arg->locate.size.constant * BITS_PER_UNIT;
 	      machine_mode save_mode
 		= int_mode_for_size (size, 1).else_blk ();
 	      rtx adr = memory_address (save_mode, XEXP (arg->stack_slot, 0));
@@ -5263,7 +5357,8 @@ store_one_arg (struct arg_data *arg, rtx
 		  preserve_temp_slots (arg->save_area);
 		  emit_block_move (validize_mem (copy_rtx (arg->save_area)),
 				   stack_area,
-				   GEN_INT (arg->locate.size.constant),
+				   (gen_int_mode
+				    (arg->locate.size.constant, Pmode)),
 				   BLOCK_OP_CALL_PARM);
 		}
 	      else
@@ -5340,8 +5435,8 @@ store_one_arg (struct arg_data *arg, rtx
   /* Check for overlap with already clobbered argument area.  */
   if ((flags & ECF_SIBCALL)
       && MEM_P (arg->value)
-      && mem_overlaps_already_clobbered_arg_p (XEXP (arg->value, 0),
-					       arg->locate.size.constant))
+      && mem_might_overlap_already_clobbered_arg_p (XEXP (arg->value, 0),
+						    arg->locate.size.constant))
     sibcall_failure = 1;
 
   /* Don't allow anything left on stack from computation
@@ -5389,12 +5484,10 @@ store_one_arg (struct arg_data *arg, rtx
       if (targetm.calls.function_arg_padding (arg->mode, TREE_TYPE (pval))
 	  == PAD_DOWNWARD)
 	{
-	  int pad = used - size;
-	  if (pad)
-	    {
-	      unsigned int pad_align = least_bit_hwi (pad) * BITS_PER_UNIT;
-	      parm_align = MIN (parm_align, pad_align);
-	    }
+	  poly_int64 pad = used - size;
+	  unsigned int pad_align = known_alignment (pad) * BITS_PER_UNIT;
+	  if (pad_align != 0)
+	    parm_align = MIN (parm_align, pad_align);
 	}
 
       /* This isn't already where we want it on the stack, so put it there.
@@ -5415,7 +5508,7 @@ store_one_arg (struct arg_data *arg, rtx
       /* BLKmode, at least partly to be pushed.  */
 
       unsigned int parm_align;
-      int excess;
+      poly_int64 excess;
       rtx size_rtx;
 
       /* Pushing a nonscalar.
@@ -5451,10 +5544,12 @@ store_one_arg (struct arg_data *arg, rtx
 	{
 	  if (arg->locate.size.var)
 	    parm_align = BITS_PER_UNIT;
-	  else if (excess)
+	  else
 	    {
-	      unsigned int excess_align = least_bit_hwi (excess) * BITS_PER_UNIT;
-	      parm_align = MIN (parm_align, excess_align);
+	      unsigned int excess_align
+		= known_alignment (excess) * BITS_PER_UNIT;
+	      if (excess_align != 0)
+		parm_align = MIN (parm_align, excess_align);
 	    }
 	}
 
@@ -5463,7 +5558,7 @@ store_one_arg (struct arg_data *arg, rtx
 	  /* emit_push_insn might not work properly if arg->value and
 	     argblock + arg->locate.offset areas overlap.  */
 	  rtx x = arg->value;
-	  int i = 0;
+	  poly_int64 i = 0;
 
 	  if (XEXP (x, 0) == crtl->args.internal_arg_pointer
 	      || (GET_CODE (XEXP (x, 0)) == PLUS
@@ -5472,7 +5567,7 @@ store_one_arg (struct arg_data *arg, rtx
 		  && CONST_INT_P (XEXP (XEXP (x, 0), 1))))
 	    {
 	      if (XEXP (x, 0) != crtl->args.internal_arg_pointer)
-		i = INTVAL (XEXP (XEXP (x, 0), 1));
+		i = rtx_to_poly_int64 (XEXP (XEXP (x, 0), 1));
 
 	      /* arg.locate doesn't contain the pretend_args_size offset,
 		 it's part of argblock.  Ensure we don't count it in I.  */
@@ -5483,33 +5578,28 @@ store_one_arg (struct arg_data *arg, rtx
 
 	      /* expand_call should ensure this.  */
 	      gcc_assert (!arg->locate.offset.var
-			  && arg->locate.size.var == 0
-			  && CONST_INT_P (size_rtx));
+			  && arg->locate.size.var == 0);
+	      poly_int64 size_val = rtx_to_poly_int64 (size_rtx);
 
-	      if (arg->locate.offset.constant > i)
-		{
-		  if (arg->locate.offset.constant < i + INTVAL (size_rtx))
-		    sibcall_failure = 1;
-		}
-	      else if (arg->locate.offset.constant < i)
-		{
-		  /* Use arg->locate.size.constant instead of size_rtx
-		     because we only care about the part of the argument
-		     on the stack.  */
-		  if (i < (arg->locate.offset.constant
-			   + arg->locate.size.constant))
-		    sibcall_failure = 1;
-		}
-	      else
+	      if (must_eq (arg->locate.offset.constant, i))
 		{
 		  /* Even though they appear to be at the same location,
 		     if part of the outgoing argument is in registers,
 		     they aren't really at the same location.  Check for
 		     this by making sure that the incoming size is the
 		     same as the outgoing size.  */
-		  if (arg->locate.size.constant != INTVAL (size_rtx))
+		  if (may_ne (arg->locate.size.constant, size_val))
 		    sibcall_failure = 1;
 		}
+	      else if (maybe_in_range_p (arg->locate.offset.constant,
+					 i, size_val))
+		sibcall_failure = 1;
+	      /* Use arg->locate.size.constant instead of size_rtx
+		 because we only care about the part of the argument
+		 on the stack.  */
+	      else if (maybe_in_range_p (i, arg->locate.offset.constant,
+					 arg->locate.size.constant))
+		sibcall_failure = 1;
 	    }
 	}
 
@@ -5541,8 +5631,7 @@ store_one_arg (struct arg_data *arg, rtx
   /* Mark all slots this store used.  */
   if (ACCUMULATE_OUTGOING_ARGS && !(flags & ECF_SIBCALL)
       && argblock && ! variable_size && arg->stack)
-    for (i = lower_bound; i < upper_bound; i++)
-      stack_usage_map[i] = 1;
+    mark_stack_region_used (lower_bound, upper_bound);
 
   /* Once we have pushed something, pops can't safely
      be deferred during the rest of the arguments.  */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2017-10-23 17:11:40.145755651 +0100
+++ gcc/config/arm/arm.c	2017-10-23 17:19:01.398170131 +0100
@@ -19764,8 +19764,8 @@ arm_output_function_prologue (FILE *f)
   if (IS_CMSE_ENTRY (func_type))
     asm_fprintf (f, "\t%@ Non-secure entry function: called from non-secure code.\n");
 
-  asm_fprintf (f, "\t%@ args = %d, pretend = %d, frame = %wd\n",
-	       crtl->args.size,
+  asm_fprintf (f, "\t%@ args = %wd, pretend = %d, frame = %wd\n",
+	       (HOST_WIDE_INT) crtl->args.size,
 	       crtl->args.pretend_args_size,
 	       (HOST_WIDE_INT) get_frame_size ());
 
Index: gcc/config/avr/avr.c
===================================================================
--- gcc/config/avr/avr.c	2017-10-23 17:18:53.829515199 +0100
+++ gcc/config/avr/avr.c	2017-10-23 17:19:01.400170158 +0100
@@ -1151,7 +1151,8 @@ avr_accumulate_outgoing_args (void)
 static inline int
 avr_outgoing_args_size (void)
 {
-  return ACCUMULATE_OUTGOING_ARGS ? crtl->outgoing_args_size : 0;
+  return (ACCUMULATE_OUTGOING_ARGS
+	  ? (HOST_WIDE_INT) crtl->outgoing_args_size : 0);
 }
 
 
Index: gcc/config/microblaze/microblaze.c
===================================================================
--- gcc/config/microblaze/microblaze.c	2017-10-23 17:11:40.161786287 +0100
+++ gcc/config/microblaze/microblaze.c	2017-10-23 17:19:01.405170225 +0100
@@ -2728,7 +2728,7 @@ microblaze_function_prologue (FILE * fil
 			  STACK_POINTER_REGNUM]), fsiz,
 	       reg_names[MB_ABI_SUB_RETURN_ADDR_REGNUM + GP_REG_FIRST],
 	       current_frame_info.var_size, current_frame_info.num_gp,
-	       crtl->outgoing_args_size);
+	       (int) crtl->outgoing_args_size);
       fprintf (file, "\t.mask\t0x%08lx\n", current_frame_info.mask);
     }
 }
Index: gcc/config/cr16/cr16.c
===================================================================
--- gcc/config/cr16/cr16.c	2017-10-23 17:11:40.148761395 +0100
+++ gcc/config/cr16/cr16.c	2017-10-23 17:19:01.400170158 +0100
@@ -253,10 +253,8 @@ cr16_class_likely_spilled_p (reg_class_t
   return false;
 }
 
-static int
-cr16_return_pops_args (tree fundecl ATTRIBUTE_UNUSED,
-                       tree funtype ATTRIBUTE_UNUSED, 
-		       int size ATTRIBUTE_UNUSED)
+static poly_int64
+cr16_return_pops_args (tree, tree, poly_int64)
 {
   return 0;
 }
@@ -433,9 +431,10 @@ cr16_compute_frame (void)
     padding_locals = stack_alignment - padding_locals;
 
   current_frame_info.var_size += padding_locals;
-  current_frame_info.total_size = current_frame_info.var_size 
-			          + (ACCUMULATE_OUTGOING_ARGS
-			             ? crtl->outgoing_args_size : 0);
+  current_frame_info.total_size
+    = (current_frame_info.var_size
+       + (ACCUMULATE_OUTGOING_ARGS
+	  ? (HOST_WIDE_INT) crtl->outgoing_args_size : 0));
 }
 
 /* Implements the macro INITIAL_ELIMINATION_OFFSET, return the OFFSET.  */
@@ -449,12 +448,14 @@ cr16_initial_elimination_offset (int fro
   cr16_compute_frame ();
 
   if (((from) == FRAME_POINTER_REGNUM) && ((to) == STACK_POINTER_REGNUM))
-    return (ACCUMULATE_OUTGOING_ARGS ? crtl->outgoing_args_size : 0);
+    return (ACCUMULATE_OUTGOING_ARGS
+	    ? (HOST_WIDE_INT) crtl->outgoing_args_size : 0);
   else if (((from) == ARG_POINTER_REGNUM) && ((to) == FRAME_POINTER_REGNUM))
     return (current_frame_info.reg_size + current_frame_info.var_size);
   else if (((from) == ARG_POINTER_REGNUM) && ((to) == STACK_POINTER_REGNUM))
     return (current_frame_info.reg_size + current_frame_info.var_size 
-	    + (ACCUMULATE_OUTGOING_ARGS ? crtl->outgoing_args_size : 0));
+	    + (ACCUMULATE_OUTGOING_ARGS
+	       ? (HOST_WIDE_INT) crtl->outgoing_args_size : 0));
   else
     gcc_unreachable ();
 }
Index: gcc/config/ft32/ft32.c
===================================================================
--- gcc/config/ft32/ft32.c	2017-10-23 17:11:40.150765225 +0100
+++ gcc/config/ft32/ft32.c	2017-10-23 17:19:01.400170158 +0100
@@ -417,7 +417,8 @@ ft32_compute_frame (void)
   cfun->machine->size_for_adjusting_sp =
     0 // crtl->args.pretend_args_size
     + cfun->machine->local_vars_size
-    + (ACCUMULATE_OUTGOING_ARGS ? crtl->outgoing_args_size : 0);
+    + (ACCUMULATE_OUTGOING_ARGS
+       ? (HOST_WIDE_INT) crtl->outgoing_args_size : 0);
 }
 
 // Must use LINK/UNLINK when...
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2017-10-23 17:11:40.155774798 +0100
+++ gcc/config/i386/i386.c	2017-10-23 17:19:01.404170211 +0100
@@ -6528,8 +6528,8 @@ ix86_keep_aggregate_return_pointer (tree
 
    The attribute stdcall is equivalent to RTD on a per module basis.  */
 
-static int
-ix86_return_pops_args (tree fundecl, tree funtype, int size)
+static poly_int64
+ix86_return_pops_args (tree fundecl, tree funtype, poly_int64 size)
 {
   unsigned int ccvt;
 
@@ -14172,7 +14172,7 @@ ix86_expand_split_stack_prologue (void)
      anyhow.  In 64-bit mode we pass the parameters in r10 and
      r11.  */
   allocate_rtx = GEN_INT (allocate);
-  args_size = crtl->args.size >= 0 ? crtl->args.size : 0;
+  args_size = crtl->args.size >= 0 ? (HOST_WIDE_INT) crtl->args.size : 0;
   call_fusage = NULL_RTX;
   rtx pop = NULL_RTX;
   if (TARGET_64BIT)
Index: gcc/config/moxie/moxie.c
===================================================================
--- gcc/config/moxie/moxie.c	2017-10-23 17:11:40.164792031 +0100
+++ gcc/config/moxie/moxie.c	2017-10-23 17:19:01.405170225 +0100
@@ -270,7 +270,8 @@ moxie_compute_frame (void)
   cfun->machine->size_for_adjusting_sp = 
     crtl->args.pretend_args_size
     + cfun->machine->local_vars_size 
-    + (ACCUMULATE_OUTGOING_ARGS ? crtl->outgoing_args_size : 0);
+    + (ACCUMULATE_OUTGOING_ARGS
+       ? (HOST_WIDE_INT) crtl->outgoing_args_size : 0);
 }
 
 void
Index: gcc/config/m68k/m68k.c
===================================================================
--- gcc/config/m68k/m68k.c	2017-10-23 17:11:40.160784372 +0100
+++ gcc/config/m68k/m68k.c	2017-10-23 17:19:01.404170211 +0100
@@ -178,7 +178,7 @@ static bool m68k_return_in_memory (const
 #endif
 static void m68k_output_dwarf_dtprel (FILE *, int, rtx) ATTRIBUTE_UNUSED;
 static void m68k_trampoline_init (rtx, tree, rtx);
-static int m68k_return_pops_args (tree, tree, int);
+static poly_int64 m68k_return_pops_args (tree, tree, poly_int64);
 static rtx m68k_delegitimize_address (rtx);
 static void m68k_function_arg_advance (cumulative_args_t, machine_mode,
 				       const_tree, bool);
@@ -6533,14 +6533,14 @@ m68k_trampoline_init (rtx m_tramp, tree
    standard Unix calling sequences.  If the option is not selected,
    the caller must always pop the args.  */
 
-static int
-m68k_return_pops_args (tree fundecl, tree funtype, int size)
+static poly_int64
+m68k_return_pops_args (tree fundecl, tree funtype, poly_int64 size)
 {
   return ((TARGET_RTD
 	   && (!fundecl
 	       || TREE_CODE (fundecl) != IDENTIFIER_NODE)
 	   && (!stdarg_p (funtype)))
-	  ? size : 0);
+	  ? (HOST_WIDE_INT) size : 0);
 }
 
 /* Make sure everything's fine if we *don't* have a given processor.
Index: gcc/config/vax/vax.c
===================================================================
--- gcc/config/vax/vax.c	2017-10-23 17:11:40.188837984 +0100
+++ gcc/config/vax/vax.c	2017-10-23 17:19:01.407170251 +0100
@@ -62,7 +62,7 @@ static rtx vax_struct_value_rtx (tree, i
 static rtx vax_builtin_setjmp_frame_value (void);
 static void vax_asm_trampoline_template (FILE *);
 static void vax_trampoline_init (rtx, tree, rtx);
-static int vax_return_pops_args (tree, tree, int);
+static poly_int64 vax_return_pops_args (tree, tree, poly_int64);
 static bool vax_mode_dependent_address_p (const_rtx, addr_space_t);
 \f
 /* Initialize the GCC target structure.  */
@@ -2136,11 +2136,11 @@ vax_trampoline_init (rtx m_tramp, tree f
 
    On the VAX, the RET insn pops a maximum of 255 args for any function.  */
 
-static int
+static poly_int64
 vax_return_pops_args (tree fundecl ATTRIBUTE_UNUSED,
-		      tree funtype ATTRIBUTE_UNUSED, int size)
+		      tree funtype ATTRIBUTE_UNUSED, poly_int64 size)
 {
-  return size > 255 * 4 ? 0 : size;
+  return size > 255 * 4 ? 0 : (HOST_WIDE_INT) size;
 }
 
 /* Define where to put the arguments to a function.
Index: gcc/config/pa/pa.h
===================================================================
--- gcc/config/pa/pa.h	2017-10-23 17:18:53.830515111 +0100
+++ gcc/config/pa/pa.h	2017-10-23 17:19:01.405170225 +0100
@@ -546,7 +546,7 @@ #define ACCUMULATE_OUTGOING_ARGS 1
    marker, although the runtime documentation only describes a 16
    byte marker.  For compatibility, we allocate 48 bytes.  */
 #define STACK_POINTER_OFFSET \
-  (TARGET_64BIT ? -(crtl->outgoing_args_size + 48): -32)
+  (TARGET_64BIT ? -(crtl->outgoing_args_size + 48) : poly_int64 (-32))
 
 #define STACK_DYNAMIC_OFFSET(FNDECL)	\
   (TARGET_64BIT				\
@@ -703,7 +703,7 @@ #define NO_PROFILE_COUNTERS 1
 
 #define EXIT_IGNORE_STACK	\
  (maybe_nonzero (get_frame_size ())	\
-  || cfun->calls_alloca || crtl->outgoing_args_size)
+  || cfun->calls_alloca || maybe_nonzero (crtl->outgoing_args_size))
 
 /* Length in units of the trampoline for entering a nested function.  */
 
Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h	2017-10-23 16:52:18.296738218 +0100
+++ gcc/config/arm/arm.h	2017-10-23 17:19:01.399170145 +0100
@@ -1248,7 +1248,7 @@ #define FRAME_GROWS_DOWNWARD 1
    couldn't convert a direct call into an indirect one.  */
 #define CALLER_INTERWORKING_SLOT_SIZE			\
   (TARGET_CALLER_INTERWORKING				\
-   && crtl->outgoing_args_size != 0		\
+   && maybe_nonzero (crtl->outgoing_args_size)		\
    ? UNITS_PER_WORD : 0)
 
 /* Offset within stack frame to start allocating local variables at.
Index: gcc/config/powerpcspe/aix.h
===================================================================
--- gcc/config/powerpcspe/aix.h	2017-10-23 16:52:18.296738218 +0100
+++ gcc/config/powerpcspe/aix.h	2017-10-23 17:19:01.405170225 +0100
@@ -73,7 +73,8 @@ #define STARTING_FRAME_OFFSET						\
    `emit-rtl.c').  */
 #undef STACK_DYNAMIC_OFFSET
 #define STACK_DYNAMIC_OFFSET(FUNDECL)					\
-   RS6000_ALIGN (crtl->outgoing_args_size + STACK_POINTER_OFFSET, 16)
+  RS6000_ALIGN (crtl->outgoing_args_size.to_constant () \
+		+ STACK_POINTER_OFFSET, 16)
 
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
Index: gcc/config/powerpcspe/darwin.h
===================================================================
--- gcc/config/powerpcspe/darwin.h	2017-10-23 16:52:18.296738218 +0100
+++ gcc/config/powerpcspe/darwin.h	2017-10-23 17:19:01.405170225 +0100
@@ -157,7 +157,7 @@ #define STARTING_FRAME_OFFSET						\
 
 #undef STACK_DYNAMIC_OFFSET
 #define STACK_DYNAMIC_OFFSET(FUNDECL)					\
-  (RS6000_ALIGN (crtl->outgoing_args_size, 16)		\
+  (RS6000_ALIGN (crtl->outgoing_args_size.to_constant (), 16)		\
    + (STACK_POINTER_OFFSET))
 
 /* Darwin uses a function call if everything needs to be saved/restored.  */
Index: gcc/config/powerpcspe/powerpcspe.h
===================================================================
--- gcc/config/powerpcspe/powerpcspe.h	2017-10-23 16:52:18.296738218 +0100
+++ gcc/config/powerpcspe/powerpcspe.h	2017-10-23 17:19:01.406170238 +0100
@@ -1668,7 +1668,8 @@ #define STARTING_FRAME_OFFSET						\
    This value must be a multiple of STACK_BOUNDARY (hard coded in
    `emit-rtl.c').  */
 #define STACK_DYNAMIC_OFFSET(FUNDECL)					\
-  RS6000_ALIGN (crtl->outgoing_args_size + STACK_POINTER_OFFSET,	\
+  RS6000_ALIGN (crtl->outgoing_args_size.to_constant ()			\
+		+ STACK_POINTER_OFFSET,					\
 		(TARGET_ALTIVEC || TARGET_VSX) ? 16 : 8)
 
 /* If we generate an insn to push BYTES bytes,
Index: gcc/config/rs6000/aix.h
===================================================================
--- gcc/config/rs6000/aix.h	2017-10-23 16:52:18.296738218 +0100
+++ gcc/config/rs6000/aix.h	2017-10-23 17:19:01.406170238 +0100
@@ -73,7 +73,8 @@ #define STARTING_FRAME_OFFSET						\
    `emit-rtl.c').  */
 #undef STACK_DYNAMIC_OFFSET
 #define STACK_DYNAMIC_OFFSET(FUNDECL)					\
-   RS6000_ALIGN (crtl->outgoing_args_size + STACK_POINTER_OFFSET, 16)
+   RS6000_ALIGN (crtl->outgoing_args_size.to_constant ()		\
+		 + STACK_POINTER_OFFSET, 16)
 
 #undef  TARGET_IEEEQUAD
 #define TARGET_IEEEQUAD 0
Index: gcc/config/rs6000/darwin.h
===================================================================
--- gcc/config/rs6000/darwin.h	2017-10-23 16:52:18.296738218 +0100
+++ gcc/config/rs6000/darwin.h	2017-10-23 17:19:01.406170238 +0100
@@ -157,7 +157,7 @@ #define STARTING_FRAME_OFFSET						\
 
 #undef STACK_DYNAMIC_OFFSET
 #define STACK_DYNAMIC_OFFSET(FUNDECL)					\
-  (RS6000_ALIGN (crtl->outgoing_args_size, 16)		\
+  (RS6000_ALIGN (crtl->outgoing_args_size.to_constant (), 16)		\
    + (STACK_POINTER_OFFSET))
 
 /* Darwin uses a function call if everything needs to be saved/restored.  */
Index: gcc/config/rs6000/rs6000.h
===================================================================
--- gcc/config/rs6000/rs6000.h	2017-10-23 16:52:18.296738218 +0100
+++ gcc/config/rs6000/rs6000.h	2017-10-23 17:19:01.407170251 +0100
@@ -1570,7 +1570,8 @@ #define STARTING_FRAME_OFFSET						\
    This value must be a multiple of STACK_BOUNDARY (hard coded in
    `emit-rtl.c').  */
 #define STACK_DYNAMIC_OFFSET(FUNDECL)					\
-  RS6000_ALIGN (crtl->outgoing_args_size + STACK_POINTER_OFFSET,	\
+  RS6000_ALIGN (crtl->outgoing_args_size.to_constant ()			\
+		+ STACK_POINTER_OFFSET,					\
 		(TARGET_ALTIVEC || TARGET_VSX) ? 16 : 8)
 
 /* If we generate an insn to push BYTES bytes,
Index: gcc/dojump.h
===================================================================
--- gcc/dojump.h	2017-10-23 16:52:18.296738218 +0100
+++ gcc/dojump.h	2017-10-23 17:19:01.409170278 +0100
@@ -40,10 +40,10 @@ extern void do_pending_stack_adjust (voi
 struct saved_pending_stack_adjust
 {
   /* Saved value of pending_stack_adjust.  */
-  int x_pending_stack_adjust;
+  poly_int64 x_pending_stack_adjust;
 
   /* Saved value of stack_pointer_delta.  */
-  int x_stack_pointer_delta;
+  poly_int64 x_stack_pointer_delta;
 };
 
 /* Remember pending_stack_adjust/stack_pointer_delta.
Index: gcc/dojump.c
===================================================================
--- gcc/dojump.c	2017-10-23 16:52:18.296738218 +0100
+++ gcc/dojump.c	2017-10-23 17:19:01.409170278 +0100
@@ -89,8 +89,8 @@ do_pending_stack_adjust (void)
 {
   if (inhibit_defer_pop == 0)
     {
-      if (pending_stack_adjust != 0)
-        adjust_stack (GEN_INT (pending_stack_adjust));
+      if (maybe_nonzero (pending_stack_adjust))
+	adjust_stack (gen_int_mode (pending_stack_adjust, Pmode));
       pending_stack_adjust = 0;
     }
 }
Index: gcc/explow.c
===================================================================
--- gcc/explow.c	2017-10-23 17:18:57.859160965 +0100
+++ gcc/explow.c	2017-10-23 17:19:01.409170278 +0100
@@ -1467,8 +1467,8 @@ allocate_dynamic_stack_space (rtx size,
 
  /* We ought to be called always on the toplevel and stack ought to be aligned
     properly.  */
-  gcc_assert (!(stack_pointer_delta
-		% (PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT)));
+  gcc_assert (multiple_p (stack_pointer_delta,
+			  PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT));
 
   /* If needed, check that we have the required amount of stack.  Take into
      account what has already been checked.  */
@@ -1498,7 +1498,7 @@ allocate_dynamic_stack_space (rtx size,
     }
   else
     {
-      int saved_stack_pointer_delta;
+      poly_int64 saved_stack_pointer_delta;
 
       if (!STACK_GROWS_DOWNWARD)
 	emit_move_insn (target, virtual_stack_dynamic_rtx);
Index: gcc/function.c
===================================================================
--- gcc/function.c	2017-10-23 17:18:59.743148042 +0100
+++ gcc/function.c	2017-10-23 17:19:01.410170292 +0100
@@ -1407,7 +1407,7 @@ #define STACK_DYNAMIC_OFFSET(FNDECL)	\
   : 0) + (STACK_POINTER_OFFSET))
 #else
 #define STACK_DYNAMIC_OFFSET(FNDECL)	\
-((ACCUMULATE_OUTGOING_ARGS ? crtl->outgoing_args_size : 0)	      \
+  ((ACCUMULATE_OUTGOING_ARGS ? crtl->outgoing_args_size : poly_int64 (0)) \
  + (STACK_POINTER_OFFSET))
 #endif
 #endif
@@ -2720,12 +2720,15 @@ assign_parm_find_stack_rtl (tree parm, s
      is TARGET_FUNCTION_ARG_BOUNDARY.  If we're using slot_offset, we're
      intentionally forcing upward padding.  Otherwise we have to come
      up with a guess at the alignment based on OFFSET_RTX.  */
+  poly_int64 offset;
   if (data->locate.where_pad != PAD_DOWNWARD || data->entry_parm)
     align = boundary;
-  else if (CONST_INT_P (offset_rtx))
+  else if (poly_int_rtx_p (offset_rtx, &offset))
     {
-      align = INTVAL (offset_rtx) * BITS_PER_UNIT | boundary;
-      align = least_bit_hwi (align);
+      align = least_bit_hwi (boundary);
+      unsigned int offset_align = known_alignment (offset) * BITS_PER_UNIT;
+      if (offset_align != 0)
+	align = MIN (align, offset_align);
     }
   set_mem_align (stack_parm, align);
 
@@ -3887,14 +3890,15 @@ assign_parms (tree fndecl)
   /* Adjust function incoming argument size for alignment and
      minimum length.  */
 
-  crtl->args.size = MAX (crtl->args.size, all.reg_parm_stack_space);
-  crtl->args.size = CEIL_ROUND (crtl->args.size,
-					   PARM_BOUNDARY / BITS_PER_UNIT);
+  crtl->args.size = upper_bound (crtl->args.size, all.reg_parm_stack_space);
+  crtl->args.size = aligned_upper_bound (crtl->args.size,
+					 PARM_BOUNDARY / BITS_PER_UNIT);
 
   if (ARGS_GROW_DOWNWARD)
     {
       crtl->args.arg_offset_rtx
-	= (all.stack_args_size.var == 0 ? GEN_INT (-all.stack_args_size.constant)
+	= (all.stack_args_size.var == 0
+	   ? gen_int_mode (-all.stack_args_size.constant, Pmode)
 	   : expand_expr (size_diffop (all.stack_args_size.var,
 				       size_int (-all.stack_args_size.constant)),
 			  NULL_RTX, VOIDmode, EXPAND_NORMAL));
@@ -4135,15 +4139,19 @@ locate_and_pad_parm (machine_mode passed
     {
       if (reg_parm_stack_space > 0)
 	{
-	  if (initial_offset_ptr->var)
+	  if (initial_offset_ptr->var
+	      || !ordered_p (initial_offset_ptr->constant,
+			     reg_parm_stack_space))
 	    {
 	      initial_offset_ptr->var
 		= size_binop (MAX_EXPR, ARGS_SIZE_TREE (*initial_offset_ptr),
 			      ssize_int (reg_parm_stack_space));
 	      initial_offset_ptr->constant = 0;
 	    }
-	  else if (initial_offset_ptr->constant < reg_parm_stack_space)
-	    initial_offset_ptr->constant = reg_parm_stack_space;
+	  else
+	    initial_offset_ptr->constant
+	      = ordered_max (initial_offset_ptr->constant,
+			     reg_parm_stack_space);
 	}
     }
 
@@ -4269,9 +4277,9 @@ pad_to_arg_alignment (struct args_size *
 		      struct args_size *alignment_pad)
 {
   tree save_var = NULL_TREE;
-  HOST_WIDE_INT save_constant = 0;
+  poly_int64 save_constant = 0;
   int boundary_in_bytes = boundary / BITS_PER_UNIT;
-  HOST_WIDE_INT sp_offset = STACK_POINTER_OFFSET;
+  poly_int64 sp_offset = STACK_POINTER_OFFSET;
 
 #ifdef SPARC_STACK_BOUNDARY_HACK
   /* ??? The SPARC port may claim a STACK_BOUNDARY higher than
@@ -4292,7 +4300,10 @@ pad_to_arg_alignment (struct args_size *
 
   if (boundary > BITS_PER_UNIT)
     {
-      if (offset_ptr->var)
+      int misalign;
+      if (offset_ptr->var
+	  || !known_misalignment (offset_ptr->constant + sp_offset,
+				  boundary_in_bytes, &misalign))
 	{
 	  tree sp_offset_tree = ssize_int (sp_offset);
 	  tree offset = size_binop (PLUS_EXPR,
@@ -4313,13 +4324,13 @@ pad_to_arg_alignment (struct args_size *
 	}
       else
 	{
-	  offset_ptr->constant = -sp_offset +
-	    (ARGS_GROW_DOWNWARD
-	    ? FLOOR_ROUND (offset_ptr->constant + sp_offset, boundary_in_bytes)
-	    : CEIL_ROUND (offset_ptr->constant + sp_offset, boundary_in_bytes));
+	  if (ARGS_GROW_DOWNWARD)
+	    offset_ptr->constant -= misalign;
+	  else
+	    offset_ptr->constant += -misalign & (boundary_in_bytes - 1);
 
-	    if (boundary > PARM_BOUNDARY)
-	      alignment_pad->constant = offset_ptr->constant - save_constant;
+	  if (boundary > PARM_BOUNDARY)
+	    alignment_pad->constant = offset_ptr->constant - save_constant;
 	}
     }
 }
Index: gcc/toplev.c
===================================================================
--- gcc/toplev.c	2017-10-23 16:52:18.296738218 +0100
+++ gcc/toplev.c	2017-10-23 17:19:01.412170318 +0100
@@ -958,19 +958,32 @@ output_stack_usage (void)
   stack_usage_kind = STATIC;
 
   /* Add the maximum amount of space pushed onto the stack.  */
-  if (current_function_pushed_stack_size > 0)
+  if (maybe_nonzero (current_function_pushed_stack_size))
     {
-      stack_usage += current_function_pushed_stack_size;
-      stack_usage_kind = DYNAMIC_BOUNDED;
+      HOST_WIDE_INT extra;
+      if (current_function_pushed_stack_size.is_constant (&extra))
+	{
+	  stack_usage += extra;
+	  stack_usage_kind = DYNAMIC_BOUNDED;
+	}
+      else
+	{
+	  extra = constant_lower_bound (current_function_pushed_stack_size);
+	  stack_usage += extra;
+	  stack_usage_kind = DYNAMIC;
+	}
     }
 
   /* Now on to the tricky part: dynamic stack allocation.  */
   if (current_function_allocates_dynamic_stack_space)
     {
-      if (current_function_has_unbounded_dynamic_stack_size)
-	stack_usage_kind = DYNAMIC;
-      else
-	stack_usage_kind = DYNAMIC_BOUNDED;
+      if (stack_usage_kind != DYNAMIC)
+	{
+	  if (current_function_has_unbounded_dynamic_stack_size)
+	    stack_usage_kind = DYNAMIC;
+	  else
+	    stack_usage_kind = DYNAMIC_BOUNDED;
+	}
 
       /* Add the size even in the unbounded case, this can't hurt.  */
       stack_usage += current_function_dynamic_stack_size;

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [046/nnn] poly_int: instantiate_virtual_regs
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (45 preceding siblings ...)
  2017-10-23 17:20 ` [047/nnn] poly_int: argument sizes Richard Sandiford
@ 2017-10-23 17:20 ` Richard Sandiford
  2017-11-28 18:00   ` Jeff Law
  2017-10-23 17:21 ` [048/nnn] poly_int: cfgexpand stack variables Richard Sandiford
                   ` (60 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:20 UTC (permalink / raw)
  To: gcc-patches

This patch makes the instantiate virtual regs pass track offsets
as poly_ints.
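
For illustration only (not part of the patch), a minimal sketch of the
pattern the hunks below follow, assuming the poly-int.h interface from
patch 001: a virtual register is replaced by a hard register plus a
possibly runtime-variable offset, and the "is this offset zero?" test
becomes a known_zero/maybe_nonzero query.

  poly_int64 offset;
  rtx new_rtx = instantiate_new_reg (x, &offset);
  if (new_rtx != NULL_RTX)
    {
      if (known_zero (offset))
	/* Zero for all runtime values of the indeterminates:
	   plain replacement.  */
	x = new_rtx;
      else
	/* Might be nonzero: materialise the addition.  The offset need
	   not be a compile-time constant, so gen_int_mode is used
	   rather than GEN_INT.  */
	x = gen_rtx_PLUS (GET_MODE (x), new_rtx,
			  gen_int_mode (offset, GET_MODE (x)));
    }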


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* function.c (in_arg_offset, var_offset, dynamic_offset)
	(out_arg_offset, cfa_offset): Change from int to poly_int64.
	(instantiate_new_reg): Return the new offset as a poly_int64_pod
	rather than a HOST_WIDE_INT.
	(instantiate_virtual_regs_in_rtx): Track polynomial offsets.
	(instantiate_virtual_regs_in_insn): Likewise.

Index: gcc/function.c
===================================================================
--- gcc/function.c	2017-10-23 17:18:53.834514759 +0100
+++ gcc/function.c	2017-10-23 17:18:59.743148042 +0100
@@ -1367,11 +1367,11 @@ initial_value_entry (int i, rtx *hreg, r
    routines.  They contain the offsets of the virtual registers from their
    respective hard registers.  */
 
-static int in_arg_offset;
-static int var_offset;
-static int dynamic_offset;
-static int out_arg_offset;
-static int cfa_offset;
+static poly_int64 in_arg_offset;
+static poly_int64 var_offset;
+static poly_int64 dynamic_offset;
+static poly_int64 out_arg_offset;
+static poly_int64 cfa_offset;
 
 /* In most machines, the stack pointer register is equivalent to the bottom
    of the stack.  */
@@ -1418,10 +1418,10 @@ #define STACK_DYNAMIC_OFFSET(FNDECL)	\
    offset indirectly through the pointer.  Otherwise, return 0.  */
 
 static rtx
-instantiate_new_reg (rtx x, HOST_WIDE_INT *poffset)
+instantiate_new_reg (rtx x, poly_int64_pod *poffset)
 {
   rtx new_rtx;
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
 
   if (x == virtual_incoming_args_rtx)
     {
@@ -1480,7 +1480,7 @@ instantiate_virtual_regs_in_rtx (rtx *lo
       if (rtx x = *loc)
 	{
 	  rtx new_rtx;
-	  HOST_WIDE_INT offset;
+	  poly_int64 offset;
 	  switch (GET_CODE (x))
 	    {
 	    case REG:
@@ -1533,7 +1533,7 @@ safe_insn_predicate (int code, int opera
 static void
 instantiate_virtual_regs_in_insn (rtx_insn *insn)
 {
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
   int insn_code, i;
   bool any_change = false;
   rtx set, new_rtx, x;
@@ -1572,7 +1572,8 @@ instantiate_virtual_regs_in_insn (rtx_in
 	 to the generic case is avoiding a new pseudo and eliminating a
 	 move insn in the initial rtl stream.  */
       new_rtx = instantiate_new_reg (SET_SRC (set), &offset);
-      if (new_rtx && offset != 0
+      if (new_rtx
+	  && maybe_nonzero (offset)
 	  && REG_P (SET_DEST (set))
 	  && REGNO (SET_DEST (set)) > LAST_VIRTUAL_REGISTER)
 	{
@@ -1598,17 +1599,18 @@ instantiate_virtual_regs_in_insn (rtx_in
 
       /* Handle a plus involving a virtual register by determining if the
 	 operands remain valid if they're modified in place.  */
+      poly_int64 delta;
       if (GET_CODE (SET_SRC (set)) == PLUS
 	  && recog_data.n_operands >= 3
 	  && recog_data.operand_loc[1] == &XEXP (SET_SRC (set), 0)
 	  && recog_data.operand_loc[2] == &XEXP (SET_SRC (set), 1)
-	  && CONST_INT_P (recog_data.operand[2])
+	  && poly_int_rtx_p (recog_data.operand[2], &delta)
 	  && (new_rtx = instantiate_new_reg (recog_data.operand[1], &offset)))
 	{
-	  offset += INTVAL (recog_data.operand[2]);
+	  offset += delta;
 
 	  /* If the sum is zero, then replace with a plain move.  */
-	  if (offset == 0
+	  if (known_zero (offset)
 	      && REG_P (SET_DEST (set))
 	      && REGNO (SET_DEST (set)) > LAST_VIRTUAL_REGISTER)
 	    {
@@ -1686,7 +1688,7 @@ instantiate_virtual_regs_in_insn (rtx_in
 	  new_rtx = instantiate_new_reg (x, &offset);
 	  if (new_rtx == NULL)
 	    continue;
-	  if (offset == 0)
+	  if (known_zero (offset))
 	    x = new_rtx;
 	  else
 	    {
@@ -1711,7 +1713,7 @@ instantiate_virtual_regs_in_insn (rtx_in
 	  new_rtx = instantiate_new_reg (SUBREG_REG (x), &offset);
 	  if (new_rtx == NULL)
 	    continue;
-	  if (offset != 0)
+	  if (maybe_nonzero (offset))
 	    {
 	      start_sequence ();
 	      new_rtx = expand_simple_binop

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [048/nnn] poly_int: cfgexpand stack variables
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (46 preceding siblings ...)
  2017-10-23 17:20 ` [046/nnn] poly_int: instantiate_virtual_regs Richard Sandiford
@ 2017-10-23 17:21 ` Richard Sandiford
  2017-12-05 23:22   ` Jeff Law
  2017-10-23 17:21 ` [050/nnn] poly_int: reload<->ira interface Richard Sandiford
                   ` (59 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:21 UTC (permalink / raw)
  To: gcc-patches

This patch changes the type of stack_var::size from HOST_WIDE_INT
to poly_uint64.  The difference in signedness is because the
field was set by:

  v->size = tree_to_uhwi (size);
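
For illustration only (not part of the patch), a sketch of how the two
size comparisons in the hunks below work once the sizes are polynomial,
assuming the poly-int.h interface from patch 001:

  poly_int64 sizea = stack_vars[ia].size;
  poly_int64 sizeb = stack_vars[ib].size;

  /* Sorting: poly_ints have no total order, so qsort callbacks use
     compare_sizes_for_sort, which imposes a repeatable order even on
     sizes that could compare either way at runtime.  */
  int diff = compare_sizes_for_sort (sizeb, sizea);

  /* Correctness checks ask "might the sizes differ?" rather than
     using != directly.  */
  bool sizes_might_differ = may_ne (sizea, sizeb);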


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* cfgexpand.c (stack_var::size): Change from a HOST_WIDE_INT
	to a poly_uint64.
	(add_stack_var, stack_var_cmp, partition_stack_vars)
	(dump_stack_var_partition): Update accordingly.
	(alloc_stack_frame_space): Take the size as a poly_int64 rather
	than a HOST_WIDE_INT.
	(expand_stack_vars, expand_one_stack_var_1): Handle polynomial sizes.
	(defer_stack_allocation, estimated_stack_frame_size): Likewise.
	(account_stack_vars, expand_one_var): Likewise.  Return a poly_uint64
	rather than a HOST_WIDE_INT.

Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-10-23 17:18:53.827515374 +0100
+++ gcc/cfgexpand.c	2017-10-23 17:19:04.559212322 +0100
@@ -314,7 +314,7 @@ struct stack_var
 
   /* Initially, the size of the variable.  Later, the size of the partition,
      if this variable becomes it's partition's representative.  */
-  HOST_WIDE_INT size;
+  poly_uint64 size;
 
   /* The *byte* alignment required for this variable.  Or as, with the
      size, the alignment for this partition.  */
@@ -390,7 +390,7 @@ align_base (HOST_WIDE_INT base, unsigned
    Return the frame offset.  */
 
 static poly_int64
-alloc_stack_frame_space (HOST_WIDE_INT size, unsigned HOST_WIDE_INT align)
+alloc_stack_frame_space (poly_int64 size, unsigned HOST_WIDE_INT align)
 {
   poly_int64 offset, new_frame_offset;
 
@@ -443,10 +443,10 @@ add_stack_var (tree decl)
   tree size = TREE_CODE (decl) == SSA_NAME
     ? TYPE_SIZE_UNIT (TREE_TYPE (decl))
     : DECL_SIZE_UNIT (decl);
-  v->size = tree_to_uhwi (size);
+  v->size = tree_to_poly_uint64 (size);
   /* Ensure that all variables have size, so that &a != &b for any two
      variables that are simultaneously live.  */
-  if (v->size == 0)
+  if (known_zero (v->size))
     v->size = 1;
   v->alignb = align_local_variable (decl);
   /* An alignment of zero can mightily confuse us later.  */
@@ -676,8 +676,8 @@ stack_var_cmp (const void *a, const void
   size_t ib = *(const size_t *)b;
   unsigned int aligna = stack_vars[ia].alignb;
   unsigned int alignb = stack_vars[ib].alignb;
-  HOST_WIDE_INT sizea = stack_vars[ia].size;
-  HOST_WIDE_INT sizeb = stack_vars[ib].size;
+  poly_int64 sizea = stack_vars[ia].size;
+  poly_int64 sizeb = stack_vars[ib].size;
   tree decla = stack_vars[ia].decl;
   tree declb = stack_vars[ib].decl;
   bool largea, largeb;
@@ -690,10 +690,9 @@ stack_var_cmp (const void *a, const void
     return (int)largeb - (int)largea;
 
   /* Secondary compare on size, decreasing  */
-  if (sizea > sizeb)
-    return -1;
-  if (sizea < sizeb)
-    return 1;
+  int diff = compare_sizes_for_sort (sizeb, sizea);
+  if (diff != 0)
+    return diff;
 
   /* Tertiary compare on true alignment, decreasing.  */
   if (aligna < alignb)
@@ -904,7 +903,7 @@ partition_stack_vars (void)
     {
       size_t i = stack_vars_sorted[si];
       unsigned int ialign = stack_vars[i].alignb;
-      HOST_WIDE_INT isize = stack_vars[i].size;
+      poly_int64 isize = stack_vars[i].size;
 
       /* Ignore objects that aren't partition representatives. If we
          see a var that is not a partition representative, it must
@@ -916,7 +915,7 @@ partition_stack_vars (void)
 	{
 	  size_t j = stack_vars_sorted[sj];
 	  unsigned int jalign = stack_vars[j].alignb;
-	  HOST_WIDE_INT jsize = stack_vars[j].size;
+	  poly_int64 jsize = stack_vars[j].size;
 
 	  /* Ignore objects that aren't partition representatives.  */
 	  if (stack_vars[j].representative != j)
@@ -932,8 +931,8 @@ partition_stack_vars (void)
 	     sizes, as the shorter vars wouldn't be adequately protected.
 	     Don't do that for "large" (unsupported) alignment objects,
 	     those aren't protected anyway.  */
-	  if ((asan_sanitize_stack_p ())
-	      && isize != jsize
+	  if (asan_sanitize_stack_p ()
+	      && may_ne (isize, jsize)
 	      && ialign * BITS_PER_UNIT <= MAX_SUPPORTED_STACK_ALIGNMENT)
 	    break;
 
@@ -964,9 +963,9 @@ dump_stack_var_partition (void)
       if (stack_vars[i].representative != i)
 	continue;
 
-      fprintf (dump_file, "Partition %lu: size " HOST_WIDE_INT_PRINT_DEC
-	       " align %u\n", (unsigned long) i, stack_vars[i].size,
-	       stack_vars[i].alignb);
+      fprintf (dump_file, "Partition %lu: size ", (unsigned long) i);
+      print_dec (stack_vars[i].size, dump_file);
+      fprintf (dump_file, " align %u\n", stack_vars[i].alignb);
 
       for (j = i; j != EOC; j = stack_vars[j].next)
 	{
@@ -1042,7 +1041,7 @@ struct stack_vars_data
 expand_stack_vars (bool (*pred) (size_t), struct stack_vars_data *data)
 {
   size_t si, i, j, n = stack_vars_num;
-  HOST_WIDE_INT large_size = 0, large_alloc = 0;
+  poly_uint64 large_size = 0, large_alloc = 0;
   rtx large_base = NULL;
   unsigned large_align = 0;
   bool large_allocation_done = false;
@@ -1085,8 +1084,7 @@ expand_stack_vars (bool (*pred) (size_t)
 	      : DECL_RTL (decl) != pc_rtx)
 	    continue;
 
-	  large_size += alignb - 1;
-	  large_size &= -(HOST_WIDE_INT)alignb;
+	  large_size = aligned_upper_bound (large_size, alignb);
 	  large_size += stack_vars[i].size;
 	}
     }
@@ -1125,7 +1123,8 @@ expand_stack_vars (bool (*pred) (size_t)
 	  HOST_WIDE_INT prev_offset;
 	  if (asan_sanitize_stack_p ()
 	      && pred
-	      && frame_offset.is_constant (&prev_offset))
+	      && frame_offset.is_constant (&prev_offset)
+	      && stack_vars[i].size.is_constant ())
 	    {
 	      prev_offset = align_base (prev_offset,
 					MAX (alignb, ASAN_RED_ZONE_SIZE),
@@ -1184,23 +1183,22 @@ expand_stack_vars (bool (*pred) (size_t)
 
 	  /* If there were any variables requiring "large" alignment, allocate
 	     space.  */
-	  if (large_size > 0 && ! large_allocation_done)
+	  if (maybe_nonzero (large_size) && ! large_allocation_done)
 	    {
 	      poly_int64 loffset;
 	      rtx large_allocsize;
 
-	      large_allocsize = GEN_INT (large_size);
+	      large_allocsize = gen_int_mode (large_size, Pmode);
 	      get_dynamic_stack_size (&large_allocsize, 0, large_align, NULL);
 	      loffset = alloc_stack_frame_space
-		(INTVAL (large_allocsize),
+		(rtx_to_poly_int64 (large_allocsize),
 		 PREFERRED_STACK_BOUNDARY / BITS_PER_UNIT);
 	      large_base = get_dynamic_stack_base (loffset, large_align);
 	      large_allocation_done = true;
 	    }
 	  gcc_assert (large_base != NULL);
 
-	  large_alloc += alignb - 1;
-	  large_alloc &= -(HOST_WIDE_INT)alignb;
+	  large_alloc = aligned_upper_bound (large_alloc, alignb);
 	  offset = large_alloc;
 	  large_alloc += stack_vars[i].size;
 
@@ -1218,15 +1216,15 @@ expand_stack_vars (bool (*pred) (size_t)
 	}
     }
 
-  gcc_assert (large_alloc == large_size);
+  gcc_assert (must_eq (large_alloc, large_size));
 }
 
 /* Take into account all sizes of partitions and reset DECL_RTLs.  */
-static HOST_WIDE_INT
+static poly_uint64
 account_stack_vars (void)
 {
   size_t si, j, i, n = stack_vars_num;
-  HOST_WIDE_INT size = 0;
+  poly_uint64 size = 0;
 
   for (si = 0; si < n; ++si)
     {
@@ -1289,19 +1287,19 @@ set_parm_rtl (tree parm, rtx x)
 static void
 expand_one_stack_var_1 (tree var)
 {
-  HOST_WIDE_INT size;
+  poly_uint64 size;
   poly_int64 offset;
   unsigned byte_align;
 
   if (TREE_CODE (var) == SSA_NAME)
     {
       tree type = TREE_TYPE (var);
-      size = tree_to_uhwi (TYPE_SIZE_UNIT (type));
+      size = tree_to_poly_uint64 (TYPE_SIZE_UNIT (type));
       byte_align = TYPE_ALIGN_UNIT (type);
     }
   else
     {
-      size = tree_to_uhwi (DECL_SIZE_UNIT (var));
+      size = tree_to_poly_uint64 (DECL_SIZE_UNIT (var));
       byte_align = align_local_variable (var);
     }
 
@@ -1505,12 +1503,14 @@ defer_stack_allocation (tree var, bool t
   tree size_unit = TREE_CODE (var) == SSA_NAME
     ? TYPE_SIZE_UNIT (TREE_TYPE (var))
     : DECL_SIZE_UNIT (var);
+  poly_uint64 size;
 
   /* Whether the variable is small enough for immediate allocation not to be
      a problem with regard to the frame size.  */
   bool smallish
-    = ((HOST_WIDE_INT) tree_to_uhwi (size_unit)
-       < PARAM_VALUE (PARAM_MIN_SIZE_FOR_STACK_SHARING));
+    = (poly_int_tree_p (size_unit, &size)
+       && (estimated_poly_value (size)
+	   < PARAM_VALUE (PARAM_MIN_SIZE_FOR_STACK_SHARING)));
 
   /* If stack protection is enabled, *all* stack variables must be deferred,
      so that we can re-order the strings to the top of the frame.
@@ -1564,7 +1564,7 @@ defer_stack_allocation (tree var, bool t
    Return stack usage this variable is supposed to take.
 */
 
-static HOST_WIDE_INT
+static poly_uint64
 expand_one_var (tree var, bool toplevel, bool really_expand)
 {
   unsigned int align = BITS_PER_UNIT;
@@ -1607,6 +1607,7 @@ expand_one_var (tree var, bool toplevel,
 
   record_alignment_for_reg_var (align);
 
+  poly_uint64 size;
   if (TREE_CODE (origvar) == SSA_NAME)
     {
       gcc_assert (!VAR_P (var)
@@ -1647,7 +1648,8 @@ expand_one_var (tree var, bool toplevel,
       if (really_expand)
         expand_one_register_var (origvar);
     }
-  else if (! valid_constant_size_p (DECL_SIZE_UNIT (var)))
+  else if (!poly_int_tree_p (DECL_SIZE_UNIT (var), &size)
+	   || !valid_constant_size_p (DECL_SIZE_UNIT (var)))
     {
       /* Reject variables which cover more than half of the address-space.  */
       if (really_expand)
@@ -1669,9 +1671,7 @@ expand_one_var (tree var, bool toplevel,
 
           expand_one_stack_var (origvar);
         }
-
-
-      return tree_to_uhwi (DECL_SIZE_UNIT (var));
+      return size;
     }
   return 0;
 }
@@ -1924,7 +1924,7 @@ fini_vars_expansion (void)
 HOST_WIDE_INT
 estimated_stack_frame_size (struct cgraph_node *node)
 {
-  HOST_WIDE_INT size = 0;
+  poly_int64 size = 0;
   size_t i;
   tree var;
   struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
@@ -1948,7 +1948,7 @@ estimated_stack_frame_size (struct cgrap
 
   fini_vars_expansion ();
   pop_cfun ();
-  return size;
+  return estimated_poly_value (size);
 }
 
 /* Helper routine to check if a record or union contains an array field. */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [049/nnn] poly_int: emit_inc
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (48 preceding siblings ...)
  2017-10-23 17:21 ` [050/nnn] poly_int: reload<->ira interface Richard Sandiford
@ 2017-10-23 17:21 ` Richard Sandiford
  2017-11-28 17:30   ` Jeff Law
  2017-10-23 17:22 ` [051/nnn] poly_int: emit_group_load/store Richard Sandiford
                   ` (57 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:21 UTC (permalink / raw)
  To: gcc-patches

This patch changes the LRA emit_inc routine so that it takes
a poly_int64 rather than an int.
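
For illustration only (not part of the patch), the relevant fragment
restated as a sketch, assuming the poly-int.h interface from patch 001:
the increment can now be a runtime-variable size, so it is negated and
converted to an rtx with gen_int_mode rather than GEN_INT, which only
handles compile-time constants.

  /* INC_AMOUNT is now a poly_int64 parameter rather than an int.  */
  if (GET_CODE (value) == PRE_DEC || GET_CODE (value) == POST_DEC)
    inc_amount = -inc_amount;
  inc = gen_int_mode (inc_amount, GET_MODE (value));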


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* lra-constraints.c (emit_inc): Change inc_amount from an int
	to a poly_int64.

Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2017-10-23 17:19:21.001863152 +0100
+++ gcc/lra-constraints.c	2017-10-23 17:20:47.003797985 +0100
@@ -3533,7 +3533,7 @@ process_address (int nop, bool check_onl
 
    Return pseudo containing the result.	 */
 static rtx
-emit_inc (enum reg_class new_rclass, rtx in, rtx value, int inc_amount)
+emit_inc (enum reg_class new_rclass, rtx in, rtx value, poly_int64 inc_amount)
 {
   /* REG or MEM to be copied and incremented.  */
   rtx incloc = XEXP (value, 0);
@@ -3561,7 +3561,7 @@ emit_inc (enum reg_class new_rclass, rtx
       if (GET_CODE (value) == PRE_DEC || GET_CODE (value) == POST_DEC)
 	inc_amount = -inc_amount;
 
-      inc = GEN_INT (inc_amount);
+      inc = gen_int_mode (inc_amount, GET_MODE (value));
     }
 
   if (! post && REG_P (incloc))

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [050/nnn] poly_int: reload<->ira interface
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (47 preceding siblings ...)
  2017-10-23 17:21 ` [048/nnn] poly_int: cfgexpand stack variables Richard Sandiford
@ 2017-10-23 17:21 ` Richard Sandiford
  2017-11-28 16:55   ` Jeff Law
  2017-10-23 17:21 ` [049/nnn] poly_int: emit_inc Richard Sandiford
                   ` (58 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:21 UTC (permalink / raw)
  To: gcc-patches

This patch uses poly_uint64 for:

- ira_reuse_stack_slot
- ira_mark_new_stack_slot
- ira_spilled_reg_stack_slot::width

all of which are part of the IRA/reload interface.
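
For illustration only (not part of the patch), a sketch of the comparison
style the hunks below use when scanning existing spill slots, assuming
the poly-int.h interface from patch 001: a slot has to be rejected if it
*might* be too small, so strict "may" comparisons replace the old <.

  /* Fragment from inside the slot-scanning loop: skip the slot unless
     it is known to be wide enough for the spilled register.  */
  if (may_lt (slot->width, total_size)
      || may_lt (GET_MODE_SIZE (GET_MODE (slot->mem)), inherent_size))
    continue;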


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* ira-int.h (ira_spilled_reg_stack_slot::width): Change from
	an unsigned int to a poly_uint64.
	* ira.h (ira_reuse_stack_slot, ira_mark_new_stack_slot): Take the
	sizes as poly_uint64s rather than unsigned ints.
	* ira-color.c (ira_reuse_stack_slot, ira_mark_new_stack_slot):
	Likewise.

Index: gcc/ira-int.h
===================================================================
--- gcc/ira-int.h	2017-10-23 16:52:18.222670182 +0100
+++ gcc/ira-int.h	2017-10-23 17:20:48.204761416 +0100
@@ -604,7 +604,7 @@ struct ira_spilled_reg_stack_slot
   /* RTL representation of the stack slot.  */
   rtx mem;
   /* Size of the stack slot.  */
-  unsigned int width;
+  poly_uint64_pod width;
 };
 
 /* The number of elements in the following array.  */
Index: gcc/ira.h
===================================================================
--- gcc/ira.h	2017-10-23 17:10:45.257213436 +0100
+++ gcc/ira.h	2017-10-23 17:20:48.204761416 +0100
@@ -200,8 +200,8 @@ extern void ira_mark_allocation_change (
 extern void ira_mark_memory_move_deletion (int, int);
 extern bool ira_reassign_pseudos (int *, int, HARD_REG_SET, HARD_REG_SET *,
 				  HARD_REG_SET *, bitmap);
-extern rtx ira_reuse_stack_slot (int, unsigned int, unsigned int);
-extern void ira_mark_new_stack_slot (rtx, int, unsigned int);
+extern rtx ira_reuse_stack_slot (int, poly_uint64, poly_uint64);
+extern void ira_mark_new_stack_slot (rtx, int, poly_uint64);
 extern bool ira_better_spill_reload_regno_p (int *, int *, rtx, rtx, rtx_insn *);
 extern bool ira_bad_reload_regno (int, rtx, rtx);
 
Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c	2017-10-23 17:11:40.005487591 +0100
+++ gcc/ira-color.c	2017-10-23 17:20:48.204761416 +0100
@@ -4495,8 +4495,8 @@ ira_reassign_pseudos (int *spilled_pseud
    TOTAL_SIZE.  In the case of failure to find a slot which can be
    used for REGNO, the function returns NULL.  */
 rtx
-ira_reuse_stack_slot (int regno, unsigned int inherent_size,
-		      unsigned int total_size)
+ira_reuse_stack_slot (int regno, poly_uint64 inherent_size,
+		      poly_uint64 total_size)
 {
   unsigned int i;
   int slot_num, best_slot_num;
@@ -4509,8 +4509,8 @@ ira_reuse_stack_slot (int regno, unsigne
 
   ira_assert (! ira_use_lra_p);
 
-  ira_assert (inherent_size == PSEUDO_REGNO_BYTES (regno)
-	      && inherent_size <= total_size
+  ira_assert (must_eq (inherent_size, PSEUDO_REGNO_BYTES (regno))
+	      && must_le (inherent_size, total_size)
 	      && ALLOCNO_HARD_REGNO (allocno) < 0);
   if (! flag_ira_share_spill_slots)
     return NULL_RTX;
@@ -4533,8 +4533,8 @@ ira_reuse_stack_slot (int regno, unsigne
 	  slot = &ira_spilled_reg_stack_slots[slot_num];
 	  if (slot->mem == NULL_RTX)
 	    continue;
-	  if (slot->width < total_size
-	      || GET_MODE_SIZE (GET_MODE (slot->mem)) < inherent_size)
+	  if (may_lt (slot->width, total_size)
+	      || may_lt (GET_MODE_SIZE (GET_MODE (slot->mem)), inherent_size))
 	    continue;
 
 	  EXECUTE_IF_SET_IN_BITMAP (&slot->spilled_regs,
@@ -4586,7 +4586,7 @@ ira_reuse_stack_slot (int regno, unsigne
     }
   if (x != NULL_RTX)
     {
-      ira_assert (slot->width >= total_size);
+      ira_assert (must_ge (slot->width, total_size));
 #ifdef ENABLE_IRA_CHECKING
       EXECUTE_IF_SET_IN_BITMAP (&slot->spilled_regs,
 				FIRST_PSEUDO_REGISTER, i, bi)
@@ -4615,7 +4615,7 @@ ira_reuse_stack_slot (int regno, unsigne
    TOTAL_SIZE was allocated for REGNO.  We store this info for
    subsequent ira_reuse_stack_slot calls.  */
 void
-ira_mark_new_stack_slot (rtx x, int regno, unsigned int total_size)
+ira_mark_new_stack_slot (rtx x, int regno, poly_uint64 total_size)
 {
   struct ira_spilled_reg_stack_slot *slot;
   int slot_num;
@@ -4623,7 +4623,7 @@ ira_mark_new_stack_slot (rtx x, int regn
 
   ira_assert (! ira_use_lra_p);
 
-  ira_assert (PSEUDO_REGNO_BYTES (regno) <= total_size);
+  ira_assert (must_le (PSEUDO_REGNO_BYTES (regno), total_size));
   allocno = ira_regno_allocno_map[regno];
   slot_num = -ALLOCNO_HARD_REGNO (allocno) - 2;
   if (slot_num == -1)

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [052/nnn] poly_int: bit_field_size/offset
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (50 preceding siblings ...)
  2017-10-23 17:22 ` [051/nnn] poly_int: emit_group_load/store Richard Sandiford
@ 2017-10-23 17:22 ` Richard Sandiford
  2017-12-05 17:25   ` Jeff Law
  2017-10-23 17:22 ` [053/nnn] poly_int: decode_addr_const Richard Sandiford
                   ` (55 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:22 UTC (permalink / raw)
  To: gcc-patches

verify_expr ensured that the size and offset in gimple BIT_FIELD_REFs
satisfied tree_fits_uhwi_p.  This patch extends that so that they can
be poly_uint64s, and adds helper routines for accessing them when the
verify_expr requirements apply.
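
For illustration only (not part of the patch), a sketch of how a gimple
consumer might use the new helpers, assuming the poly-int.h interface
from patch 001: sizes and offsets are read as poly_uint64s, and byte
conversions go through multiple_p instead of % and /.

  /* REF is a BIT_FIELD_REF that has passed verify_expr.  */
  poly_uint64 bitsize = bit_field_size (ref);
  poly_uint64 bytepos;
  if (multiple_p (bitsize, BITS_PER_UNIT)
      && multiple_p (bit_field_offset (ref), BITS_PER_UNIT, &bytepos))
    {
      /* The reference covers whole bytes starting at byte BYTEPOS.  */
    }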


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree.h (bit_field_size, bit_field_offset): New functions.
	* hsa-gen.c (gen_hsa_addr): Use them.
	* tree-ssa-forwprop.c (simplify_bitfield_ref): Likewise.
	(simplify_vector_constructor): Likewise.
	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
	* tree-cfg.c (verify_expr): Require the sizes and offsets of a
	BIT_FIELD_REF to be poly_uint64s rather than uhwis.
	* fold-const.c (fold_ternary_loc): Protect tree_to_uhwi with
	tree_fits_uhwi_p.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 17:18:47.668056833 +0100
+++ gcc/tree.h	2017-10-23 17:20:50.884679814 +0100
@@ -4764,6 +4764,24 @@ poly_int_tree_p (const_tree t)
   return (TREE_CODE (t) == INTEGER_CST || POLY_INT_CST_P (t));
 }
 
+/* Return the bit size of BIT_FIELD_REF T, in cases where it is known
+   to be a poly_uint64.  (This is always true at the gimple level.)  */
+
+inline poly_uint64
+bit_field_size (const_tree t)
+{
+  return tree_to_poly_uint64 (TREE_OPERAND (t, 1));
+}
+
+/* Return the starting bit offset of BIT_FIELD_REF T, in cases where it is
+   known to be a poly_uint64.  (This is always true at the gimple level.)  */
+
+inline poly_uint64
+bit_field_offset (const_tree t)
+{
+  return tree_to_poly_uint64 (TREE_OPERAND (t, 2));
+}
+
 extern tree strip_float_extensions (tree);
 extern int really_constant_p (const_tree);
 extern bool ptrdiff_tree_p (const_tree, poly_int64_pod *);
Index: gcc/hsa-gen.c
===================================================================
--- gcc/hsa-gen.c	2017-10-23 17:18:47.664057184 +0100
+++ gcc/hsa-gen.c	2017-10-23 17:20:50.882679875 +0100
@@ -1959,8 +1959,8 @@ gen_hsa_addr (tree ref, hsa_bb *hbb, HOS
       goto out;
     }
   else if (TREE_CODE (ref) == BIT_FIELD_REF
-	   && ((tree_to_uhwi (TREE_OPERAND (ref, 1)) % BITS_PER_UNIT) != 0
-	       || (tree_to_uhwi (TREE_OPERAND (ref, 2)) % BITS_PER_UNIT) != 0))
+	   && (!multiple_p (bit_field_size (ref), BITS_PER_UNIT)
+	       || !multiple_p (bit_field_offset (ref), BITS_PER_UNIT)))
     {
       HSA_SORRY_ATV (EXPR_LOCATION (origref),
 		     "support for HSA does not implement "
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-10-23 17:17:01.434034223 +0100
+++ gcc/tree-ssa-forwprop.c	2017-10-23 17:20:50.883679845 +0100
@@ -1727,7 +1727,7 @@ simplify_bitfield_ref (gimple_stmt_itera
   gimple *def_stmt;
   tree op, op0, op1, op2;
   tree elem_type;
-  unsigned idx, n, size;
+  unsigned idx, size;
   enum tree_code code;
 
   op = gimple_assign_rhs1 (stmt);
@@ -1762,12 +1762,11 @@ simplify_bitfield_ref (gimple_stmt_itera
     return false;
 
   size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
-  n = TREE_INT_CST_LOW (op1) / size;
-  if (n != 1)
+  if (may_ne (bit_field_size (op), size))
     return false;
-  idx = TREE_INT_CST_LOW (op2) / size;
 
-  if (code == VEC_PERM_EXPR)
+  if (code == VEC_PERM_EXPR
+      && constant_multiple_p (bit_field_offset (op), size, &idx))
     {
       tree p, m, tem;
       unsigned nelts;
@@ -2020,9 +2019,10 @@ simplify_vector_constructor (gimple_stmt
 	    return false;
 	  orig = ref;
 	}
-      if (TREE_INT_CST_LOW (TREE_OPERAND (op1, 1)) != elem_size)
+      unsigned int elt;
+      if (may_ne (bit_field_size (op1), elem_size)
+	  || !constant_multiple_p (bit_field_offset (op1), elem_size, &elt))
 	return false;
-      unsigned int elt = TREE_INT_CST_LOW (TREE_OPERAND (op1, 2)) / elem_size;
       if (elt != i)
 	maybe_ident = false;
       sel.quick_push (elt);
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-10-23 17:17:01.435034088 +0100
+++ gcc/tree-ssa-sccvn.c	2017-10-23 17:20:50.884679814 +0100
@@ -766,12 +766,8 @@ copy_reference_ops_from_ref (tree ref, v
 	  /* Record bits, position and storage order.  */
 	  temp.op0 = TREE_OPERAND (ref, 1);
 	  temp.op1 = TREE_OPERAND (ref, 2);
-	  if (tree_fits_shwi_p (TREE_OPERAND (ref, 2)))
-	    {
-	      HOST_WIDE_INT off = tree_to_shwi (TREE_OPERAND (ref, 2));
-	      if (off % BITS_PER_UNIT == 0)
-		temp.off = off / BITS_PER_UNIT;
-	    }
+	  if (!multiple_p (bit_field_offset (ref), BITS_PER_UNIT, &temp.off))
+	    temp.off = -1;
 	  temp.reverse = REF_REVERSE_STORAGE_ORDER (ref);
 	  break;
 	case COMPONENT_REF:
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-10-23 17:11:40.247950952 +0100
+++ gcc/tree-cfg.c	2017-10-23 17:20:50.883679845 +0100
@@ -3054,8 +3054,9 @@ #define CHECK_OP(N, MSG) \
 	  tree t0 = TREE_OPERAND (t, 0);
 	  tree t1 = TREE_OPERAND (t, 1);
 	  tree t2 = TREE_OPERAND (t, 2);
-	  if (!tree_fits_uhwi_p (t1)
-	      || !tree_fits_uhwi_p (t2)
+	  poly_uint64 size, bitpos;
+	  if (!poly_int_tree_p (t1, &size)
+	      || !poly_int_tree_p (t2, &bitpos)
 	      || !types_compatible_p (bitsizetype, TREE_TYPE (t1))
 	      || !types_compatible_p (bitsizetype, TREE_TYPE (t2)))
 	    {
@@ -3063,8 +3064,7 @@ #define CHECK_OP(N, MSG) \
 	      return t;
 	    }
 	  if (INTEGRAL_TYPE_P (TREE_TYPE (t))
-	      && (TYPE_PRECISION (TREE_TYPE (t))
-		  != tree_to_uhwi (t1)))
+	      && may_ne (TYPE_PRECISION (TREE_TYPE (t)), size))
 	    {
 	      error ("integral result type precision does not match "
 		     "field size of BIT_FIELD_REF");
@@ -3072,16 +3072,16 @@ #define CHECK_OP(N, MSG) \
 	    }
 	  else if (!INTEGRAL_TYPE_P (TREE_TYPE (t))
 		   && TYPE_MODE (TREE_TYPE (t)) != BLKmode
-		   && (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (t)))
-		       != tree_to_uhwi (t1)))
+		   && may_ne (GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (t))),
+			      size))
 	    {
 	      error ("mode size of non-integral result does not "
 		     "match field size of BIT_FIELD_REF");
 	      return t;
 	    }
 	  if (!AGGREGATE_TYPE_P (TREE_TYPE (t0))
-	      && (tree_to_uhwi (t1) + tree_to_uhwi (t2)
-		  > tree_to_uhwi (TYPE_SIZE (TREE_TYPE (t0)))))
+	      && may_gt (size + bitpos,
+			 tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (t0)))))
 	    {
 	      error ("position plus size exceeds size of referenced object in "
 		     "BIT_FIELD_REF");
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:18:47.662057360 +0100
+++ gcc/fold-const.c	2017-10-23 17:20:50.881679906 +0100
@@ -11728,7 +11728,9 @@ fold_ternary_loc (location_t loc, enum t
          fold (nearly) all BIT_FIELD_REFs.  */
       if (CONSTANT_CLASS_P (arg0)
 	  && can_native_interpret_type_p (type)
-	  && BITS_PER_UNIT == 8)
+	  && BITS_PER_UNIT == 8
+	  && tree_fits_uhwi_p (op1)
+	  && tree_fits_uhwi_p (op2))
 	{
 	  unsigned HOST_WIDE_INT bitpos = tree_to_uhwi (op2);
 	  unsigned HOST_WIDE_INT bitsize = tree_to_uhwi (op1);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [053/nnn] poly_int: decode_addr_const
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (51 preceding siblings ...)
  2017-10-23 17:22 ` [052/nnn] poly_int: bit_field_size/offset Richard Sandiford
@ 2017-10-23 17:22 ` Richard Sandiford
  2017-11-28 16:53   ` Jeff Law
  2017-10-23 17:23 ` [055/nnn] poly_int: find_bswap_or_nop_load Richard Sandiford
                   ` (54 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:22 UTC (permalink / raw)
  To: gcc-patches

This patch makes the varasm-local addr_const track polynomial offsets.
I'm not sure how useful this is, but it was easier to convert than not.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* varasm.c (addr_const::offset): Change from HOST_WIDE_INT
	to poly_int64.
	(decode_addr_const): Update accordingly.

Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	2017-10-23 17:11:39.974428235 +0100
+++ gcc/varasm.c	2017-10-23 17:20:52.530629696 +0100
@@ -2873,29 +2873,31 @@ assemble_real (REAL_VALUE_TYPE d, scalar
 
 struct addr_const {
   rtx base;
-  HOST_WIDE_INT offset;
+  poly_int64 offset;
 };
 
 static void
 decode_addr_const (tree exp, struct addr_const *value)
 {
   tree target = TREE_OPERAND (exp, 0);
-  int offset = 0;
+  poly_int64 offset = 0;
   rtx x;
 
   while (1)
     {
+      poly_int64 bytepos;
       if (TREE_CODE (target) == COMPONENT_REF
-	  && tree_fits_shwi_p (byte_position (TREE_OPERAND (target, 1))))
+	  && poly_int_tree_p (byte_position (TREE_OPERAND (target, 1)),
+			      &bytepos))
 	{
-	  offset += int_byte_position (TREE_OPERAND (target, 1));
+	  offset += bytepos;
 	  target = TREE_OPERAND (target, 0);
 	}
       else if (TREE_CODE (target) == ARRAY_REF
 	       || TREE_CODE (target) == ARRAY_RANGE_REF)
 	{
 	  offset += (tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (target)))
-		     * tree_to_shwi (TREE_OPERAND (target, 1)));
+		     * tree_to_poly_int64 (TREE_OPERAND (target, 1)));
 	  target = TREE_OPERAND (target, 0);
 	}
       else if (TREE_CODE (target) == MEM_REF
@@ -3042,14 +3044,14 @@ const_hash_1 (const tree exp)
 	  case SYMBOL_REF:
 	    /* Don't hash the address of the SYMBOL_REF;
 	       only use the offset and the symbol name.  */
-	    hi = value.offset;
+	    hi = value.offset.coeffs[0];
 	    p = XSTR (value.base, 0);
 	    for (i = 0; p[i] != 0; i++)
 	      hi = ((hi * 613) + (unsigned) (p[i]));
 	    break;
 
 	  case LABEL_REF:
-	    hi = (value.offset
+	    hi = (value.offset.coeffs[0]
 		  + CODE_LABEL_NUMBER (label_ref_label (value.base)) * 13);
 	    break;
 
@@ -3242,7 +3244,7 @@ compare_constant (const tree t1, const t
 	decode_addr_const (t1, &value1);
 	decode_addr_const (t2, &value2);
 
-	if (value1.offset != value2.offset)
+	if (may_ne (value1.offset, value2.offset))
 	  return 0;
 
 	code = GET_CODE (value1.base);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [051/nnn] poly_int: emit_group_load/store
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (49 preceding siblings ...)
  2017-10-23 17:21 ` [049/nnn] poly_int: emit_inc Richard Sandiford
@ 2017-10-23 17:22 ` Richard Sandiford
  2017-12-05 23:26   ` Jeff Law
  2017-10-23 17:22 ` [052/nnn] poly_int: bit_field_size/offset Richard Sandiford
                   ` (56 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:22 UTC (permalink / raw)
  To: gcc-patches

This patch changes the sizes passed to emit_group_load and
emit_group_store from int to poly_int64.
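
For illustration only (not part of the patch), the key check in
emit_group_load_1 restated as a sketch, assuming the poly-int.h
interface from patch 001: with a polynomial SSIZE, the trailing-fragment
test first asserts that the comparison is decidable and then asks the
"may" question.

  /* BYTEPOS/BYTELEN describe one fragment of the value; SSIZE is the
     total size, when known (known_size_p).  */
  gcc_checking_assert (ordered_p (bytepos + bytelen, ssize));
  if (known_size_p (ssize) && may_gt (bytepos + bytelen, ssize))
    bytelen = ssize - bytepos;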


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.h (emit_group_load, emit_group_load_into_temps)
	(emit_group_store): Take the size as a poly_int64 rather than an int.
	* expr.c (emit_group_load_1, emit_group_load): Likewise.
	(emit_group_load_into_temp, emit_group_store): Likewise.

Index: gcc/expr.h
===================================================================
--- gcc/expr.h	2017-10-23 17:18:56.434286222 +0100
+++ gcc/expr.h	2017-10-23 17:20:49.571719793 +0100
@@ -128,10 +128,10 @@ extern rtx gen_group_rtx (rtx);
 
 /* Load a BLKmode value into non-consecutive registers represented by a
    PARALLEL.  */
-extern void emit_group_load (rtx, rtx, tree, int);
+extern void emit_group_load (rtx, rtx, tree, poly_int64);
 
 /* Similarly, but load into new temporaries.  */
-extern rtx emit_group_load_into_temps (rtx, rtx, tree, int);
+extern rtx emit_group_load_into_temps (rtx, rtx, tree, poly_int64);
 
 /* Move a non-consecutive group of registers represented by a PARALLEL into
    a non-consecutive group of registers represented by a PARALLEL.  */
@@ -142,7 +142,7 @@ extern rtx emit_group_move_into_temps (r
 
 /* Store a BLKmode value from non-consecutive registers represented by a
    PARALLEL.  */
-extern void emit_group_store (rtx, rtx, tree, int);
+extern void emit_group_store (rtx, rtx, tree, poly_int64);
 
 extern rtx maybe_emit_group_store (rtx, tree);
 
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:18:57.860160878 +0100
+++ gcc/expr.c	2017-10-23 17:20:49.571719793 +0100
@@ -2095,7 +2095,8 @@ gen_group_rtx (rtx orig)
    into corresponding XEXP (XVECEXP (DST, 0, i), 0) element.  */
 
 static void
-emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type, int ssize)
+emit_group_load_1 (rtx *tmps, rtx dst, rtx orig_src, tree type,
+		   poly_int64 ssize)
 {
   rtx src;
   int start, i;
@@ -2134,12 +2135,16 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
   for (i = start; i < XVECLEN (dst, 0); i++)
     {
       machine_mode mode = GET_MODE (XEXP (XVECEXP (dst, 0, i), 0));
-      HOST_WIDE_INT bytepos = INTVAL (XEXP (XVECEXP (dst, 0, i), 1));
-      unsigned int bytelen = GET_MODE_SIZE (mode);
-      int shift = 0;
-
-      /* Handle trailing fragments that run over the size of the struct.  */
-      if (ssize >= 0 && bytepos + (HOST_WIDE_INT) bytelen > ssize)
+      poly_int64 bytepos = INTVAL (XEXP (XVECEXP (dst, 0, i), 1));
+      poly_int64 bytelen = GET_MODE_SIZE (mode);
+      poly_int64 shift = 0;
+
+      /* Handle trailing fragments that run over the size of the struct.
+	 It's the target's responsibility to make sure that the fragment
+	 cannot be strictly smaller in some cases and strictly larger
+	 in others.  */
+      gcc_checking_assert (ordered_p (bytepos + bytelen, ssize));
+      if (known_size_p (ssize) && may_gt (bytepos + bytelen, ssize))
 	{
 	  /* Arrange to shift the fragment to where it belongs.
 	     extract_bit_field loads to the lsb of the reg.  */
@@ -2153,7 +2158,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
 	      )
 	    shift = (bytelen - (ssize - bytepos)) * BITS_PER_UNIT;
 	  bytelen = ssize - bytepos;
-	  gcc_assert (bytelen > 0);
+	  gcc_assert (may_gt (bytelen, 0));
 	}
 
       /* If we won't be loading directly from memory, protect the real source
@@ -2177,33 +2182,34 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
       if (MEM_P (src)
 	  && (! targetm.slow_unaligned_access (mode, MEM_ALIGN (src))
 	      || MEM_ALIGN (src) >= GET_MODE_ALIGNMENT (mode))
-	  && bytepos * BITS_PER_UNIT % GET_MODE_ALIGNMENT (mode) == 0
-	  && bytelen == GET_MODE_SIZE (mode))
+	  && multiple_p (bytepos * BITS_PER_UNIT, GET_MODE_ALIGNMENT (mode))
+	  && must_eq (bytelen, GET_MODE_SIZE (mode)))
 	{
 	  tmps[i] = gen_reg_rtx (mode);
 	  emit_move_insn (tmps[i], adjust_address (src, mode, bytepos));
 	}
       else if (COMPLEX_MODE_P (mode)
 	       && GET_MODE (src) == mode
-	       && bytelen == GET_MODE_SIZE (mode))
+	       && must_eq (bytelen, GET_MODE_SIZE (mode)))
 	/* Let emit_move_complex do the bulk of the work.  */
 	tmps[i] = src;
       else if (GET_CODE (src) == CONCAT)
 	{
-	  unsigned int slen = GET_MODE_SIZE (GET_MODE (src));
-	  unsigned int slen0 = GET_MODE_SIZE (GET_MODE (XEXP (src, 0)));
-	  unsigned int elt = bytepos / slen0;
-	  unsigned int subpos = bytepos % slen0;
+	  poly_int64 slen = GET_MODE_SIZE (GET_MODE (src));
+	  poly_int64 slen0 = GET_MODE_SIZE (GET_MODE (XEXP (src, 0)));
+	  unsigned int elt;
+	  poly_int64 subpos;
 
-	  if (subpos + bytelen <= slen0)
+	  if (can_div_trunc_p (bytepos, slen0, &elt, &subpos)
+	      && must_le (subpos + bytelen, slen0))
 	    {
 	      /* The following assumes that the concatenated objects all
 		 have the same size.  In this case, a simple calculation
 		 can be used to determine the object and the bit field
 		 to be extracted.  */
 	      tmps[i] = XEXP (src, elt);
-	      if (subpos != 0
-		  || subpos + bytelen != slen0
+	      if (maybe_nonzero (subpos)
+		  || may_ne (subpos + bytelen, slen0)
 		  || (!CONSTANT_P (tmps[i])
 		      && (!REG_P (tmps[i]) || GET_MODE (tmps[i]) != mode)))
 		tmps[i] = extract_bit_field (tmps[i], bytelen * BITS_PER_UNIT,
@@ -2215,7 +2221,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
 	    {
 	      rtx mem;
 
-	      gcc_assert (!bytepos);
+	      gcc_assert (known_zero (bytepos));
 	      mem = assign_stack_temp (GET_MODE (src), slen);
 	      emit_move_insn (mem, src);
 	      tmps[i] = extract_bit_field (mem, bytelen * BITS_PER_UNIT,
@@ -2234,23 +2240,21 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
 
 	  mem = assign_stack_temp (GET_MODE (src), slen);
 	  emit_move_insn (mem, src);
-	  tmps[i] = adjust_address (mem, mode, (int) bytepos);
+	  tmps[i] = adjust_address (mem, mode, bytepos);
 	}
       else if (CONSTANT_P (src) && GET_MODE (dst) != BLKmode
                && XVECLEN (dst, 0) > 1)
         tmps[i] = simplify_gen_subreg (mode, src, GET_MODE (dst), bytepos);
       else if (CONSTANT_P (src))
 	{
-	  HOST_WIDE_INT len = (HOST_WIDE_INT) bytelen;
-
-	  if (len == ssize)
+	  if (must_eq (bytelen, ssize))
 	    tmps[i] = src;
 	  else
 	    {
 	      rtx first, second;
 
 	      /* TODO: const_wide_int can have sizes other than this...  */
-	      gcc_assert (2 * len == ssize);
+	      gcc_assert (must_eq (2 * bytelen, ssize));
 	      split_double (src, &first, &second);
 	      if (i)
 		tmps[i] = second;
@@ -2265,7 +2269,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
 				     bytepos * BITS_PER_UNIT, 1, NULL_RTX,
 				     mode, mode, false, NULL);
 
-      if (shift)
+      if (maybe_nonzero (shift))
 	tmps[i] = expand_shift (LSHIFT_EXPR, mode, tmps[i],
 				shift, tmps[i], 0);
     }
@@ -2277,7 +2281,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
    if not known.  */
 
 void
-emit_group_load (rtx dst, rtx src, tree type, int ssize)
+emit_group_load (rtx dst, rtx src, tree type, poly_int64 ssize)
 {
   rtx *tmps;
   int i;
@@ -2300,7 +2304,7 @@ emit_group_load (rtx dst, rtx src, tree
    in the right place.  */
 
 rtx
-emit_group_load_into_temps (rtx parallel, rtx src, tree type, int ssize)
+emit_group_load_into_temps (rtx parallel, rtx src, tree type, poly_int64 ssize)
 {
   rtvec vec;
   int i;
@@ -2371,7 +2375,8 @@ emit_group_move_into_temps (rtx src)
    known.  */
 
 void
-emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED, int ssize)
+emit_group_store (rtx orig_dst, rtx src, tree type ATTRIBUTE_UNUSED,
+		  poly_int64 ssize)
 {
   rtx *tmps, dst;
   int start, finish, i;
@@ -2502,24 +2507,28 @@ emit_group_store (rtx orig_dst, rtx src,
   /* Process the pieces.  */
   for (i = start; i < finish; i++)
     {
-      HOST_WIDE_INT bytepos = INTVAL (XEXP (XVECEXP (src, 0, i), 1));
+      poly_int64 bytepos = INTVAL (XEXP (XVECEXP (src, 0, i), 1));
       machine_mode mode = GET_MODE (tmps[i]);
-      unsigned int bytelen = GET_MODE_SIZE (mode);
-      unsigned int adj_bytelen;
+      poly_int64 bytelen = GET_MODE_SIZE (mode);
+      poly_uint64 adj_bytelen;
       rtx dest = dst;
 
-      /* Handle trailing fragments that run over the size of the struct.  */
-      if (ssize >= 0 && bytepos + (HOST_WIDE_INT) bytelen > ssize)
+      /* Handle trailing fragments that run over the size of the struct.
+	 It's the target's responsibility to make sure that the fragment
+	 cannot be strictly smaller in some cases and strictly larger
+	 in others.  */
+      gcc_checking_assert (ordered_p (bytepos + bytelen, ssize));
+      if (known_size_p (ssize) && may_gt (bytepos + bytelen, ssize))
 	adj_bytelen = ssize - bytepos;
       else
 	adj_bytelen = bytelen;
 
       if (GET_CODE (dst) == CONCAT)
 	{
-	  if (bytepos + adj_bytelen
-	      <= GET_MODE_SIZE (GET_MODE (XEXP (dst, 0))))
+	  if (must_le (bytepos + adj_bytelen,
+		       GET_MODE_SIZE (GET_MODE (XEXP (dst, 0)))))
 	    dest = XEXP (dst, 0);
-	  else if (bytepos >= GET_MODE_SIZE (GET_MODE (XEXP (dst, 0))))
+	  else if (must_ge (bytepos, GET_MODE_SIZE (GET_MODE (XEXP (dst, 0)))))
 	    {
 	      bytepos -= GET_MODE_SIZE (GET_MODE (XEXP (dst, 0)));
 	      dest = XEXP (dst, 1);
@@ -2529,7 +2538,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	      machine_mode dest_mode = GET_MODE (dest);
 	      machine_mode tmp_mode = GET_MODE (tmps[i]);
 
-	      gcc_assert (bytepos == 0 && XVECLEN (src, 0));
+	      gcc_assert (known_zero (bytepos) && XVECLEN (src, 0));
 
 	      if (GET_MODE_ALIGNMENT (dest_mode)
 		  >= GET_MODE_ALIGNMENT (tmp_mode))
@@ -2554,7 +2563,7 @@ emit_group_store (rtx orig_dst, rtx src,
 	}
 
       /* Handle trailing fragments that run over the size of the struct.  */
-      if (ssize >= 0 && bytepos + (HOST_WIDE_INT) bytelen > ssize)
+      if (known_size_p (ssize) && may_gt (bytepos + bytelen, ssize))
 	{
 	  /* store_bit_field always takes its value from the lsb.
 	     Move the fragment to the lsb if it's not already there.  */
@@ -2567,7 +2576,7 @@ emit_group_store (rtx orig_dst, rtx src,
 #endif
 	      )
 	    {
-	      int shift = (bytelen - (ssize - bytepos)) * BITS_PER_UNIT;
+	      poly_int64 shift = (bytelen - (ssize - bytepos)) * BITS_PER_UNIT;
 	      tmps[i] = expand_shift (RSHIFT_EXPR, mode, tmps[i],
 				      shift, tmps[i], 0);
 	    }
@@ -2583,8 +2592,9 @@ emit_group_store (rtx orig_dst, rtx src,
       else if (MEM_P (dest)
 	       && (!targetm.slow_unaligned_access (mode, MEM_ALIGN (dest))
 		   || MEM_ALIGN (dest) >= GET_MODE_ALIGNMENT (mode))
-	       && bytepos * BITS_PER_UNIT % GET_MODE_ALIGNMENT (mode) == 0
-	       && bytelen == GET_MODE_SIZE (mode))
+	       && multiple_p (bytepos * BITS_PER_UNIT,
+			      GET_MODE_ALIGNMENT (mode))
+	       && must_eq (bytelen, GET_MODE_SIZE (mode)))
 	emit_move_insn (adjust_address (dest, mode, bytepos), tmps[i]);
 
       else

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [054/nnn] poly_int: adjust_ptr_info_misalignment
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (53 preceding siblings ...)
  2017-10-23 17:23 ` [055/nnn] poly_int: find_bswap_or_nop_load Richard Sandiford
@ 2017-10-23 17:23 ` Richard Sandiford
  2017-11-28 16:53   ` Jeff Law
  2017-10-23 17:24 ` [058/nnn] poly_int: get_binfo_at_offset Richard Sandiford
                   ` (52 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:23 UTC (permalink / raw)
  To: gcc-patches

This patch makes adjust_ptr_info_misalignment take the adjustment
as a poly_uint64 rather than an unsigned int.
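
A small worked example of the fallback logic below, assuming the
known_misalignment/known_alignment semantics from poly-int.h earlier
in the series (values are hypothetical):

  /* pi->align == 8 and the combined value is 4 + 16X: the variable
     part is always a multiple of 8, so the misalignment is known.  */
  known_misalignment (increment, 8, &pi->misalign);  /* true, misalign == 4 */

  /* If the combined value were 4 + 4X instead, the residue mod 8
     would depend on X, so only the guaranteed alignment survives.  */
  known_alignment (increment);                       /* 4 */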


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-ssanames.h (adjust_ptr_info_misalignment): Take the increment
	as a poly_uint64 rather than an unsigned int.
	* tree-ssanames.c (adjust_ptr_info_misalignment): Likewise.

Index: gcc/tree-ssanames.h
===================================================================
--- gcc/tree-ssanames.h	2017-10-23 17:22:13.147805567 +0100
+++ gcc/tree-ssanames.h	2017-10-23 17:22:15.674312500 +0100
@@ -89,8 +89,7 @@ extern bool get_ptr_info_alignment (stru
 extern void mark_ptr_info_alignment_unknown (struct ptr_info_def *);
 extern void set_ptr_info_alignment (struct ptr_info_def *, unsigned int,
 				    unsigned int);
-extern void adjust_ptr_info_misalignment (struct ptr_info_def *,
-					  unsigned int);
+extern void adjust_ptr_info_misalignment (struct ptr_info_def *, poly_uint64);
 extern struct ptr_info_def *get_ptr_info (tree);
 extern void set_ptr_nonnull (tree);
 extern bool get_ptr_nonnull (const_tree);
Index: gcc/tree-ssanames.c
===================================================================
--- gcc/tree-ssanames.c	2017-10-23 17:22:13.147805567 +0100
+++ gcc/tree-ssanames.c	2017-10-23 17:22:15.674312500 +0100
@@ -643,13 +643,16 @@ set_ptr_info_alignment (struct ptr_info_
    misalignment by INCREMENT modulo its current alignment.  */
 
 void
-adjust_ptr_info_misalignment (struct ptr_info_def *pi,
-			      unsigned int increment)
+adjust_ptr_info_misalignment (struct ptr_info_def *pi, poly_uint64 increment)
 {
   if (pi->align != 0)
     {
-      pi->misalign += increment;
-      pi->misalign &= (pi->align - 1);
+      increment += pi->misalign;
+      if (!known_misalignment (increment, pi->align, &pi->misalign))
+	{
+	  pi->align = known_alignment (increment);
+	  pi->misalign = 0;
+	}
     }
 }
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [055/nnn] poly_int: find_bswap_or_nop_load
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (52 preceding siblings ...)
  2017-10-23 17:22 ` [053/nnn] poly_int: decode_addr_const Richard Sandiford
@ 2017-10-23 17:23 ` Richard Sandiford
  2017-11-28 16:52   ` Jeff Law
  2017-10-23 17:23 ` [054/nnn] poly_int: adjust_ptr_info_misalignment Richard Sandiford
                   ` (53 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:23 UTC (permalink / raw)
  To: gcc-patches

This patch handles polynomial offsets in find_bswap_or_nop_load,
which could be useful for constant-sized data at a variable offset.
It is needed for a later patch to compile.
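
The explicit mask arithmetic is replaced by two helpers; as a quick
sanity check of what they compute, for a hypothetical bit_offset of -17:

  bits_to_bytes_round_down (bit_offset);   /* -3 bytes (rounds towards -Inf) */
  num_trailing_bits (bit_offset);          /*  7 bits, since -3 * 8 + 7 == -17 */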


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-ssa-math-opts.c (find_bswap_or_nop_load): Track polynomial
	offsets for MEM_REFs.

Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	2017-10-23 17:18:47.667056920 +0100
+++ gcc/tree-ssa-math-opts.c	2017-10-23 17:22:16.929564362 +0100
@@ -2122,35 +2122,31 @@ find_bswap_or_nop_load (gimple *stmt, tr
 
   if (TREE_CODE (base_addr) == MEM_REF)
     {
-      offset_int bit_offset = 0;
+      poly_offset_int bit_offset = 0;
       tree off = TREE_OPERAND (base_addr, 1);
 
       if (!integer_zerop (off))
 	{
-	  offset_int boff, coff = mem_ref_offset (base_addr);
-	  boff = coff << LOG2_BITS_PER_UNIT;
+	  poly_offset_int boff = mem_ref_offset (base_addr);
+	  boff <<= LOG2_BITS_PER_UNIT;
 	  bit_offset += boff;
 	}
 
       base_addr = TREE_OPERAND (base_addr, 0);
 
       /* Avoid returning a negative bitpos as this may wreak havoc later.  */
-      if (wi::neg_p (bit_offset))
+      if (may_lt (bit_offset, 0))
 	{
-	  offset_int mask = wi::mask <offset_int> (LOG2_BITS_PER_UNIT, false);
-	  offset_int tem = wi::bit_and_not (bit_offset, mask);
-	  /* TEM is the bitpos rounded to BITS_PER_UNIT towards -Inf.
-	     Subtract it to BIT_OFFSET and add it (scaled) to OFFSET.  */
-	  bit_offset -= tem;
-	  tem >>= LOG2_BITS_PER_UNIT;
+	  tree byte_offset = wide_int_to_tree
+	    (sizetype, bits_to_bytes_round_down (bit_offset));
+	  bit_offset = num_trailing_bits (bit_offset);
 	  if (offset)
-	    offset = size_binop (PLUS_EXPR, offset,
-				    wide_int_to_tree (sizetype, tem));
+	    offset = size_binop (PLUS_EXPR, offset, byte_offset);
 	  else
-	    offset = wide_int_to_tree (sizetype, tem);
+	    offset = byte_offset;
 	}
 
-      bitpos += bit_offset.to_shwi ();
+      bitpos += bit_offset.force_shwi ();
     }
 
   if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos))

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [058/nnn] poly_int: get_binfo_at_offset
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (54 preceding siblings ...)
  2017-10-23 17:23 ` [054/nnn] poly_int: adjust_ptr_info_misalignment Richard Sandiford
@ 2017-10-23 17:24 ` Richard Sandiford
  2017-11-28 16:50   ` Jeff Law
  2017-10-23 17:24 ` [056/nnn] poly_int: MEM_REF offsets Richard Sandiford
                   ` (51 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:24 UTC (permalink / raw)
  To: gcc-patches

This patch changes the offset parameter to get_binfo_at_offset
from HOST_WIDE_INT to poly_int64.  This function probably doesn't
need to handle polynomial offsets in practice, but it's easy
to do and avoids forcing the caller to check first.
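
The field lookup now uses known_in_range_p, which only succeeds when
the offset is inside the field for every value of the runtime
indeterminates.  For a hypothetical field at bit position 0 with
DECL_SIZE 64:

  /* offset is a plain 32: always inside [0, 64).  */
  known_in_range_p (offset, 0, 64);   /* true */

  /* offset is 32 + 8X: outside the field once X > 3, so the search
     falls through to the failure path.  */
  known_in_range_p (offset, 0, 64);   /* false */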


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree.h (get_binfo_at_offset): Take the offset as a poly_int64
	rather than a HOST_WIDE_INT.
	* tree.c (get_binfo_at_offset): Likewise.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 17:20:50.884679814 +0100
+++ gcc/tree.h	2017-10-23 17:22:21.308442966 +0100
@@ -4836,7 +4836,7 @@ extern void tree_set_block (tree, tree);
 extern location_t *block_nonartificial_location (tree);
 extern location_t tree_nonartificial_location (tree);
 extern tree block_ultimate_origin (const_tree);
-extern tree get_binfo_at_offset (tree, HOST_WIDE_INT, tree);
+extern tree get_binfo_at_offset (tree, poly_int64, tree);
 extern bool virtual_method_call_p (const_tree);
 extern tree obj_type_ref_class (const_tree ref);
 extern bool types_same_for_odr (const_tree type1, const_tree type2,
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 17:22:18.236826658 +0100
+++ gcc/tree.c	2017-10-23 17:22:21.307442765 +0100
@@ -12328,7 +12328,7 @@ lookup_binfo_at_offset (tree binfo, tree
    found, return, otherwise return NULL_TREE.  */
 
 tree
-get_binfo_at_offset (tree binfo, HOST_WIDE_INT offset, tree expected_type)
+get_binfo_at_offset (tree binfo, poly_int64 offset, tree expected_type)
 {
   tree type = BINFO_TYPE (binfo);
 
@@ -12340,7 +12340,7 @@ get_binfo_at_offset (tree binfo, HOST_WI
 
       if (types_same_for_odr (type, expected_type))
 	  return binfo;
-      if (offset < 0)
+      if (may_lt (offset, 0))
 	return NULL_TREE;
 
       for (fld = TYPE_FIELDS (type); fld; fld = DECL_CHAIN (fld))
@@ -12350,7 +12350,7 @@ get_binfo_at_offset (tree binfo, HOST_WI
 
 	  pos = int_bit_position (fld);
 	  size = tree_to_uhwi (DECL_SIZE (fld));
-	  if (pos <= offset && (pos + size) > offset)
+	  if (known_in_range_p (offset, pos, size))
 	    break;
 	}
       if (!fld || TREE_CODE (TREE_TYPE (fld)) != RECORD_TYPE)
@@ -12358,7 +12358,7 @@ get_binfo_at_offset (tree binfo, HOST_WI
 
       /* Offset 0 indicates the primary base, whose vtable contents are
 	 represented in the binfo for the derived class.  */
-      else if (offset != 0)
+      else if (maybe_nonzero (offset))
 	{
 	  tree found_binfo = NULL, base_binfo;
 	  /* Offsets in BINFO are in bytes relative to the whole structure

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [057/nnn] poly_int: build_ref_for_offset
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (56 preceding siblings ...)
  2017-10-23 17:24 ` [056/nnn] poly_int: MEM_REF offsets Richard Sandiford
@ 2017-10-23 17:24 ` Richard Sandiford
  2017-11-28 16:51   ` Jeff Law
  2017-10-23 17:25 ` [059/nnn] poly_int: tree-ssa-loop-ivopts.c:iv_use Richard Sandiford
                   ` (49 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:24 UTC (permalink / raw)
  To: gcc-patches

This patch changes the offset parameter to build_ref_for_offset
from HOST_WIDE_INT to poly_int64.
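
Two helpers now do the work of the old modulo and mask arithmetic,
for possibly-polynomial offsets.  A worked example with a hypothetical
offset of 32 + 64X bits and a misalign of 0:

  exact_div (offset, BITS_PER_UNIT);    /* 4 + 8X bytes; checks divisibility
                                           in place of the old assert */
  known_alignment (misalign + offset);  /* 32, the largest power of 2 dividing
                                           both coefficients */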


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* ipa-prop.h (build_ref_for_offset): Take the offset as a poly_int64
	rather than a HOST_WIDE_INT.
	* tree-sra.c (build_ref_for_offset): Likewise.

Index: gcc/ipa-prop.h
===================================================================
--- gcc/ipa-prop.h	2017-10-23 17:16:58.508429306 +0100
+++ gcc/ipa-prop.h	2017-10-23 17:22:20.152210973 +0100
@@ -878,7 +878,7 @@ void ipa_release_body_info (struct ipa_f
 tree ipa_get_callee_param_type (struct cgraph_edge *e, int i);
 
 /* From tree-sra.c:  */
-tree build_ref_for_offset (location_t, tree, HOST_WIDE_INT, bool, tree,
+tree build_ref_for_offset (location_t, tree, poly_int64, bool, tree,
 			   gimple_stmt_iterator *, bool);
 
 /* In ipa-cp.c  */
Index: gcc/tree-sra.c
===================================================================
--- gcc/tree-sra.c	2017-10-23 17:18:47.667056920 +0100
+++ gcc/tree-sra.c	2017-10-23 17:22:20.153211173 +0100
@@ -1671,7 +1671,7 @@ make_fancy_name (tree expr)
    of handling bitfields.  */
 
 tree
-build_ref_for_offset (location_t loc, tree base, HOST_WIDE_INT offset,
+build_ref_for_offset (location_t loc, tree base, poly_int64 offset,
 		      bool reverse, tree exp_type, gimple_stmt_iterator *gsi,
 		      bool insert_after)
 {
@@ -1689,7 +1689,7 @@ build_ref_for_offset (location_t loc, tr
 				     TYPE_QUALS (exp_type)
 				     | ENCODE_QUAL_ADDR_SPACE (as));
 
-  gcc_checking_assert (offset % BITS_PER_UNIT == 0);
+  poly_int64 byte_offset = exact_div (offset, BITS_PER_UNIT);
   get_object_alignment_1 (base, &align, &misalign);
   base = get_addr_base_and_unit_offset (base, &base_offset);
 
@@ -1711,27 +1711,26 @@ build_ref_for_offset (location_t loc, tr
       else
 	gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 
-      off = build_int_cst (reference_alias_ptr_type (prev_base),
-			   offset / BITS_PER_UNIT);
+      off = build_int_cst (reference_alias_ptr_type (prev_base), byte_offset);
       base = tmp;
     }
   else if (TREE_CODE (base) == MEM_REF)
     {
       off = build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)),
-			   base_offset + offset / BITS_PER_UNIT);
+			   base_offset + byte_offset);
       off = int_const_binop (PLUS_EXPR, TREE_OPERAND (base, 1), off);
       base = unshare_expr (TREE_OPERAND (base, 0));
     }
   else
     {
       off = build_int_cst (reference_alias_ptr_type (prev_base),
-			   base_offset + offset / BITS_PER_UNIT);
+			   base_offset + byte_offset);
       base = build_fold_addr_expr (unshare_expr (base));
     }
 
-  misalign = (misalign + offset) & (align - 1);
-  if (misalign != 0)
-    align = least_bit_hwi (misalign);
+  unsigned int align_bound = known_alignment (misalign + offset);
+  if (align_bound != 0)
+    align = MIN (align, align_bound);
   if (align != TYPE_ALIGN (exp_type))
     exp_type = build_aligned_type (exp_type, align);
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [056/nnn] poly_int: MEM_REF offsets
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (55 preceding siblings ...)
  2017-10-23 17:24 ` [058/nnn] poly_int: get_binfo_at_offset Richard Sandiford
@ 2017-10-23 17:24 ` Richard Sandiford
  2017-12-06  0:46   ` Jeff Law
  2017-10-23 17:24 ` [057/nnn] poly_int: build_ref_for_offset Richard Sandiford
                   ` (50 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:24 UTC (permalink / raw)
  To: gcc-patches

This patch allows MEM_REF offsets to be polynomial, with mem_ref_offset
now returning a poly_offset_int instead of an offset_int.  The
non-mechanical changes to callers of mem_ref_offset were handled by
previous patches.
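
Callers that still need a compile-time offset now follow roughly this
pattern (a sketch of the idiom rather than a quote from the patch):

  offset_int cst_off;
  if (!mem_ref_offset (base).is_constant (&cst_off))
    return;    /* punt, as the ipa-prop.c callers do */
  offset += cst_off.to_short_addr () * BITS_PER_UNIT;

while callers that can cope with a variable offset keep the whole
poly_offset_int and use force_shwi () once they know it fits.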


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* fold-const.h (mem_ref_offset): Return a poly_offset_int rather
	than an offset_int.
	* tree.c (mem_ref_offset): Likewise.
	* builtins.c (get_object_alignment_2): Treat MEM_REF offsets as
	poly_ints.
	* expr.c (get_inner_reference, expand_expr_real_1): Likewise.
	* gimple-fold.c (get_base_constructor): Likewise.
	* gimple-ssa-strength-reduction.c (restructure_reference): Likewise.
	* ipa-polymorphic-call.c
	(ipa_polymorphic_call_context::ipa_polymorphic_call_context): Likewise.
	* ipa-prop.c (compute_complex_assign_jump_func, get_ancestor_addr_info)
	(ipa_get_adjustment_candidate): Likewise.
	* match.pd: Likewise.
	* tree-data-ref.c (dr_analyze_innermost): Likewise.
	* tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise.
	* tree-eh.c (tree_could_trap_p): Likewise.
	* tree-object-size.c (addr_object_size): Likewise.
	* tree-ssa-address.c (copy_ref_info): Likewise.
	* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise.
	(indirect_refs_may_alias_p): Likewise.
	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
	* tree-ssa.c (maybe_rewrite_mem_ref_base): Likewise.
	(non_rewritable_mem_ref_base): Likewise.
	* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
	* tree-vrp.c (search_for_addr_array): Likewise.
	* varasm.c (decode_addr_const): Likewise.

Index: gcc/fold-const.h
===================================================================
--- gcc/fold-const.h	2017-10-23 17:18:47.662057360 +0100
+++ gcc/fold-const.h	2017-10-23 17:22:18.228825053 +0100
@@ -114,7 +114,7 @@ extern tree fold_indirect_ref_loc (locat
 extern tree build_simple_mem_ref_loc (location_t, tree);
 #define build_simple_mem_ref(T)\
 	build_simple_mem_ref_loc (UNKNOWN_LOCATION, T)
-extern offset_int mem_ref_offset (const_tree);
+extern poly_offset_int mem_ref_offset (const_tree);
 extern tree build_invariant_address (tree, tree, poly_int64);
 extern tree constant_boolean_node (bool, tree);
 extern tree div_if_zero_remainder (const_tree, const_tree);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 17:17:01.436033953 +0100
+++ gcc/tree.c	2017-10-23 17:22:18.236826658 +0100
@@ -4925,10 +4925,11 @@ build_simple_mem_ref_loc (location_t loc
 
 /* Return the constant offset of a MEM_REF or TARGET_MEM_REF tree T.  */
 
-offset_int
+poly_offset_int
 mem_ref_offset (const_tree t)
 {
-  return offset_int::from (wi::to_wide (TREE_OPERAND (t, 1)), SIGNED);
+  return poly_offset_int::from (wi::to_poly_wide (TREE_OPERAND (t, 1)),
+				SIGNED);
 }
 
 /* Return an invariant ADDR_EXPR of type TYPE taking the address of BASE
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	2017-10-23 17:18:57.855161317 +0100
+++ gcc/builtins.c	2017-10-23 17:22:18.226824652 +0100
@@ -350,7 +350,7 @@ get_object_alignment_2 (tree exp, unsign
 	  bitpos += ptr_bitpos;
 	  if (TREE_CODE (exp) == MEM_REF
 	      || TREE_CODE (exp) == TARGET_MEM_REF)
-	    bitpos += mem_ref_offset (exp).to_short_addr () * BITS_PER_UNIT;
+	    bitpos += mem_ref_offset (exp).force_shwi () * BITS_PER_UNIT;
 	}
     }
   else if (TREE_CODE (exp) == STRING_CST)
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:20:49.571719793 +0100
+++ gcc/expr.c	2017-10-23 17:22:18.228825053 +0100
@@ -7165,8 +7165,8 @@ get_inner_reference (tree exp, poly_int6
 	      tree off = TREE_OPERAND (exp, 1);
 	      if (!integer_zerop (off))
 		{
-		  offset_int boff, coff = mem_ref_offset (exp);
-		  boff = coff << LOG2_BITS_PER_UNIT;
+		  poly_offset_int boff = mem_ref_offset (exp);
+		  boff <<= LOG2_BITS_PER_UNIT;
 		  bit_offset += boff;
 		}
 	      exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
@@ -10255,9 +10255,9 @@ expand_expr_real_1 (tree exp, rtx target
 	   might end up in a register.  */
 	if (mem_ref_refers_to_non_mem_p (exp))
 	  {
-	    HOST_WIDE_INT offset = mem_ref_offset (exp).to_short_addr ();
+	    poly_int64 offset = mem_ref_offset (exp).force_shwi ();
 	    base = TREE_OPERAND (base, 0);
-	    if (offset == 0
+	    if (known_zero (offset)
 	        && !reverse
 		&& tree_fits_uhwi_p (TYPE_SIZE (type))
 		&& (GET_MODE_BITSIZE (DECL_MODE (base))
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	2017-10-23 17:17:01.430034763 +0100
+++ gcc/gimple-fold.c	2017-10-23 17:22:18.228825053 +0100
@@ -6176,7 +6176,7 @@ get_base_constructor (tree base, poly_in
 	{
 	  if (!tree_fits_shwi_p (TREE_OPERAND (base, 1)))
 	    return NULL_TREE;
-	  *bit_offset += (mem_ref_offset (base).to_short_addr ()
+	  *bit_offset += (mem_ref_offset (base).force_shwi ()
 			  * BITS_PER_UNIT);
 	}
 
Index: gcc/gimple-ssa-strength-reduction.c
===================================================================
--- gcc/gimple-ssa-strength-reduction.c	2017-10-23 17:18:47.663057272 +0100
+++ gcc/gimple-ssa-strength-reduction.c	2017-10-23 17:22:18.229825254 +0100
@@ -970,17 +970,19 @@ restructure_reference (tree *pbase, tree
   widest_int index = *pindex;
   tree mult_op0, t1, t2, type;
   widest_int c1, c2, c3, c4, c5;
+  offset_int mem_offset;
 
   if (!base
       || !offset
       || TREE_CODE (base) != MEM_REF
+      || !mem_ref_offset (base).is_constant (&mem_offset)
       || TREE_CODE (offset) != MULT_EXPR
       || TREE_CODE (TREE_OPERAND (offset, 1)) != INTEGER_CST
       || wi::umod_floor (index, BITS_PER_UNIT) != 0)
     return false;
 
   t1 = TREE_OPERAND (base, 0);
-  c1 = widest_int::from (mem_ref_offset (base), SIGNED);
+  c1 = widest_int::from (mem_offset, SIGNED);
   type = TREE_TYPE (TREE_OPERAND (base, 1));
 
   mult_op0 = TREE_OPERAND (offset, 0);
Index: gcc/ipa-polymorphic-call.c
===================================================================
--- gcc/ipa-polymorphic-call.c	2017-10-23 17:16:59.704267816 +0100
+++ gcc/ipa-polymorphic-call.c	2017-10-23 17:22:18.229825254 +0100
@@ -917,9 +917,11 @@ ipa_polymorphic_call_context::ipa_polymo
 	    {
 	      /* We found dereference of a pointer.  Type of the pointer
 		 and MEM_REF is meaningless, but we can look futher.  */
-	      if (TREE_CODE (base) == MEM_REF)
+	      offset_int mem_offset;
+	      if (TREE_CODE (base) == MEM_REF
+		  && mem_ref_offset (base).is_constant (&mem_offset))
 		{
-		  offset_int o = mem_ref_offset (base) * BITS_PER_UNIT;
+		  offset_int o = mem_offset * BITS_PER_UNIT;
 		  o += offset;
 		  o += offset2;
 		  if (!wi::fits_shwi_p (o))
Index: gcc/ipa-prop.c
===================================================================
--- gcc/ipa-prop.c	2017-10-23 17:17:01.431034628 +0100
+++ gcc/ipa-prop.c	2017-10-23 17:22:18.230825454 +0100
@@ -1267,9 +1267,12 @@ compute_complex_assign_jump_func (struct
   if (TREE_CODE (TREE_TYPE (op1)) != RECORD_TYPE)
     return;
   base = get_ref_base_and_extent_hwi (op1, &offset, &size, &reverse);
-  if (!base || TREE_CODE (base) != MEM_REF)
+  offset_int mem_offset;
+  if (!base
+      || TREE_CODE (base) != MEM_REF
+      || !mem_ref_offset (base).is_constant (&mem_offset))
     return;
-  offset += mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT;
+  offset += mem_offset.to_short_addr () * BITS_PER_UNIT;
   ssa = TREE_OPERAND (base, 0);
   if (TREE_CODE (ssa) != SSA_NAME
       || !SSA_NAME_IS_DEFAULT_DEF (ssa)
@@ -1311,7 +1314,10 @@ get_ancestor_addr_info (gimple *assign,
   obj = expr;
   expr = get_ref_base_and_extent_hwi (expr, offset, &size, &reverse);
 
-  if (!expr || TREE_CODE (expr) != MEM_REF)
+  offset_int mem_offset;
+  if (!expr
+      || TREE_CODE (expr) != MEM_REF
+      || !mem_ref_offset (expr).is_constant (&mem_offset))
     return NULL_TREE;
   parm = TREE_OPERAND (expr, 0);
   if (TREE_CODE (parm) != SSA_NAME
@@ -1319,7 +1325,7 @@ get_ancestor_addr_info (gimple *assign,
       || TREE_CODE (SSA_NAME_VAR (parm)) != PARM_DECL)
     return NULL_TREE;
 
-  *offset += mem_ref_offset (expr).to_short_addr () * BITS_PER_UNIT;
+  *offset += mem_offset.to_short_addr () * BITS_PER_UNIT;
   *obj_p = obj;
   return expr;
 }
@@ -4568,7 +4574,7 @@ ipa_get_adjustment_candidate (tree **exp
 
   if (TREE_CODE (base) == MEM_REF)
     {
-      offset += mem_ref_offset (base).to_short_addr () * BITS_PER_UNIT;
+      offset += mem_ref_offset (base).force_shwi () * BITS_PER_UNIT;
       base = TREE_OPERAND (base, 0);
     }
 
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2017-10-23 17:18:47.664057184 +0100
+++ gcc/match.pd	2017-10-23 17:22:18.230825454 +0100
@@ -3350,12 +3350,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
      tree base1 = get_addr_base_and_unit_offset (TREE_OPERAND (@1, 0), &off1);
      if (base0 && TREE_CODE (base0) == MEM_REF)
        {
-	 off0 += mem_ref_offset (base0).to_short_addr ();
+	 off0 += mem_ref_offset (base0).force_shwi ();
          base0 = TREE_OPERAND (base0, 0);
        }
      if (base1 && TREE_CODE (base1) == MEM_REF)
        {
-         off1 += mem_ref_offset (base1).to_short_addr ();
+	 off1 += mem_ref_offset (base1).force_shwi ();
          base1 = TREE_OPERAND (base1, 0);
        }
    }
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2017-10-23 17:18:47.666057008 +0100
+++ gcc/tree-data-ref.c	2017-10-23 17:22:18.231825655 +0100
@@ -820,16 +820,16 @@ dr_analyze_innermost (innermost_loop_beh
     }
 
   /* Calculate the alignment and misalignment for the inner reference.  */
-  unsigned int HOST_WIDE_INT base_misalignment;
-  unsigned int base_alignment;
-  get_object_alignment_1 (base, &base_alignment, &base_misalignment);
+  unsigned int HOST_WIDE_INT bit_base_misalignment;
+  unsigned int bit_base_alignment;
+  get_object_alignment_1 (base, &bit_base_alignment, &bit_base_misalignment);
 
   /* There are no bitfield references remaining in BASE, so the values
      we got back must be whole bytes.  */
-  gcc_assert (base_alignment % BITS_PER_UNIT == 0
-	      && base_misalignment % BITS_PER_UNIT == 0);
-  base_alignment /= BITS_PER_UNIT;
-  base_misalignment /= BITS_PER_UNIT;
+  gcc_assert (bit_base_alignment % BITS_PER_UNIT == 0
+	      && bit_base_misalignment % BITS_PER_UNIT == 0);
+  unsigned int base_alignment = bit_base_alignment / BITS_PER_UNIT;
+  poly_int64 base_misalignment = bit_base_misalignment / BITS_PER_UNIT;
 
   if (TREE_CODE (base) == MEM_REF)
     {
@@ -837,8 +837,8 @@ dr_analyze_innermost (innermost_loop_beh
 	{
 	  /* Subtract MOFF from the base and add it to POFFSET instead.
 	     Adjust the misalignment to reflect the amount we subtracted.  */
-	  offset_int moff = mem_ref_offset (base);
-	  base_misalignment -= moff.to_short_addr ();
+	  poly_offset_int moff = mem_ref_offset (base);
+	  base_misalignment -= moff.force_shwi ();
 	  tree mofft = wide_int_to_tree (sizetype, moff);
 	  if (!poffset)
 	    poffset = mofft;
@@ -925,8 +925,14 @@ dr_analyze_innermost (innermost_loop_beh
   drb->offset = fold_convert (ssizetype, offset_iv.base);
   drb->init = init;
   drb->step = step;
-  drb->base_alignment = base_alignment;
-  drb->base_misalignment = base_misalignment & (base_alignment - 1);
+  if (known_misalignment (base_misalignment, base_alignment,
+			  &drb->base_misalignment))
+    drb->base_alignment = base_alignment;
+  else
+    {
+      drb->base_alignment = known_alignment (base_misalignment);
+      drb->base_misalignment = 0;
+    }
   drb->offset_alignment = highest_pow2_factor (offset_iv.base);
   drb->step_alignment = highest_pow2_factor (step);
 
Index: gcc/tree-dfa.c
===================================================================
--- gcc/tree-dfa.c	2017-10-23 17:17:01.432034493 +0100
+++ gcc/tree-dfa.c	2017-10-23 17:22:18.231825655 +0100
@@ -797,8 +797,8 @@ get_addr_base_and_unit_offset_1 (tree ex
 	      {
 		if (!integer_zerop (TREE_OPERAND (exp, 1)))
 		  {
-		    offset_int off = mem_ref_offset (exp);
-		    byte_offset += off.to_short_addr ();
+		    poly_offset_int off = mem_ref_offset (exp);
+		    byte_offset += off.force_shwi ();
 		  }
 		exp = TREE_OPERAND (base, 0);
 	      }
@@ -819,8 +819,8 @@ get_addr_base_and_unit_offset_1 (tree ex
 		  return NULL_TREE;
 		if (!integer_zerop (TMR_OFFSET (exp)))
 		  {
-		    offset_int off = mem_ref_offset (exp);
-		    byte_offset += off.to_short_addr ();
+		    poly_offset_int off = mem_ref_offset (exp);
+		    byte_offset += off.force_shwi ();
 		  }
 		exp = TREE_OPERAND (base, 0);
 	      }
Index: gcc/tree-eh.c
===================================================================
--- gcc/tree-eh.c	2017-10-23 16:52:17.994460559 +0100
+++ gcc/tree-eh.c	2017-10-23 17:22:18.231825655 +0100
@@ -2658,14 +2658,15 @@ tree_could_trap_p (tree expr)
       if (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR)
 	{
 	  tree base = TREE_OPERAND (TREE_OPERAND (expr, 0), 0);
-	  offset_int off = mem_ref_offset (expr);
-	  if (wi::neg_p (off, SIGNED))
+	  poly_offset_int off = mem_ref_offset (expr);
+	  if (may_lt (off, 0))
 	    return true;
 	  if (TREE_CODE (base) == STRING_CST)
-	    return wi::leu_p (TREE_STRING_LENGTH (base), off);
-	  else if (DECL_SIZE_UNIT (base) == NULL_TREE
-		   || TREE_CODE (DECL_SIZE_UNIT (base)) != INTEGER_CST
-		   || wi::leu_p (wi::to_offset (DECL_SIZE_UNIT (base)), off))
+	    return may_le (TREE_STRING_LENGTH (base), off);
+	  tree size = DECL_SIZE_UNIT (base);
+	  if (size == NULL_TREE
+	      || !poly_int_tree_p (size)
+	      || may_le (wi::to_poly_offset (size), off))
 	    return true;
 	  /* Now we are sure the first byte of the access is inside
 	     the object.  */
Index: gcc/tree-object-size.c
===================================================================
--- gcc/tree-object-size.c	2017-10-23 16:52:17.994460559 +0100
+++ gcc/tree-object-size.c	2017-10-23 17:22:18.232825856 +0100
@@ -210,11 +210,17 @@ addr_object_size (struct object_size_inf
 	}
       if (sz != unknown[object_size_type])
 	{
-	  offset_int dsz = wi::sub (sz, mem_ref_offset (pt_var));
-	  if (wi::neg_p (dsz))
-	    sz = 0;
-	  else if (wi::fits_uhwi_p (dsz))
-	    sz = dsz.to_uhwi ();
+	  offset_int mem_offset;
+	  if (mem_ref_offset (pt_var).is_constant (&mem_offset))
+	    {
+	      offset_int dsz = wi::sub (sz, mem_offset);
+	      if (wi::neg_p (dsz))
+		sz = 0;
+	      else if (wi::fits_uhwi_p (dsz))
+		sz = dsz.to_uhwi ();
+	      else
+		sz = unknown[object_size_type];
+	    }
 	  else
 	    sz = unknown[object_size_type];
 	}
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	2017-10-23 17:17:03.207794688 +0100
+++ gcc/tree-ssa-address.c	2017-10-23 17:22:18.232825856 +0100
@@ -1008,8 +1008,8 @@ copy_ref_info (tree new_ref, tree old_re
 			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
 			       < align)))))
 	    {
-	      unsigned int inc = (mem_ref_offset (old_ref).to_short_addr ()
-				  - mem_ref_offset (new_ref).to_short_addr ());
+	      poly_uint64 inc = (mem_ref_offset (old_ref)
+				 - mem_ref_offset (new_ref)).force_uhwi ();
 	      adjust_ptr_info_misalignment (new_pi, inc);
 	    }
 	  else
Index: gcc/tree-ssa-alias.c
===================================================================
--- gcc/tree-ssa-alias.c	2017-10-23 17:17:01.433034358 +0100
+++ gcc/tree-ssa-alias.c	2017-10-23 17:22:18.232825856 +0100
@@ -1143,7 +1143,7 @@ indirect_ref_may_alias_decl_p (tree ref1
 		       && DECL_P (base2));
 
   ptr1 = TREE_OPERAND (base1, 0);
-  offset_int moff = mem_ref_offset (base1) << LOG2_BITS_PER_UNIT;
+  poly_offset_int moff = mem_ref_offset (base1) << LOG2_BITS_PER_UNIT;
 
   /* If only one reference is based on a variable, they cannot alias if
      the pointer access is beyond the extent of the variable access.
@@ -1299,8 +1299,8 @@ indirect_refs_may_alias_p (tree ref1 ATT
 		      && operand_equal_p (TMR_INDEX2 (base1),
 					  TMR_INDEX2 (base2), 0))))))
     {
-      offset_int moff1 = mem_ref_offset (base1) << LOG2_BITS_PER_UNIT;
-      offset_int moff2 = mem_ref_offset (base2) << LOG2_BITS_PER_UNIT;
+      poly_offset_int moff1 = mem_ref_offset (base1) << LOG2_BITS_PER_UNIT;
+      poly_offset_int moff2 = mem_ref_offset (base2) << LOG2_BITS_PER_UNIT;
       return ranges_may_overlap_p (offset1 + moff1, max_size1,
 				   offset2 + moff2, max_size2);
     }
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-10-23 17:20:50.884679814 +0100
+++ gcc/tree-ssa-sccvn.c	2017-10-23 17:22:18.233826056 +0100
@@ -753,11 +753,8 @@ copy_reference_ops_from_ref (tree ref, v
 	case MEM_REF:
 	  /* The base address gets its own vn_reference_op_s structure.  */
 	  temp.op0 = TREE_OPERAND (ref, 1);
-	    {
-	      offset_int off = mem_ref_offset (ref);
-	      if (wi::fits_shwi_p (off))
-		temp.off = off.to_shwi ();
-	    }
+	  if (!mem_ref_offset (ref).to_shwi (&temp.off))
+	    temp.off = -1;
 	  temp.clique = MR_DEPENDENCE_CLIQUE (ref);
 	  temp.base = MR_DEPENDENCE_BASE (ref);
 	  temp.reverse = REF_REVERSE_STORAGE_ORDER (ref);
Index: gcc/tree-ssa.c
===================================================================
--- gcc/tree-ssa.c	2017-10-23 16:52:17.994460559 +0100
+++ gcc/tree-ssa.c	2017-10-23 17:22:18.233826056 +0100
@@ -1379,10 +1379,10 @@ maybe_rewrite_mem_ref_base (tree *tp, bi
 	}
       else if (DECL_SIZE (sym)
 	       && TREE_CODE (DECL_SIZE (sym)) == INTEGER_CST
-	       && mem_ref_offset (*tp) >= 0
-	       && wi::leu_p (mem_ref_offset (*tp)
-			     + wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp))),
-			     wi::to_offset (DECL_SIZE_UNIT (sym)))
+	       && (known_subrange_p
+		   (mem_ref_offset (*tp),
+		    wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (*tp))),
+		    0, wi::to_offset (DECL_SIZE_UNIT (sym))))
 	       && (! INTEGRAL_TYPE_P (TREE_TYPE (*tp)) 
 		   || (wi::to_offset (TYPE_SIZE (TREE_TYPE (*tp)))
 		       == TYPE_PRECISION (TREE_TYPE (*tp))))
@@ -1433,9 +1433,8 @@ non_rewritable_mem_ref_base (tree ref)
 	   || TREE_CODE (TREE_TYPE (decl)) == COMPLEX_TYPE)
 	  && useless_type_conversion_p (TREE_TYPE (base),
 					TREE_TYPE (TREE_TYPE (decl)))
-	  && wi::fits_uhwi_p (mem_ref_offset (base))
-	  && wi::gtu_p (wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
-			mem_ref_offset (base))
+	  && must_gt (wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (decl))),
+		      mem_ref_offset (base))
 	  && multiple_of_p (sizetype, TREE_OPERAND (base, 1),
 			    TYPE_SIZE_UNIT (TREE_TYPE (base))))
 	return NULL_TREE;
@@ -1445,11 +1444,10 @@ non_rewritable_mem_ref_base (tree ref)
 	return NULL_TREE;
       /* For integral typed extracts we can use a BIT_FIELD_REF.  */
       if (DECL_SIZE (decl)
-	  && TREE_CODE (DECL_SIZE (decl)) == INTEGER_CST
-	  && mem_ref_offset (base) >= 0
-	  && wi::leu_p (mem_ref_offset (base)
-			+ wi::to_offset (TYPE_SIZE_UNIT (TREE_TYPE (base))),
-			wi::to_offset (DECL_SIZE_UNIT (decl)))
+	  && (known_subrange_p
+	      (mem_ref_offset (base),
+	       wi::to_poly_offset (TYPE_SIZE_UNIT (TREE_TYPE (base))),
+	       0, wi::to_poly_offset (DECL_SIZE_UNIT (decl))))
 	  /* ???  We can't handle bitfield precision extracts without
 	     either using an alternate type for the BIT_FIELD_REF and
 	     then doing a conversion or possibly adjusting the offset
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:18:47.668056833 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:22:18.234826257 +0100
@@ -3265,10 +3265,7 @@ vect_check_gather_scatter (gimple *stmt,
       if (!integer_zerop (TREE_OPERAND (base, 1)))
 	{
 	  if (off == NULL_TREE)
-	    {
-	      offset_int moff = mem_ref_offset (base);
-	      off = wide_int_to_tree (sizetype, moff);
-	    }
+	    off = wide_int_to_tree (sizetype, mem_ref_offset (base));
 	  else
 	    off = size_binop (PLUS_EXPR, off,
 			      fold_convert (sizetype, TREE_OPERAND (base, 1)));
Index: gcc/tree-vrp.c
===================================================================
--- gcc/tree-vrp.c	2017-10-23 17:11:40.251958611 +0100
+++ gcc/tree-vrp.c	2017-10-23 17:22:18.235826458 +0100
@@ -6808,7 +6808,9 @@ search_for_addr_array (tree t, location_
 	  || TREE_CODE (el_sz) != INTEGER_CST)
 	return;
 
-      idx = mem_ref_offset (t);
+      if (!mem_ref_offset (t).is_constant (&idx))
+	return;
+
       idx = wi::sdiv_trunc (idx, wi::to_offset (el_sz));
       if (idx < 0)
 	{
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	2017-10-23 17:20:52.530629696 +0100
+++ gcc/varasm.c	2017-10-23 17:22:18.236826658 +0100
@@ -2903,7 +2903,7 @@ decode_addr_const (tree exp, struct addr
       else if (TREE_CODE (target) == MEM_REF
 	       && TREE_CODE (TREE_OPERAND (target, 0)) == ADDR_EXPR)
 	{
-	  offset += mem_ref_offset (target).to_short_addr ();
+	  offset += mem_ref_offset (target).force_shwi ();
 	  target = TREE_OPERAND (TREE_OPERAND (target, 0), 0);
 	}
       else if (TREE_CODE (target) == INDIRECT_REF

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [059/nnn] poly_int: tree-ssa-loop-ivopts.c:iv_use
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (57 preceding siblings ...)
  2017-10-23 17:24 ` [057/nnn] poly_int: build_ref_for_offset Richard Sandiford
@ 2017-10-23 17:25 ` Richard Sandiford
  2017-12-05 17:26   ` Jeff Law
  2017-10-23 17:25 ` [061/nnn] poly_int: compute_data_ref_alignment Richard Sandiford
                   ` (48 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:25 UTC (permalink / raw)
  To: gcc-patches

This patch makes ivopts handle polynomial address offsets
when recording potential IV uses.
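
Two details of the grouping code are worth calling out (sketch of the
pattern, not new functionality): sorting uses compare_sizes_for_sort,
which gives a repeatable total order for qsort even when two offsets
are not ordered in the may/must sense, and equality tests become
may_ne/must_eq rather than ==.

  if (compare_sizes_for_sort (use1->addr_offset, use2->addr_offset) > 0)
    std::swap (use1, use2);           /* repeatable order within the group */
  if (may_ne (use1->addr_offset, use2->addr_offset))
    distinct++;                       /* offsets that might differ count
                                         as distinct */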


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-ssa-loop-ivopts.c (iv_use::addr_offset): Change from
	an unsigned HOST_WIDE_INT to a poly_uint64_pod.
	(group_compare_offset): Update accordingly.
	(split_small_address_groups_p): Likewise.
	(record_use): Take addr_offset as a poly_uint64 rather than
	an unsigned HOST_WIDE_INT.
	(strip_offset): Return the offset as a poly_uint64 rather than
	an unsigned HOST_WIDE_INT.
	(record_group_use, split_address_groups): Track polynomial offsets.
	(add_iv_candidate_for_use): Likewise.
	(addr_offset_valid_p): Take the offset as a poly_int64 rather
	than a HOST_WIDE_INT.
	(strip_offset_1): Return the offset as a poly_int64 rather than
	a HOST_WIDE_INT.

Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	2017-10-23 17:17:03.208794553 +0100
+++ gcc/tree-ssa-loop-ivopts.c	2017-10-23 17:22:22.298641645 +0100
@@ -367,7 +367,7 @@ struct iv_use
   tree *op_p;		/* The place where it occurs.  */
 
   tree addr_base;	/* Base address with const offset stripped.  */
-  unsigned HOST_WIDE_INT addr_offset;
+  poly_uint64_pod addr_offset;
 			/* Const offset stripped from base address.  */
 };
 
@@ -1508,7 +1508,7 @@ find_induction_variables (struct ivopts_
 static struct iv_use *
 record_use (struct iv_group *group, tree *use_p, struct iv *iv,
 	    gimple *stmt, enum use_type type, tree addr_base,
-	    unsigned HOST_WIDE_INT addr_offset)
+	    poly_uint64 addr_offset)
 {
   struct iv_use *use = XCNEW (struct iv_use);
 
@@ -1553,7 +1553,7 @@ record_invariant (struct ivopts_data *da
 }
 
 static tree
-strip_offset (tree expr, unsigned HOST_WIDE_INT *offset);
+strip_offset (tree expr, poly_uint64 *offset);
 
 /* Record a group of TYPE.  */
 
@@ -1580,7 +1580,7 @@ record_group_use (struct ivopts_data *da
 {
   tree addr_base = NULL;
   struct iv_group *group = NULL;
-  unsigned HOST_WIDE_INT addr_offset = 0;
+  poly_uint64 addr_offset = 0;
 
   /* Record non address type use in a new group.  */
   if (type == USE_ADDRESS && iv->base_object)
@@ -2514,7 +2514,7 @@ find_interesting_uses_outside (struct iv
 static GTY (()) vec<rtx, va_gc> *addr_list;
 
 static bool
-addr_offset_valid_p (struct iv_use *use, HOST_WIDE_INT offset)
+addr_offset_valid_p (struct iv_use *use, poly_int64 offset)
 {
   rtx reg, addr;
   unsigned list_index;
@@ -2548,10 +2548,7 @@ group_compare_offset (const void *a, con
   const struct iv_use *const *u1 = (const struct iv_use *const *) a;
   const struct iv_use *const *u2 = (const struct iv_use *const *) b;
 
-  if ((*u1)->addr_offset != (*u2)->addr_offset)
-    return (*u1)->addr_offset < (*u2)->addr_offset ? -1 : 1;
-  else
-    return 0;
+  return compare_sizes_for_sort ((*u1)->addr_offset, (*u2)->addr_offset);
 }
 
 /* Check if small groups should be split.  Return true if no group
@@ -2582,7 +2579,8 @@ split_small_address_groups_p (struct ivo
       gcc_assert (group->type == USE_ADDRESS);
       if (group->vuses.length () == 2)
 	{
-	  if (group->vuses[0]->addr_offset > group->vuses[1]->addr_offset)
+	  if (compare_sizes_for_sort (group->vuses[0]->addr_offset,
+				      group->vuses[1]->addr_offset) > 0)
 	    std::swap (group->vuses[0], group->vuses[1]);
 	}
       else
@@ -2594,7 +2592,7 @@ split_small_address_groups_p (struct ivo
       distinct = 1;
       for (pre = group->vuses[0], j = 1; j < group->vuses.length (); j++)
 	{
-	  if (group->vuses[j]->addr_offset != pre->addr_offset)
+	  if (may_ne (group->vuses[j]->addr_offset, pre->addr_offset))
 	    {
 	      pre = group->vuses[j];
 	      distinct++;
@@ -2635,13 +2633,13 @@ split_address_groups (struct ivopts_data
       for (j = 1; j < group->vuses.length ();)
 	{
 	  struct iv_use *next = group->vuses[j];
-	  HOST_WIDE_INT offset = next->addr_offset - use->addr_offset;
+	  poly_int64 offset = next->addr_offset - use->addr_offset;
 
 	  /* Split group if aksed to, or the offset against the first
 	     use can't fit in offset part of addressing mode.  IV uses
 	     having the same offset are still kept in one group.  */
-	  if (offset != 0 &&
-	      (split_p || !addr_offset_valid_p (use, offset)))
+	  if (maybe_nonzero (offset)
+	      && (split_p || !addr_offset_valid_p (use, offset)))
 	    {
 	      if (!new_group)
 		new_group = record_group (data, group->type);
@@ -2702,12 +2700,13 @@ find_interesting_uses (struct ivopts_dat
 
 static tree
 strip_offset_1 (tree expr, bool inside_addr, bool top_compref,
-		HOST_WIDE_INT *offset)
+		poly_int64 *offset)
 {
   tree op0 = NULL_TREE, op1 = NULL_TREE, tmp, step;
   enum tree_code code;
   tree type, orig_type = TREE_TYPE (expr);
-  HOST_WIDE_INT off0, off1, st;
+  poly_int64 off0, off1;
+  HOST_WIDE_INT st;
   tree orig_expr = expr;
 
   STRIP_NOPS (expr);
@@ -2718,14 +2717,6 @@ strip_offset_1 (tree expr, bool inside_a
 
   switch (code)
     {
-    case INTEGER_CST:
-      if (!cst_and_fits_in_hwi (expr)
-	  || integer_zerop (expr))
-	return orig_expr;
-
-      *offset = int_cst_value (expr);
-      return build_int_cst (orig_type, 0);
-
     case POINTER_PLUS_EXPR:
     case PLUS_EXPR:
     case MINUS_EXPR:
@@ -2843,6 +2834,8 @@ strip_offset_1 (tree expr, bool inside_a
       break;
 
     default:
+      if (ptrdiff_tree_p (expr, offset) && maybe_nonzero (*offset))
+	return build_int_cst (orig_type, 0);
       return orig_expr;
     }
 
@@ -2872,9 +2865,9 @@ strip_offset_1 (tree expr, bool inside_a
 /* Strips constant offsets from EXPR and stores them to OFFSET.  */
 
 static tree
-strip_offset (tree expr, unsigned HOST_WIDE_INT *offset)
+strip_offset (tree expr, poly_uint64 *offset)
 {
-  HOST_WIDE_INT off;
+  poly_int64 off;
   tree core = strip_offset_1 (expr, false, false, &off);
   *offset = off;
   return core;
@@ -3401,7 +3394,7 @@ add_iv_candidate_derived_from_uses (stru
 static void
 add_iv_candidate_for_use (struct ivopts_data *data, struct iv_use *use)
 {
-  unsigned HOST_WIDE_INT offset;
+  poly_uint64 offset;
   tree base;
   tree basetype;
   struct iv *iv = use->iv;
@@ -3420,7 +3413,7 @@ add_iv_candidate_for_use (struct ivopts_
   /* Record common candidate with constant offset stripped in base.
      Like the use itself, we also add candidate directly for it.  */
   base = strip_offset (iv->base, &offset);
-  if (offset || base != iv->base)
+  if (maybe_nonzero (offset) || base != iv->base)
     {
       record_common_cand (data, base, iv->step, use);
       add_candidate (data, base, iv->step, false, use);
@@ -3439,7 +3432,7 @@ add_iv_candidate_for_use (struct ivopts_
       record_common_cand (data, base, step, use);
       /* Also record common candidate with offset stripped.  */
       base = strip_offset (base, &offset);
-      if (offset)
+      if (maybe_nonzero (offset))
 	record_common_cand (data, base, step, use);
     }
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [060/nnn] poly_int: loop versioning threshold
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (59 preceding siblings ...)
  2017-10-23 17:25 ` [061/nnn] poly_int: compute_data_ref_alignment Richard Sandiford
@ 2017-10-23 17:25 ` Richard Sandiford
  2017-12-05 17:31   ` Jeff Law
  2017-10-23 17:26 ` [063/nnn] poly_int: vectoriser vf and uf Richard Sandiford
                   ` (46 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:25 UTC (permalink / raw)
  To: gcc-patches

This patch splits the loop versioning threshold out from the
cost model threshold so that the former can become a poly_uint64.
We still use a single test to enforce both limits where possible.
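
A worked example of how the two limits are merged in
vect_transform_loop (hypothetical values):

  /* th == 12, versioning_threshold == 16 + 16X: 12 <= 16 + 16X for
     every X, so one test against 16 + 16X enforces both limits.  */
  ordered_p (poly_uint64 (12), versioning_threshold);    /* true      */
  ordered_max (poly_uint64 (12), versioning_threshold);  /* 16 + 16X  */

  /* th == 20: 20 > 16 when X == 0 but 20 < 32 when X >= 1, so the
     values are unordered and the two tests stay separate.  */
  ordered_p (poly_uint64 (20), versioning_threshold);    /* false     */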


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (_loop_vec_info): Add a versioning_threshold
	field.
	(LOOP_VINFO_VERSIONING_THRESHOLD): New macro.
	(vect_loop_versioning): Take the loop versioning threshold as a
	separate parameter.
	* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
	versioning_threshold.
	(vect_analyze_loop_2): Compute the loop versioning threshold
	whenever loop versioning is needed, and store it in the new
	field rather than combining it with the cost model threshold.
	(vect_transform_loop): Update call to vect_loop_versioning.
	Try to combine the loop versioning and cost thresholds here.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-10-23 17:11:39.817127625 +0100
+++ gcc/tree-vectorizer.h	2017-10-23 17:22:23.377858186 +0100
@@ -238,6 +238,12 @@ typedef struct _loop_vec_info : public v
      PARAM_MIN_VECT_LOOP_BOUND.  */
   unsigned int th;
 
+  /* When applying loop versioning, the vector form should only be used
+     if the number of scalar iterations is >= this value, on top of all
+     the other requirements.  Ignored when loop versioning is not being
+     used.  */
+  poly_uint64 versioning_threshold;
+
   /* Unrolling factor  */
   int vectorization_factor;
 
@@ -357,6 +363,7 @@ #define LOOP_VINFO_NITERS(L)
 #define LOOP_VINFO_NITERS_UNCHANGED(L)     (L)->num_iters_unchanged
 #define LOOP_VINFO_NITERS_ASSUMPTIONS(L)   (L)->num_iters_assumptions
 #define LOOP_VINFO_COST_MODEL_THRESHOLD(L) (L)->th
+#define LOOP_VINFO_VERSIONING_THRESHOLD(L) (L)->versioning_threshold
 #define LOOP_VINFO_VECTORIZABLE_P(L)       (L)->vectorizable
 #define LOOP_VINFO_VECT_FACTOR(L)          (L)->vectorization_factor
 #define LOOP_VINFO_MAX_VECT_FACTOR(L)      (L)->max_vectorization_factor
@@ -1143,7 +1150,8 @@ extern void slpeel_make_loop_iterate_nti
 extern bool slpeel_can_duplicate_loop_p (const struct loop *, const_edge);
 struct loop *slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *,
 						     struct loop *, edge);
-extern void vect_loop_versioning (loop_vec_info, unsigned int, bool);
+extern void vect_loop_versioning (loop_vec_info, unsigned int, bool,
+				  poly_uint64);
 extern struct loop *vect_do_peeling (loop_vec_info, tree, tree,
 				     tree *, tree *, tree *, int, bool, bool);
 extern source_location find_loop_location (struct loop *);
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c	2017-10-23 17:11:39.816125711 +0100
+++ gcc/tree-vect-loop-manip.c	2017-10-23 17:22:23.376857985 +0100
@@ -2295,7 +2295,8 @@ vect_create_cond_for_alias_checks (loop_
 
 void
 vect_loop_versioning (loop_vec_info loop_vinfo,
-		      unsigned int th, bool check_profitability)
+		      unsigned int th, bool check_profitability,
+		      poly_uint64 versioning_threshold)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo), *nloop;
   struct loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
@@ -2320,6 +2321,17 @@ vect_loop_versioning (loop_vec_info loop
     cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
 			     build_int_cst (TREE_TYPE (scalar_loop_iters),
 					    th - 1));
+  if (maybe_nonzero (versioning_threshold))
+    {
+      tree expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
+			       build_int_cst (TREE_TYPE (scalar_loop_iters),
+					      versioning_threshold - 1));
+      if (cond_expr)
+	cond_expr = fold_build2 (BIT_AND_EXPR, boolean_type_node,
+				 expr, cond_expr);
+      else
+	cond_expr = expr;
+    }
 
   if (version_niter)
     vect_create_cond_for_niters_checks (loop_vinfo, &cond_expr);
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:11:39.816125711 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:22:23.377858186 +0100
@@ -1110,6 +1110,7 @@ _loop_vec_info::_loop_vec_info (struct l
     num_iters_unchanged (NULL_TREE),
     num_iters_assumptions (NULL_TREE),
     th (0),
+    versioning_threshold (0),
     vectorization_factor (0),
     max_vectorization_factor (0),
     unaligned_dr (NULL),
@@ -2174,11 +2175,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
      enough for both peeled prolog loop and vector loop.  This check
      can be merged along with threshold check of loop versioning, so
      increase threshold for this case if necessary.  */
-  if (LOOP_REQUIRES_VERSIONING (loop_vinfo)
-      && (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
-	  || LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)))
+  if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
     {
-      unsigned niters_th;
+      poly_uint64 niters_th;
 
       /* Niters for peeled prolog loop.  */
       if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) < 0)
@@ -2195,9 +2194,8 @@ vect_analyze_loop_2 (loop_vec_info loop_
       niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
       /* One additional iteration because of peeling for gap.  */
       if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
-	niters_th++;
-      if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
-	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
+	niters_th += 1;
+      LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo) = niters_th;
     }
 
   gcc_assert (vectorization_factor
@@ -2300,6 +2298,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
   LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
   LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = false;
   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = 0;
+  LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo) = 0;
 
   goto start_over;
 }
@@ -7320,7 +7319,17 @@ vect_transform_loop (loop_vec_info loop_
 
   if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
     {
-      vect_loop_versioning (loop_vinfo, th, check_profitability);
+      poly_uint64 versioning_threshold
+	= LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
+      if (check_profitability
+	  && ordered_p (poly_uint64 (th), versioning_threshold))
+	{
+	  versioning_threshold = ordered_max (poly_uint64 (th),
+					      versioning_threshold);
+	  check_profitability = false;
+	}
+      vect_loop_versioning (loop_vinfo, th, check_profitability,
+			    versioning_threshold);
       check_profitability = false;
     }
 

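A side note on the ordered_p/ordered_max logic in the last hunk above: the
constant cost-model threshold and the (possibly runtime) versioning threshold
can be folded into a single check whenever one of them is known to be greater
than or equal to the other for every value of the indeterminate.  For
illustration only, here is a standalone sketch of that idea using a
hand-rolled two-coefficient value rather than the real poly_int class; all
names below are made up:

  #include <cstdio>

  // Models a value c0 + c1*X, with X a nonnegative runtime indeterminate.
  struct poly2 { unsigned c0, c1; };

  // A is known to be <= B for every X iff both coefficients are <=.
  static bool must_le_model (poly2 a, poly2 b)
  { return a.c0 <= b.c0 && a.c1 <= b.c1; }

  // The two values are ordered if one is <= the other for all X.
  static bool ordered_p_model (poly2 a, poly2 b)
  { return must_le_model (a, b) || must_le_model (b, a); }

  // Only meaningful when ordered_p_model is true.
  static poly2 ordered_max_model (poly2 a, poly2 b)
  { return must_le_model (a, b) ? b : a; }

  int main ()
  {
    poly2 cost_th = {7, 0};       // constant cost-model threshold
    poly2 version_th = {10, 16};  // e.g. 10 + 16*X versioning threshold
    if (ordered_p_model (cost_th, version_th))
      {
        poly2 m = ordered_max_model (cost_th, version_th);
        // A single runtime check against m now subsumes both thresholds.
        printf ("combined threshold: %u + %u*X\n", m.c0, m.c1);
      }
    return 0;
  }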

* [061/nnn] poly_int: compute_data_ref_alignment
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (58 preceding siblings ...)
  2017-10-23 17:25 ` [059/nnn] poly_int: tree-ssa-loop-ivopts.c:iv_use Richard Sandiford
@ 2017-10-23 17:25 ` Richard Sandiford
  2017-11-28 16:49   ` Jeff Law
  2017-10-23 17:25 ` [060/nnn] poly_int: loop versioning threshold Richard Sandiford
                   ` (47 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:25 UTC (permalink / raw)
  To: gcc-patches

This patch makes vect_compute_data_ref_alignment treat DR_INIT as a
poly_int and handles cases in which the calculated misalignment might
not be constant.
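
For illustration only, here is a tiny standalone model of the misalignment
test the patch relies on.  It uses a hand-written two-coefficient value
instead of the real poly_int class, so the names are made up:

  #include <cstdio>

  // Models a byte offset c0 + c1*X, X a nonnegative runtime indeterminate.
  struct poly2 { unsigned c0, c1; };

  // The misalignment wrt ALIGN is a compile-time constant only if the
  // X coefficient is a multiple of ALIGN, so that X cannot change it.
  static bool known_misalignment_model (poly2 offset, unsigned align,
                                        unsigned *misalign)
  {
    if (offset.c1 % align != 0)
      return false;
    *misalign = offset.c0 % align;
    return true;
  }

  int main ()
  {
    unsigned mis;
    // 12 + 16*X wrt a 16-byte vector alignment: always misaligned by 12.
    if (known_misalignment_model ({12, 16}, 16, &mis))
      printf ("constant misalignment %u\n", mis);
    // 12 + 8*X wrt 16 bytes: varies with X, so the DR would be rejected.
    if (!known_misalignment_model ({12, 8}, 16, &mis))
      printf ("non-constant misalignment\n");
    return 0;
  }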


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
	Treat drb->init as a poly_int.  Fail if its misalignment wrt
	vector_alignment isn't known.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:22:18.234826257 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:22:24.456074525 +0100
@@ -944,8 +944,8 @@ vect_compute_data_ref_alignment (struct
       DR_VECT_AUX (dr)->base_misaligned = true;
       base_misalignment = 0;
     }
-  unsigned int misalignment = (base_misalignment
-			       + TREE_INT_CST_LOW (drb->init));
+  poly_int64 misalignment
+    = base_misalignment + wi::to_poly_offset (drb->init).force_shwi ();
 
   /* If this is a backward running DR then first access in the larger
      vectype actually is N-1 elements before the address in the DR.
@@ -955,7 +955,21 @@ vect_compute_data_ref_alignment (struct
     misalignment += ((TYPE_VECTOR_SUBPARTS (vectype) - 1)
 		     * TREE_INT_CST_LOW (drb->step));
 
-  SET_DR_MISALIGNMENT (dr, misalignment & (vector_alignment - 1));
+  unsigned int const_misalignment;
+  if (!known_misalignment (misalignment, vector_alignment,
+			   &const_misalignment))
+    {
+      if (dump_enabled_p ())
+	{
+	  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			   "Non-constant misalignment for access: ");
+	  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, ref);
+	  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+	}
+      return true;
+    }
+
+  SET_DR_MISALIGNMENT (dr, const_misalignment);
 
   if (dump_enabled_p ())
     {


* [063/nnn] poly_int: vectoriser vf and uf
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (60 preceding siblings ...)
  2017-10-23 17:25 ` [060/nnn] poly_int: loop versioning threshold Richard Sandiford
@ 2017-10-23 17:26 ` Richard Sandiford
  2017-12-06  2:46   ` Jeff Law
  2018-01-03 21:23   ` [PATCH] Fix gcc.dg/vect-opt-info-1.c testcase Jakub Jelinek
  2017-10-23 17:26 ` [062/nnn] poly_int: prune_runtime_alias_test_list Richard Sandiford
                   ` (45 subsequent siblings)
  107 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:26 UTC (permalink / raw)
  To: gcc-patches

This patch changes the type of the vectorisation factor and SLP
unrolling factor to poly_uint64.  This in turn required some knock-on
changes in signedness elsewhere.

Cost decisions are generally based on estimated_poly_value,
which for VF is wrapped up as vect_vf_for_cost.
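
To make the costing idea concrete, here is a minimal standalone sketch,
again with a made-up two-coefficient type rather than the real poly_uint64,
and with the "likely" value of the indeterminate picked arbitrarily:

  #include <cstdio>

  // Models a VF of the form c0 + c1*X.
  struct poly2 { unsigned c0, c1; };

  // Cost heuristics need a single integer, so collapse the VF by plugging
  // in an estimate for X.  In GCC the estimate is target-specific;
  // LIKELY_X = 1 below is just an assumption for the sketch.
  static unsigned estimated_value_model (poly2 v, unsigned likely_x)
  { return v.c0 + v.c1 * likely_x; }

  int main ()
  {
    const unsigned LIKELY_X = 1;
    poly2 vf = {4, 4};           // e.g. 4 + 4*X int elements per iteration
    printf ("VF assumed for costing: %u\n",
            estimated_value_model (vf, LIKELY_X));
    return 0;
  }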

The patch doesn't on its own enable variable-length vectorisation.
It just makes the minimum changes necessary for the code to build
with the new VF and UF types.  Later patches also make the
vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable
GET_MODE_NUNITS, at which point the code really does handle
variable-length vectors.
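
One recurring pattern in the patch is that code which genuinely needs a
compile-time VF now asks for it explicitly and falls back otherwise, rather
than silently assuming a constant.  A rough standalone model of that
pattern, with invented names, purely for illustration:

  #include <cstdio>

  struct poly2 { unsigned c0, c1; };   // models c0 + c1*X

  // The value is a compile-time constant only when the X coefficient is 0.
  static bool is_constant_model (poly2 v, unsigned *value)
  {
    if (v.c1 != 0)
      return false;
    *value = v.c0;
    return true;
  }

  static void transform_model (poly2 vf)
  {
    unsigned const_vf;
    if (is_constant_model (vf, &const_vf))
      // Fixed-length path, e.g. the "niters >> log2 (vf)" code.
      printf ("constant VF %u: use the shift-based niters code\n", const_vf);
    else
      // Variable-length path: keep the computation symbolic.
      printf ("variable VF %u + %u*X: use a runtime step instead\n",
              vf.c0, vf.c1);
  }

  int main ()
  {
    transform_model ({8, 0});    // e.g. V8HI on a fixed-width target
    transform_model ({4, 4});    // e.g. a variable-length vector of ints
    return 0;
  }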

The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX,
to avoid hard-coding a particular architectural limit.

The patch includes a new test because a development version of the patch
accidentally used file print routines instead of dump_*, which would
fail with -fopt-info.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (_slp_instance::unrolling_factor): Change
	from an unsigned int to a poly_uint64.
	(_loop_vec_info::slp_unrolling_factor): Likewise.
	(_loop_vec_info::vectorization_factor): Change from an int
	to a poly_uint64.
	(MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX.
	(vect_get_num_vectors): New function.
	(vect_update_max_nunits, vect_vf_for_cost): Likewise.
	(vect_get_num_copies): Use vect_get_num_vectors.
	(vect_analyze_data_ref_dependences): Change max_vf from an int *
	to an unsigned int *.
	(vect_analyze_data_refs): Change min_vf from an int * to a
	poly_uint64 *.
	(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
	than an unsigned HOST_WIDE_INT.
	* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
	(vect_analyze_data_ref_dependence): Change max_vf from an int *
	to an unsigned int *.
	(vect_analyze_data_ref_dependences): Likewise.
	(vect_compute_data_ref_alignment): Handle polynomial vf.
	(vect_enhance_data_refs_alignment): Likewise.
	(vect_prune_runtime_alias_test_list): Likewise.
	(vect_shift_permute_load_chain): Likewise.
	(vect_supportable_dr_alignment): Likewise.
	(dependence_distance_ge_vf): Take the vectorization factor as a
	poly_uint64 rather than an unsigned HOST_WIDE_INT.
	(vect_analyze_data_refs): Change min_vf from an int * to a
	poly_uint64 *.
	* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take
	vfm1 as a poly_uint64 rather than an int.  Make the same change
	for the returned bound_scalar.
	(vect_gen_vector_loop_niters): Handle polynomial vf.
	(vect_do_peeling): Likewise.  Update call to
	vect_gen_scalar_loop_niters and handle polynomial bound_scalars.
	(vect_gen_vector_loop_niters_mult_vf): Assert that the vf must
	be constant.
	* tree-vect-loop.c (vect_determine_vectorization_factor)
	(vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf.
	(vect_get_known_peeling_cost): Likewise.
	(vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise.
	(vect_worthwhile_without_simd_p, vectorizable_induction): Likewise.
	(vect_transform_loop): Likewise.  Use the lowest possible VF when
	updating the upper bounds of the loop.
	(vect_min_worthwhile_factor): Make static.  Return an unsigned int
	rather than an int.
	* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with
	polynomial unroll factors.
	(vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise.
	(vect_make_slp_decision): Likewise.
	(vect_supported_load_permutation_p): Likewise, and polynomial
	vf too.
	(vect_analyze_slp_cost): Handle polynomial vf.
	(vect_slp_analyze_node_operations): Likewise.
	(vect_slp_analyze_bb_1): Likewise.
	(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
	than an unsigned HOST_WIDE_INT.
	* tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store)
	(vectorizable_load): Handle polynomial vf.
	* tree-vectorizer.c (simduid_to_vf::vf): Change from an int to
	a poly_uint64.
	(adjust_simduid_builtins, shrink_simd_arrays): Update accordingly.

gcc/testsuite/
	* gcc.dg/vect-opt-info-1.c: New test.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-10-23 17:22:23.377858186 +0100
+++ gcc/tree-vectorizer.h	2017-10-23 17:22:26.575499779 +0100
@@ -129,7 +129,7 @@ typedef struct _slp_instance {
   unsigned int group_size;
 
   /* The unrolling factor required to vectorized this SLP instance.  */
-  unsigned int unrolling_factor;
+  poly_uint64 unrolling_factor;
 
   /* The group of nodes that contain loads of this SLP instance.  */
   vec<slp_tree> loads;
@@ -245,7 +245,7 @@ typedef struct _loop_vec_info : public v
   poly_uint64 versioning_threshold;
 
   /* Unrolling factor  */
-  int vectorization_factor;
+  poly_uint64 vectorization_factor;
 
   /* Maximum runtime vectorization factor, or MAX_VECTORIZATION_FACTOR
      if there is no particular limit.  */
@@ -297,7 +297,7 @@ typedef struct _loop_vec_info : public v
 
   /* The unrolling factor needed to SLP the loop. In case of that pure SLP is
      applied to the loop, i.e., no unrolling is needed, this is 1.  */
-  unsigned slp_unrolling_factor;
+  poly_uint64 slp_unrolling_factor;
 
   /* Cost of a single scalar iteration.  */
   int single_scalar_iteration_cost;
@@ -815,8 +815,7 @@ #define VECT_MAX_COST 1000
    conversion.  */
 #define MAX_INTERM_CVT_STEPS         3
 
-/* The maximum vectorization factor supported by any target (V64QI).  */
-#define MAX_VECTORIZATION_FACTOR 64
+#define MAX_VECTORIZATION_FACTOR INT_MAX
 
 /* Nonzero if TYPE represents a (scalar) boolean type or type
    in the middle-end compatible with it (unsigned precision 1 integral
@@ -1109,6 +1108,16 @@ unlimited_cost_model (loop_p loop)
   return (flag_vect_cost_model == VECT_COST_MODEL_UNLIMITED);
 }
 
+/* Return the number of vectors of type VECTYPE that are needed to get
+   NUNITS elements.  NUNITS should be based on the vectorization factor,
+   so it is always a known multiple of the number of elements in VECTYPE.  */
+
+static inline unsigned int
+vect_get_num_vectors (poly_uint64 nunits, tree vectype)
+{
+  return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
+}
+
 /* Return the number of copies needed for loop vectorization when
    a statement operates on vectors of type VECTYPE.  This is the
    vectorization factor divided by the number of elements in
@@ -1117,10 +1126,32 @@ unlimited_cost_model (loop_p loop)
 static inline unsigned int
 vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype)
 {
-  gcc_checking_assert (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-		       % TYPE_VECTOR_SUBPARTS (vectype) == 0);
-  return (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	  / TYPE_VECTOR_SUBPARTS (vectype));
+  return vect_get_num_vectors (LOOP_VINFO_VECT_FACTOR (loop_vinfo), vectype);
+}
+
+/* Update maximum unit count *MAX_NUNITS so that it accounts for
+   the number of units in vector type VECTYPE.  *MAX_NUNITS can be 1
+   if we haven't yet recorded any vector types.  */
+
+static inline void
+vect_update_max_nunits (poly_uint64 *max_nunits, tree vectype)
+{
+  /* All unit counts have the form current_vector_size * X for some
+     rational X, so two unit sizes must have a common multiple.
+     Everything is a multiple of the initial value of 1.  */
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  *max_nunits = force_common_multiple (*max_nunits, nunits);
+}
+
+/* Return the vectorization factor that should be used for costing
+   purposes while vectorizing the loop described by LOOP_VINFO.
+   Pick a reasonable estimate if the vectorization factor isn't
+   known at compile time.  */
+
+static inline unsigned int
+vect_vf_for_cost (loop_vec_info loop_vinfo)
+{
+  return estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
 }
 
 /* Return the size of the value accessed by unvectorized data reference DR.
@@ -1223,7 +1254,7 @@ extern bool vect_can_force_dr_alignment_
                                            (struct data_reference *, bool);
 extern tree vect_get_smallest_scalar_type (gimple *, HOST_WIDE_INT *,
                                            HOST_WIDE_INT *);
-extern bool vect_analyze_data_ref_dependences (loop_vec_info, int *);
+extern bool vect_analyze_data_ref_dependences (loop_vec_info, unsigned int *);
 extern bool vect_slp_analyze_instance_dependence (slp_instance);
 extern bool vect_enhance_data_refs_alignment (loop_vec_info);
 extern bool vect_analyze_data_refs_alignment (loop_vec_info);
@@ -1233,7 +1264,7 @@ extern bool vect_analyze_data_ref_access
 extern bool vect_prune_runtime_alias_test_list (loop_vec_info);
 extern bool vect_check_gather_scatter (gimple *, loop_vec_info,
 				       gather_scatter_info *);
-extern bool vect_analyze_data_refs (vec_info *, int *);
+extern bool vect_analyze_data_refs (vec_info *, poly_uint64 *);
 extern void vect_record_base_alignments (vec_info *);
 extern tree vect_create_data_ref_ptr (gimple *, tree, struct loop *, tree,
 				      tree *, gimple_stmt_iterator *,
@@ -1288,8 +1319,8 @@ extern int vect_get_known_peeling_cost (
 /* In tree-vect-slp.c.  */
 extern void vect_free_slp_instance (slp_instance);
 extern bool vect_transform_slp_perm_load (slp_tree, vec<tree> ,
-                                          gimple_stmt_iterator *, int,
-                                          slp_instance, bool, unsigned *);
+					  gimple_stmt_iterator *, poly_uint64,
+					  slp_instance, bool, unsigned *);
 extern bool vect_slp_analyze_operations (vec_info *);
 extern bool vect_schedule_slp (vec_info *);
 extern bool vect_analyze_slp (vec_info *, unsigned);
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:22:24.456074525 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:22:26.571498977 +0100
@@ -179,7 +179,7 @@ vect_mark_for_runtime_alias_test (ddr_p
 static bool
 vect_analyze_possibly_independent_ddr (data_dependence_relation *ddr,
 				       loop_vec_info loop_vinfo,
-				       int loop_depth, int *max_vf)
+				       int loop_depth, unsigned int *max_vf)
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   lambda_vector dist_v;
@@ -199,7 +199,7 @@ vect_analyze_possibly_independent_ddr (d
 	     would be a win.  */
 	  if (loop->safelen >= 2 && abs_hwi (dist) <= loop->safelen)
 	    {
-	      if (loop->safelen < *max_vf)
+	      if ((unsigned int) loop->safelen < *max_vf)
 		*max_vf = loop->safelen;
 	      LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = false;
 	      continue;
@@ -228,7 +228,8 @@ vect_analyze_possibly_independent_ddr (d
 
 static bool
 vect_analyze_data_ref_dependence (struct data_dependence_relation *ddr,
-                                  loop_vec_info loop_vinfo, int *max_vf)
+				  loop_vec_info loop_vinfo,
+				  unsigned int *max_vf)
 {
   unsigned int i;
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
@@ -277,7 +278,7 @@ vect_analyze_data_ref_dependence (struct
 	 executed concurrently, assume independence.  */
       if (loop->safelen >= 2)
 	{
-	  if (loop->safelen < *max_vf)
+	  if ((unsigned int) loop->safelen < *max_vf)
 	    *max_vf = loop->safelen;
 	  LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = false;
 	  return false;
@@ -325,7 +326,7 @@ vect_analyze_data_ref_dependence (struct
 	 executed concurrently, assume independence.  */
       if (loop->safelen >= 2)
 	{
-	  if (loop->safelen < *max_vf)
+	  if ((unsigned int) loop->safelen < *max_vf)
 	    *max_vf = loop->safelen;
 	  LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = false;
 	  return false;
@@ -445,8 +446,8 @@ vect_analyze_data_ref_dependence (struct
 	  continue;
 	}
 
-      if (abs (dist) >= 2
-	  && abs (dist) < *max_vf)
+      unsigned int abs_dist = abs (dist);
+      if (abs_dist >= 2 && abs_dist < *max_vf)
 	{
 	  /* The dependence distance requires reduction of the maximal
 	     vectorization factor.  */
@@ -457,7 +458,7 @@ vect_analyze_data_ref_dependence (struct
 	                     *max_vf);
 	}
 
-      if (abs (dist) >= *max_vf)
+      if (abs_dist >= *max_vf)
 	{
 	  /* Dependence distance does not create dependence, as far as
 	     vectorization is concerned, in this case.  */
@@ -491,7 +492,8 @@ vect_analyze_data_ref_dependence (struct
    the maximum vectorization factor the data dependences allow.  */
 
 bool
-vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo, int *max_vf)
+vect_analyze_data_ref_dependences (loop_vec_info loop_vinfo,
+				   unsigned int *max_vf)
 {
   unsigned int i;
   struct data_dependence_relation *ddr;
@@ -862,9 +864,9 @@ vect_compute_data_ref_alignment (struct
      the dataref evenly divides by the alignment.  */
   else
     {
-      unsigned vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
       step_preserves_misalignment_p
-	= ((DR_STEP_ALIGNMENT (dr) * vf) % vector_alignment) == 0;
+	= multiple_p (DR_STEP_ALIGNMENT (dr) * vf, vector_alignment);
 
       if (!step_preserves_misalignment_p && dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -1610,10 +1612,10 @@ vect_enhance_data_refs_alignment (loop_v
   bool one_misalignment_unknown = false;
   bool one_dr_unsupportable = false;
   struct data_reference *unsupportable_dr = NULL;
-  unsigned int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned possible_npeel_number = 1;
   tree vectype;
-  unsigned int nelements, mis, same_align_drs_max = 0;
+  unsigned int mis, same_align_drs_max = 0;
   hash_table<peel_info_hasher> peeling_htab (1);
 
   if (dump_enabled_p ())
@@ -1691,7 +1693,6 @@ vect_enhance_data_refs_alignment (loop_v
 						    size_zero_node) < 0;
 
 	      vectype = STMT_VINFO_VECTYPE (stmt_info);
-	      nelements = TYPE_VECTOR_SUBPARTS (vectype);
 	      unsigned int target_align = DR_TARGET_ALIGNMENT (dr);
 	      unsigned int dr_size = vect_get_scalar_dr_size (dr);
 	      mis = (negative ? DR_MISALIGNMENT (dr) : -DR_MISALIGNMENT (dr));
@@ -1713,11 +1714,10 @@ vect_enhance_data_refs_alignment (loop_v
 		 cost for every peeling option.  */
               if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
 		{
-		  if (STMT_SLP_TYPE (stmt_info))
-		    possible_npeel_number
-		      = (vf * GROUP_SIZE (stmt_info)) / nelements;
-		  else
-		    possible_npeel_number = vf / nelements;
+		  poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
+					  ? vf * GROUP_SIZE (stmt_info) : vf);
+		  possible_npeel_number
+		    = vect_get_num_vectors (nscalars, vectype);
 
 		  /* NPEEL_TMP is 0 when there is no misalignment, but also
 		     allow peeling NELEMENTS.  */
@@ -1816,13 +1816,14 @@ vect_enhance_data_refs_alignment (loop_v
       unsigned int load_outside_cost = 0;
       unsigned int store_inside_cost = 0;
       unsigned int store_outside_cost = 0;
+      unsigned int estimated_npeels = vect_vf_for_cost (loop_vinfo) / 2;
 
       stmt_vector_for_cost dummy;
       dummy.create (2);
       vect_get_peeling_costs_all_drs (datarefs, dr0,
 				      &load_inside_cost,
 				      &load_outside_cost,
-				      &dummy, vf / 2, true);
+				      &dummy, estimated_npeels, true);
       dummy.release ();
 
       if (first_store)
@@ -1831,7 +1832,7 @@ vect_enhance_data_refs_alignment (loop_v
 	  vect_get_peeling_costs_all_drs (datarefs, first_store,
 					  &store_inside_cost,
 					  &store_outside_cost,
-					  &dummy, vf / 2, true);
+					  &dummy, estimated_npeels, true);
 	  dummy.release ();
 	}
       else
@@ -1860,7 +1861,7 @@ vect_enhance_data_refs_alignment (loop_v
 
       int dummy2;
       peel_for_unknown_alignment.outside_cost += vect_get_known_peeling_cost
-	(loop_vinfo, vf / 2, &dummy2,
+	(loop_vinfo, estimated_npeels, &dummy2,
 	 &LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
 	 &prologue_cost_vec, &epilogue_cost_vec);
 
@@ -2020,14 +2021,16 @@ vect_enhance_data_refs_alignment (loop_v
         }
 
       /* Cost model #2 - if peeling may result in a remaining loop not
-	 iterating enough to be vectorized then do not peel.  */
+	 iterating enough to be vectorized then do not peel.  Since this
+	 is a cost heuristic rather than a correctness decision, use the
+	 most likely runtime value for variable vectorization factors.  */
       if (do_peeling
 	  && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
 	{
-	  unsigned max_peel
-	    = npeel == 0 ? LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1 : npeel;
-	  if (LOOP_VINFO_INT_NITERS (loop_vinfo)
-	      < LOOP_VINFO_VECT_FACTOR (loop_vinfo) + max_peel)
+	  unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
+	  unsigned int max_peel = npeel == 0 ? assumed_vf - 1 : npeel;
+	  if ((unsigned HOST_WIDE_INT) LOOP_VINFO_INT_NITERS (loop_vinfo)
+	      < assumed_vf + max_peel)
 	    do_peeling = false;
 	}
 
@@ -3038,7 +3041,7 @@ vect_no_alias_p (struct data_reference *
 
 static bool
 dependence_distance_ge_vf (data_dependence_relation *ddr,
-			   unsigned int loop_depth, unsigned HOST_WIDE_INT vf)
+			   unsigned int loop_depth, poly_uint64 vf)
 {
   if (DDR_ARE_DEPENDENT (ddr) != NULL_TREE
       || DDR_NUM_DIST_VECTS (ddr) == 0)
@@ -3054,7 +3057,7 @@ dependence_distance_ge_vf (data_dependen
       HOST_WIDE_INT dist = dist_v[loop_depth];
       if (dist != 0
 	  && !(dist > 0 && DDR_REVERSED_P (ddr))
-	  && (unsigned HOST_WIDE_INT) abs_hwi (dist) < vf)
+	  && may_lt ((unsigned HOST_WIDE_INT) abs_hwi (dist), vf))
 	return false;
     }
 
@@ -3089,7 +3092,7 @@ vect_prune_runtime_alias_test_list (loop
     = LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo);
   vec<vec_object_pair> &check_unequal_addrs
     = LOOP_VINFO_CHECK_UNEQUAL_ADDRS (loop_vinfo);
-  int vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 vect_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   tree scalar_loop_iters = LOOP_VINFO_NITERS (loop_vinfo);
 
   ddr_p ddr;
@@ -3200,8 +3203,7 @@ vect_prune_runtime_alias_test_list (loop
       comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
     }
 
-  prune_runtime_alias_test_list (&comp_alias_ddrs,
-				 (unsigned HOST_WIDE_INT) vect_factor);
+  prune_runtime_alias_test_list (&comp_alias_ddrs, vect_factor);
 
   unsigned int count = (comp_alias_ddrs.length ()
 			+ check_unequal_addrs.length ());
@@ -3453,7 +3455,7 @@ vect_check_gather_scatter (gimple *stmt,
 */
 
 bool
-vect_analyze_data_refs (vec_info *vinfo, int *min_vf)
+vect_analyze_data_refs (vec_info *vinfo, poly_uint64 *min_vf)
 {
   struct loop *loop = NULL;
   unsigned int i;
@@ -3478,7 +3480,7 @@ vect_analyze_data_refs (vec_info *vinfo,
       tree base, offset, init;
       enum { SG_NONE, GATHER, SCATTER } gatherscatter = SG_NONE;
       bool simd_lane_access = false;
-      int vf;
+      poly_uint64 vf;
 
 again:
       if (!dr || !DR_REF (dr))
@@ -3862,8 +3864,7 @@ vect_analyze_data_refs (vec_info *vinfo,
       /* Adjust the minimal vectorization factor according to the
 	 vector type.  */
       vf = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
-      if (vf > *min_vf)
-	*min_vf = vf;
+      *min_vf = upper_bound (*min_vf, vf);
 
       if (gatherscatter != SG_NONE)
 	{
@@ -5515,6 +5516,11 @@ vect_shift_permute_load_chain (vec<tree>
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 
+  unsigned HOST_WIDE_INT vf;
+  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
+    /* Not supported for variable-length vectors.  */
+    return false;
+
   auto_vec_perm_indices sel (nelt);
   sel.quick_grow (nelt);
 
@@ -5522,7 +5528,7 @@ vect_shift_permute_load_chain (vec<tree>
   memcpy (result_chain->address (), dr_chain.address (),
 	  length * sizeof (tree));
 
-  if (pow2p_hwi (length) && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)
+  if (pow2p_hwi (length) && vf > 4)
     {
       unsigned int j, log_length = exact_log2 (length);
       for (i = 0; i < nelt / 2; ++i)
@@ -5619,7 +5625,7 @@ vect_shift_permute_load_chain (vec<tree>
 	}
       return true;
     }
-  if (length == 3 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 2)
+  if (length == 3 && vf > 2)
     {
       unsigned int k = 0, l = 0;
 
@@ -5987,9 +5993,10 @@ vect_supportable_dr_alignment (struct da
 	     same alignment, instead it depends on the SLP group size.  */
 	  if (loop_vinfo
 	      && STMT_SLP_TYPE (stmt_info)
-	      && (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-		  * GROUP_SIZE (vinfo_for_stmt (GROUP_FIRST_ELEMENT (stmt_info)))
-		  % TYPE_VECTOR_SUBPARTS (vectype) != 0))
+	      && !multiple_p (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+			      * GROUP_SIZE (vinfo_for_stmt
+					    (GROUP_FIRST_ELEMENT (stmt_info))),
+			      TYPE_VECTOR_SUBPARTS (vectype)))
 	    ;
 	  else if (!loop_vinfo
 		   || (nested_in_vect_loop
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c	2017-10-23 17:22:23.376857985 +0100
+++ gcc/tree-vect-loop-manip.c	2017-10-23 17:22:26.572499177 +0100
@@ -1235,8 +1235,9 @@ vect_build_loop_niters (loop_vec_info lo
 
 static tree
 vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
-			     int bound_prolog, int vfm1, int th,
-			     int *bound_scalar, bool check_profitability)
+			     int bound_prolog, poly_int64 vfm1, int th,
+			     poly_uint64 *bound_scalar,
+			     bool check_profitability)
 {
   tree type = TREE_TYPE (niters_prolog);
   tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
@@ -1251,21 +1252,23 @@ vect_gen_scalar_loop_niters (tree niters
       /* Peeling for constant times.  */
       if (int_niters_prolog >= 0)
 	{
-	  *bound_scalar = (int_niters_prolog + vfm1 < th
-			    ? th
-			    : vfm1 + int_niters_prolog);
+	  *bound_scalar = upper_bound (int_niters_prolog + vfm1, th);
 	  return build_int_cst (type, *bound_scalar);
 	}
       /* Peeling for unknown times.  Note BOUND_PROLOG is the upper
 	 bound (inlcuded) of niters of prolog loop.  */
-      if (th >=  vfm1 + bound_prolog)
+      if (must_ge (th, vfm1 + bound_prolog))
 	{
 	  *bound_scalar = th;
 	  return build_int_cst (type, th);
 	}
-      /* Need to do runtime comparison, but BOUND_SCALAR remains the same.  */
-      else if (th > vfm1)
-	return fold_build2 (MAX_EXPR, type, build_int_cst (type, th), niters);
+      /* Need to do runtime comparison.  */
+      else if (may_gt (th, vfm1))
+	{
+	  *bound_scalar = upper_bound (*bound_scalar, th);
+	  return fold_build2 (MAX_EXPR, type,
+			      build_int_cst (type, th), niters);
+	}
     }
   return niters;
 }
@@ -1293,7 +1296,7 @@ vect_gen_vector_loop_niters (loop_vec_in
 {
   tree ni_minus_gap, var;
   tree niters_vector, step_vector, type = TREE_TYPE (niters);
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   edge pe = loop_preheader_edge (LOOP_VINFO_LOOP (loop_vinfo));
   tree log_vf = NULL_TREE;
 
@@ -1316,14 +1319,15 @@ vect_gen_vector_loop_niters (loop_vec_in
   else
     ni_minus_gap = niters;
 
-  if (1)
+  unsigned HOST_WIDE_INT const_vf;
+  if (vf.is_constant (&const_vf))
     {
       /* Create: niters >> log2(vf) */
       /* If it's known that niters == number of latch executions + 1 doesn't
 	 overflow, we can generate niters >> log2(vf); otherwise we generate
 	 (niters - vf) >> log2(vf) + 1 by using the fact that we know ratio
 	 will be at least one.  */
-      log_vf = build_int_cst (type, exact_log2 (vf));
+      log_vf = build_int_cst (type, exact_log2 (const_vf));
       if (niters_no_overflow)
 	niters_vector = fold_build2 (RSHIFT_EXPR, type, ni_minus_gap, log_vf);
       else
@@ -1374,7 +1378,8 @@ vect_gen_vector_loop_niters_mult_vf (loo
 				     tree niters_vector,
 				     tree *niters_vector_mult_vf_ptr)
 {
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  /* We should be using a step_vector of VF if VF is variable.  */
+  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo).to_constant ();
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree type = TREE_TYPE (niters_vector);
   tree log_vf = build_int_cst (type, exact_log2 (vf));
@@ -1791,8 +1796,9 @@ vect_do_peeling (loop_vec_info loop_vinf
   tree type = TREE_TYPE (niters), guard_cond;
   basic_block guard_bb, guard_to;
   profile_probability prob_prolog, prob_vector, prob_epilog;
-  int bound_prolog = 0, bound_scalar = 0, bound = 0;
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  int bound_prolog = 0;
+  poly_uint64 bound_scalar = 0;
+  int estimated_vf;
   int prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   bool epilog_peeling = (LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
 			 || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo));
@@ -1801,11 +1807,12 @@ vect_do_peeling (loop_vec_info loop_vinf
     return NULL;
 
   prob_vector = profile_probability::guessed_always ().apply_scale (9, 10);
-  if ((vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo)) == 2)
-    vf = 3;
+  estimated_vf = vect_vf_for_cost (loop_vinfo);
+  if (estimated_vf == 2)
+    estimated_vf = 3;
   prob_prolog = prob_epilog = profile_probability::guessed_always ()
-			.apply_scale (vf - 1, vf);
-  vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+			.apply_scale (estimated_vf - 1, estimated_vf);
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
 
   struct loop *prolog, *epilog = NULL, *loop = LOOP_VINFO_LOOP (loop_vinfo);
   struct loop *first_loop = loop;
@@ -1825,13 +1832,15 @@ vect_do_peeling (loop_vec_info loop_vinf
   /* Skip to epilog if scalar loop may be preferred.  It's only needed
      when we peel for epilog loop and when it hasn't been checked with
      loop versioning.  */
-  bool skip_vector = (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-		      && !LOOP_REQUIRES_VERSIONING (loop_vinfo));
+  bool skip_vector = ((!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+		       && !LOOP_REQUIRES_VERSIONING (loop_vinfo))
+		      || !vf.is_constant ());
   /* Epilog loop must be executed if the number of iterations for epilog
      loop is known at compile time, otherwise we need to add a check at
      the end of vector loop and skip to the end of epilog loop.  */
   bool skip_epilog = (prolog_peeling < 0
-		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo));
+		      || !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+		      || !vf.is_constant ());
   /* PEELING_FOR_GAPS is special because epilog loop must be executed.  */
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
     skip_epilog = false;
@@ -1850,8 +1859,10 @@ vect_do_peeling (loop_vec_info loop_vinf
 	 needs to be scaled back later.  */
       basic_block bb_before_loop = loop_preheader_edge (loop)->src;
       if (prob_vector.initialized_p ())
-      scale_bbs_frequencies (&bb_before_loop, 1, prob_vector);
-      scale_loop_profile (loop, prob_vector, bound);
+	{
+	  scale_bbs_frequencies (&bb_before_loop, 1, prob_vector);
+	  scale_loop_profile (loop, prob_vector, 0);
+	}
     }
 
   tree niters_prolog = build_int_cst (type, 0);
@@ -2037,15 +2048,20 @@ vect_do_peeling (loop_vec_info loop_vinf
 
 	      scale_bbs_frequencies (&bb_before_epilog, 1, prob_epilog);
 	    }
-	  scale_loop_profile (epilog, prob_epilog, bound);
+	  scale_loop_profile (epilog, prob_epilog, 0);
 	}
       else
 	slpeel_update_phi_nodes_for_lcssa (epilog);
 
-      bound = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? vf - 1 : vf - 2;
-      /* We share epilog loop with scalar version loop.  */
-      bound = MAX (bound, bound_scalar - 1);
-      record_niter_bound (epilog, bound, false, true);
+      unsigned HOST_WIDE_INT bound1, bound2;
+      if (vf.is_constant (&bound1) && bound_scalar.is_constant (&bound2))
+	{
+	  bound1 -= LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? 1 : 2;
+	  if (bound2)
+	    /* We share epilog loop with scalar version loop.  */
+	    bound1 = MAX (bound1, bound2 - 1);
+	  record_niter_bound (epilog, bound1, false, true);
+	}
 
       delete_update_ssa ();
       adjust_vec_debug_stmts ();
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:22:23.377858186 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:22:26.573499378 +0100
@@ -182,11 +182,10 @@ vect_determine_vectorization_factor (loo
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   unsigned nbbs = loop->num_nodes;
-  unsigned int vectorization_factor = 0;
+  poly_uint64 vectorization_factor = 1;
   tree scalar_type = NULL_TREE;
   gphi *phi;
   tree vectype;
-  unsigned int nunits;
   stmt_vec_info stmt_info;
   unsigned i;
   HOST_WIDE_INT dummy;
@@ -255,14 +254,12 @@ vect_determine_vectorization_factor (loo
                   dump_printf (MSG_NOTE, "\n");
 		}
 
-	      nunits = TYPE_VECTOR_SUBPARTS (vectype);
 	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, vect_location, "nunits = %d\n",
-                                 nunits);
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "nunits = " HOST_WIDE_INT_PRINT_DEC "\n",
+                                 TYPE_VECTOR_SUBPARTS (vectype));
 
-	      if (!vectorization_factor
-		  || (nunits > vectorization_factor))
-		vectorization_factor = nunits;
+	      vect_update_max_nunits (&vectorization_factor, vectype);
 	    }
 	}
 
@@ -550,12 +547,12 @@ vect_determine_vectorization_factor (loo
               dump_printf (MSG_NOTE, "\n");
 	    }
 
-	  nunits = TYPE_VECTOR_SUBPARTS (vf_vectype);
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, vect_location, "nunits = %d\n", nunits);
-	  if (!vectorization_factor
-	      || (nunits > vectorization_factor))
-	    vectorization_factor = nunits;
+	    dump_printf_loc (MSG_NOTE, vect_location,
+			     "nunits = " HOST_WIDE_INT_PRINT_DEC "\n",
+			     TYPE_VECTOR_SUBPARTS (vf_vectype));
+
+	  vect_update_max_nunits (&vectorization_factor, vf_vectype);
 
 	  if (!analyze_pattern_stmt && gsi_end_p (pattern_def_si))
 	    {
@@ -567,9 +564,13 @@ vect_determine_vectorization_factor (loo
 
   /* TODO: Analyze cost. Decide if worth while to vectorize.  */
   if (dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location, "vectorization factor = %d\n",
-                     vectorization_factor);
-  if (vectorization_factor <= 1)
+    {
+      dump_printf_loc (MSG_NOTE, vect_location, "vectorization factor = ");
+      dump_dec (MSG_NOTE, vectorization_factor);
+      dump_printf (MSG_NOTE, "\n");
+    }
+
+  if (must_le (vectorization_factor, 1U))
     {
       if (dump_enabled_p ())
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -1561,7 +1562,7 @@ vect_update_vf_for_slp (loop_vec_info lo
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
   int nbbs = loop->num_nodes;
-  unsigned int vectorization_factor;
+  poly_uint64 vectorization_factor;
   int i;
 
   if (dump_enabled_p ())
@@ -1569,7 +1570,7 @@ vect_update_vf_for_slp (loop_vec_info lo
 		     "=== vect_update_vf_for_slp ===\n");
 
   vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  gcc_assert (vectorization_factor != 0);
+  gcc_assert (known_nonzero (vectorization_factor));
 
   /* If all the stmts in the loop can be SLPed, we perform only SLP, and
      vectorization factor of the loop is the unrolling factor required by
@@ -1609,16 +1610,22 @@ vect_update_vf_for_slp (loop_vec_info lo
     {
       dump_printf_loc (MSG_NOTE, vect_location,
 		       "Loop contains SLP and non-SLP stmts\n");
+      /* Both the vectorization factor and unroll factor have the form
+	 current_vector_size * X for some rational X, so they must have
+	 a common multiple.  */
       vectorization_factor
-	= least_common_multiple (vectorization_factor,
+	= force_common_multiple (vectorization_factor,
 				 LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo));
     }
 
   LOOP_VINFO_VECT_FACTOR (loop_vinfo) = vectorization_factor;
   if (dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location,
-		     "Updating vectorization factor to %d\n",
-		     vectorization_factor);
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "Updating vectorization factor to ");
+      dump_dec (MSG_NOTE, vectorization_factor);
+      dump_printf (MSG_NOTE, ".\n");
+    }
 }
 
 /* Function vect_analyze_loop_operations.
@@ -1789,8 +1796,8 @@ vect_analyze_loop_operations (loop_vec_i
 vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal)
 {
   bool ok;
-  int max_vf = MAX_VECTORIZATION_FACTOR;
-  int min_vf = 2;
+  unsigned int max_vf = MAX_VECTORIZATION_FACTOR;
+  poly_uint64 min_vf = 2;
   unsigned int n_stmts = 0;
 
   /* The first group of checks is independent of the vector size.  */
@@ -1915,7 +1922,8 @@ vect_analyze_loop_2 (loop_vec_info loop_
 
   ok = vect_analyze_data_ref_dependences (loop_vinfo, &max_vf);
   if (!ok
-      || max_vf < min_vf)
+      || (max_vf != MAX_VECTORIZATION_FACTOR
+	  && may_lt (max_vf, min_vf)))
     {
       if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -1932,7 +1940,8 @@ vect_analyze_loop_2 (loop_vec_info loop_
 			 "can't determine vectorization factor.\n");
       return false;
     }
-  if (max_vf < LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+  if (max_vf != MAX_VECTORIZATION_FACTOR
+      && may_lt (max_vf, LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -1943,7 +1952,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
   /* Compute the scalar iteration cost.  */
   vect_compute_single_scalar_iteration_cost (loop_vinfo);
 
-  int saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 saved_vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   HOST_WIDE_INT estimated_niter;
   unsigned th;
   int min_scalar_loop_bound;
@@ -1968,21 +1977,25 @@ vect_analyze_loop_2 (loop_vec_info loop_
 start_over:
 
   /* Now the vectorization factor is final.  */
-  unsigned vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
-  gcc_assert (vectorization_factor != 0);
+  poly_uint64 vectorization_factor = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  gcc_assert (known_nonzero (vectorization_factor));
+  unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
 
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location,
-		     "vectorization_factor = %d, niters = "
-		     HOST_WIDE_INT_PRINT_DEC "\n", vectorization_factor,
-		     LOOP_VINFO_INT_NITERS (loop_vinfo));
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "vectorization_factor = ");
+      dump_dec (MSG_NOTE, vectorization_factor);
+      dump_printf (MSG_NOTE, ", niters = " HOST_WIDE_INT_PRINT_DEC "\n",
+		   LOOP_VINFO_INT_NITERS (loop_vinfo));
+    }
 
   HOST_WIDE_INT max_niter
     = likely_max_stmt_executions_int (LOOP_VINFO_LOOP (loop_vinfo));
   if ((LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-       && (LOOP_VINFO_INT_NITERS (loop_vinfo) < vectorization_factor))
+       && (LOOP_VINFO_INT_NITERS (loop_vinfo) < assumed_vf))
       || (max_niter != -1
-	  && (unsigned HOST_WIDE_INT) max_niter < vectorization_factor))
+	  && (unsigned HOST_WIDE_INT) max_niter < assumed_vf))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2054,10 +2067,10 @@ vect_analyze_loop_2 (loop_vec_info loop_
   if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
       && LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
-      int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+      poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
       tree scalar_niters = LOOP_VINFO_NITERSM1 (loop_vinfo);
 
-      if (wi::to_widest (scalar_niters) < vf)
+      if (must_lt (wi::to_widest (scalar_niters), vf))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location,
@@ -2085,7 +2098,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
     }
 
   min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
-			   * vectorization_factor);
+			   * assumed_vf);
 
   /* Use the cost model only if it is more conservative than user specified
      threshold.  */
@@ -2130,26 +2143,27 @@ vect_analyze_loop_2 (loop_vec_info loop_
 
   /* Decide whether we need to create an epilogue loop to handle
      remaining scalar iterations.  */
-  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)
-	 / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
-	* LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+  th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
 
+  unsigned HOST_WIDE_INT const_vf;
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
     {
-      if (ctz_hwi (LOOP_VINFO_INT_NITERS (loop_vinfo)
-		   - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo))
-	  < exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
+      if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo)
+		       - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo),
+		       LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
 	LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
     }
   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
-	   || (tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
-	       < (unsigned)exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
-               /* In case of versioning, check if the maximum number of
-                  iterations is greater than th.  If they are identical,
-                  the epilogue is unnecessary.  */
+	   || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
+	   || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
+		< (unsigned) exact_log2 (const_vf))
+	       /* In case of versioning, check if the maximum number of
+		  iterations is greater than th.  If they are identical,
+		  the epilogue is unnecessary.  */
 	       && (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
-                   || (unsigned HOST_WIDE_INT) max_niter > th)))
+		   || ((unsigned HOST_WIDE_INT) max_niter
+		       > (th / const_vf) * const_vf))))
     LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
 
   /* If an epilogue loop is required make sure we can create one.  */
@@ -2198,8 +2212,8 @@ vect_analyze_loop_2 (loop_vec_info loop_
       LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo) = niters_th;
     }
 
-  gcc_assert (vectorization_factor
-	      == (unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo));
+  gcc_assert (must_eq (vectorization_factor,
+		       LOOP_VINFO_VECT_FACTOR (loop_vinfo)));
 
   /* Ok to vectorize!  */
   return true;
@@ -3271,11 +3285,11 @@ vect_get_known_peeling_cost (loop_vec_in
 			     stmt_vector_for_cost *epilogue_cost_vec)
 {
   int retval = 0;
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  int assumed_vf = vect_vf_for_cost (loop_vinfo);
 
   if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
-      *peel_iters_epilogue = vf/2;
+      *peel_iters_epilogue = assumed_vf / 2;
       if (dump_enabled_p ())
         dump_printf_loc (MSG_NOTE, vect_location,
 			 "cost model: epilogue peel iters set to vf/2 "
@@ -3293,11 +3307,11 @@ vect_get_known_peeling_cost (loop_vec_in
       int niters = LOOP_VINFO_INT_NITERS (loop_vinfo);
       peel_iters_prologue = niters < peel_iters_prologue ?
                             niters : peel_iters_prologue;
-      *peel_iters_epilogue = (niters - peel_iters_prologue) % vf;
+      *peel_iters_epilogue = (niters - peel_iters_prologue) % assumed_vf;
       /* If we need to peel for gaps, but no peeling is required, we have to
 	 peel VF iterations.  */
       if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) && !*peel_iters_epilogue)
-        *peel_iters_epilogue = vf;
+	*peel_iters_epilogue = assumed_vf;
     }
 
   stmt_info_for_cost *si;
@@ -3355,7 +3369,7 @@ vect_estimate_min_profitable_iters (loop
   unsigned vec_epilogue_cost = 0;
   int scalar_single_iter_cost = 0;
   int scalar_outside_cost = 0;
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  int assumed_vf = vect_vf_for_cost (loop_vinfo);
   int npeel = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   void *target_cost_data = LOOP_VINFO_TARGET_COST_DATA (loop_vinfo);
 
@@ -3434,13 +3448,13 @@ vect_estimate_min_profitable_iters (loop
 
   if (npeel  < 0)
     {
-      peel_iters_prologue = vf/2;
+      peel_iters_prologue = assumed_vf / 2;
       dump_printf (MSG_NOTE, "cost model: "
                    "prologue peel iters set to vf/2.\n");
 
       /* If peeling for alignment is unknown, loop bound of main loop becomes
          unknown.  */
-      peel_iters_epilogue = vf/2;
+      peel_iters_epilogue = assumed_vf / 2;
       dump_printf (MSG_NOTE, "cost model: "
                    "epilogue peel iters set to vf/2 because "
                    "peeling for alignment is unknown.\n");
@@ -3619,22 +3633,24 @@ vect_estimate_min_profitable_iters (loop
      PL_ITERS = prologue iterations, EP_ITERS= epilogue iterations
      SOC = scalar outside cost for run time cost model check.  */
 
-  if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost)
+  if ((scalar_single_iter_cost * assumed_vf) > (int) vec_inside_cost)
     {
       if (vec_outside_cost <= 0)
         min_profitable_iters = 0;
       else
         {
-          min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * vf
+	  min_profitable_iters = ((vec_outside_cost - scalar_outside_cost)
+				  * assumed_vf
 				  - vec_inside_cost * peel_iters_prologue
-                                  - vec_inside_cost * peel_iters_epilogue)
-                                 / ((scalar_single_iter_cost * vf)
-                                    - vec_inside_cost);
-
-          if ((scalar_single_iter_cost * vf * min_profitable_iters)
-              <= (((int) vec_inside_cost * min_profitable_iters)
-                  + (((int) vec_outside_cost - scalar_outside_cost) * vf)))
-            min_profitable_iters++;
+				  - vec_inside_cost * peel_iters_epilogue)
+				 / ((scalar_single_iter_cost * assumed_vf)
+				    - vec_inside_cost);
+
+	  if ((scalar_single_iter_cost * assumed_vf * min_profitable_iters)
+	      <= (((int) vec_inside_cost * min_profitable_iters)
+		  + (((int) vec_outside_cost - scalar_outside_cost)
+		     * assumed_vf)))
+	    min_profitable_iters++;
         }
     }
   /* vector version will never be profitable.  */
@@ -3650,7 +3666,7 @@ vect_estimate_min_profitable_iters (loop
 			 "divided by the scalar iteration cost = %d "
 			 "is greater or equal to the vectorization factor = %d"
                          ".\n",
-			 vec_inside_cost, scalar_single_iter_cost, vf);
+			 vec_inside_cost, scalar_single_iter_cost, assumed_vf);
       *ret_min_profitable_niters = -1;
       *ret_min_profitable_estimate = -1;
       return;
@@ -3661,8 +3677,8 @@ vect_estimate_min_profitable_iters (loop
 	       min_profitable_iters);
 
   /* We want the vectorized loop to execute at least once.  */
-  if (min_profitable_iters < (vf + peel_iters_prologue))
-    min_profitable_iters = vf + peel_iters_prologue;
+  if (min_profitable_iters < (assumed_vf + peel_iters_prologue))
+    min_profitable_iters = assumed_vf + peel_iters_prologue;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -3682,10 +3698,11 @@ vect_estimate_min_profitable_iters (loop
     min_profitable_estimate = 0;
   else
     {
-      min_profitable_estimate = ((vec_outside_cost + scalar_outside_cost) * vf
+      min_profitable_estimate = ((vec_outside_cost + scalar_outside_cost)
+				 * assumed_vf
 				 - vec_inside_cost * peel_iters_prologue
 				 - vec_inside_cost * peel_iters_epilogue)
-				 / ((scalar_single_iter_cost * vf)
+				 / ((scalar_single_iter_cost * assumed_vf)
 				   - vec_inside_cost);
     }
   min_profitable_estimate = MAX (min_profitable_estimate, min_profitable_iters);
@@ -5702,9 +5719,10 @@ vectorizable_reduction (gimple *stmt, gi
 
       if (slp_node)
 	/* The size vect_schedule_slp_instance computes is off for us.  */
-	vec_num = ((LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-		    * SLP_TREE_SCALAR_STMTS (slp_node).length ())
-		   / TYPE_VECTOR_SUBPARTS (vectype_in));
+	vec_num = vect_get_num_vectors
+	  (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+	   * SLP_TREE_SCALAR_STMTS (slp_node).length (),
+	   vectype_in);
       else
 	vec_num = 1;
 
@@ -6469,7 +6487,7 @@ vectorizable_reduction (gimple *stmt, gi
    For a loop where we could vectorize the operation indicated by CODE,
    return the minimum vectorization factor that makes it worthwhile
    to use generic vectors.  */
-int
+static unsigned int
 vect_min_worthwhile_factor (enum tree_code code)
 {
   switch (code)
@@ -6498,9 +6516,10 @@ vect_min_worthwhile_factor (enum tree_co
 vect_worthwhile_without_simd_p (vec_info *vinfo, tree_code code)
 {
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
+  unsigned HOST_WIDE_INT value;
   return (loop_vinfo
-	  && (LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	      >= vect_min_worthwhile_factor (code)));
+	  && LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&value)
+	  && value >= vect_min_worthwhile_factor (code));
 }
 
 /* Function vectorizable_induction
@@ -6530,7 +6549,7 @@ vectorizable_induction (gimple *phi,
   gphi *induction_phi;
   tree induc_def, vec_dest;
   tree init_expr, step_expr;
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   unsigned i;
   tree expr;
   gimple_seq stmts;
@@ -7275,7 +7294,8 @@ vect_transform_loop (loop_vec_info loop_
   tree niters_vector = NULL_TREE;
   tree step_vector = NULL_TREE;
   tree niters_vector_mult_vf = NULL_TREE;
-  int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  poly_uint64 vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  unsigned int lowest_vf = constant_lower_bound (vf);
   bool grouped_store;
   bool slp_scheduled = false;
   gimple *stmt, *pattern_stmt;
@@ -7283,7 +7303,7 @@ vect_transform_loop (loop_vec_info loop_
   gimple_stmt_iterator pattern_def_si = gsi_none ();
   bool transform_pattern_stmt = false;
   bool check_profitability = false;
-  int th;
+  unsigned int th;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "=== vec_transform_loop ===\n");
@@ -7291,10 +7311,10 @@ vect_transform_loop (loop_vec_info loop_
   /* Use the more conservative vectorization threshold.  If the number
      of iterations is constant assume the cost check has been performed
      by our caller.  If the threshold makes all loops profitable that
-     run at least the vectorization factor number of times checking
-     is pointless, too.  */
+     run at least the (estimated) vectorization factor number of times
+     checking is pointless, too.  */
   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
-  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+  if (th >= vect_vf_for_cost (loop_vinfo)
       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
       if (dump_enabled_p ())
@@ -7358,11 +7378,11 @@ vect_transform_loop (loop_vec_info loop_
 			      check_profitability, niters_no_overflow);
   if (niters_vector == NULL_TREE)
     {
-      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
+      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo) && must_eq (lowest_vf, vf))
 	{
 	  niters_vector
 	    = build_int_cst (TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo)),
-			     LOOP_VINFO_INT_NITERS (loop_vinfo) / vf);
+			     LOOP_VINFO_INT_NITERS (loop_vinfo) / lowest_vf);
 	  step_vector = build_one_cst (TREE_TYPE (niters));
 	}
       else
@@ -7409,8 +7429,8 @@ vect_transform_loop (loop_vec_info loop_
 	    continue;
 
 	  if (STMT_VINFO_VECTYPE (stmt_info)
-	      && (TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info))
-		  != (unsigned HOST_WIDE_INT) vf)
+	      && may_ne (TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info)),
+			 vf)
 	      && dump_enabled_p ())
 	    dump_printf_loc (MSG_NOTE, vect_location, "multiple-types.\n");
 
@@ -7546,7 +7566,7 @@ vect_transform_loop (loop_vec_info loop_
 		= (unsigned int)
 		  TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
 	      if (!STMT_SLP_TYPE (stmt_info)
-		  && nunits != (unsigned int) vf
+		  && may_ne (nunits, vf)
 		  && dump_enabled_p ())
 		  /* For SLP VF is set according to unrolling factor, and not
 		     to vector size, hence for SLP this print is not valid.  */
@@ -7626,7 +7646,8 @@ vect_transform_loop (loop_vec_info loop_
 				   niters_vector_mult_vf,
 				   !niters_no_overflow);
 
-  scale_profile_for_vect_loop (loop, vf);
+  unsigned int assumed_vf = vect_vf_for_cost (loop_vinfo);
+  scale_profile_for_vect_loop (loop, assumed_vf);
 
   /* The minimum number of iterations performed by the epilogue.  This
      is 1 when peeling for gaps because we always need a final scalar
@@ -7640,13 +7661,16 @@ vect_transform_loop (loop_vec_info loop_
      back to latch counts.  */
   if (loop->any_upper_bound)
     loop->nb_iterations_upper_bound
-      = wi::udiv_floor (loop->nb_iterations_upper_bound + bias, vf) - 1;
+      = wi::udiv_floor (loop->nb_iterations_upper_bound + bias,
+			lowest_vf) - 1;
   if (loop->any_likely_upper_bound)
     loop->nb_iterations_likely_upper_bound
-      = wi::udiv_floor (loop->nb_iterations_likely_upper_bound + bias, vf) - 1;
+      = wi::udiv_floor (loop->nb_iterations_likely_upper_bound + bias,
+			lowest_vf) - 1;
   if (loop->any_estimate)
     loop->nb_iterations_estimate
-      = wi::udiv_floor (loop->nb_iterations_estimate + bias, vf) - 1;
+      = wi::udiv_floor (loop->nb_iterations_estimate + bias,
+			assumed_vf) - 1;
 
   if (dump_enabled_p ())
     {
@@ -7690,17 +7714,18 @@ vect_transform_loop (loop_vec_info loop_
 	else if (!vector_sizes)
 	  epilogue = NULL;
 	else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-		 && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
+		 && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0
+		 && must_eq (vf, lowest_vf))
 	  {
 	    int smallest_vec_size = 1 << ctz_hwi (vector_sizes);
 	    int ratio = current_vector_size / smallest_vec_size;
-	    int eiters = LOOP_VINFO_INT_NITERS (loop_vinfo)
+	    unsigned HOST_WIDE_INT eiters = LOOP_VINFO_INT_NITERS (loop_vinfo)
 	      - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
-	    eiters = eiters % vf;
+	    eiters = eiters % lowest_vf;
 
 	    epilogue->nb_iterations_upper_bound = eiters - 1;
 
-	    if (eiters < vf / ratio)
+	    if (eiters < lowest_vf / ratio)
 	      epilogue = NULL;
 	    }
     }
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-10-09 11:50:52.542711115 +0100
+++ gcc/tree-vect-slp.c	2017-10-23 17:22:26.573499378 +0100
@@ -1501,14 +1501,14 @@ vect_attempt_slp_rearrange_stmts (slp_in
 			    node->load_permutation);
 
   /* We are done, no actual permutations need to be generated.  */
-  unsigned int unrolling_factor = SLP_INSTANCE_UNROLLING_FACTOR (slp_instn);
+  poly_uint64 unrolling_factor = SLP_INSTANCE_UNROLLING_FACTOR (slp_instn);
   FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
     {
       gimple *first_stmt = SLP_TREE_SCALAR_STMTS (node)[0];
       first_stmt = GROUP_FIRST_ELEMENT (vinfo_for_stmt (first_stmt));
       /* But we have to keep those permutations that are required because
          of handling of gaps.  */
-      if (unrolling_factor == 1
+      if (must_eq (unrolling_factor, 1U)
 	  || (group_size == GROUP_SIZE (vinfo_for_stmt (first_stmt))
 	      && GROUP_GAP (vinfo_for_stmt (first_stmt)) == 0))
 	SLP_TREE_LOAD_PERMUTATION (node).release ();
@@ -1635,10 +1635,10 @@ vect_supported_load_permutation_p (slp_i
      and the vectorization factor is not yet final.
      ???  The SLP instance unrolling factor might not be the maximum one.  */
   unsigned n_perms;
-  unsigned test_vf
-    = least_common_multiple (SLP_INSTANCE_UNROLLING_FACTOR (slp_instn),
+  poly_uint64 test_vf
+    = force_common_multiple (SLP_INSTANCE_UNROLLING_FACTOR (slp_instn),
 			     LOOP_VINFO_VECT_FACTOR
-			       (STMT_VINFO_LOOP_VINFO (vinfo_for_stmt (stmt))));
+			     (STMT_VINFO_LOOP_VINFO (vinfo_for_stmt (stmt))));
   FOR_EACH_VEC_ELT (SLP_INSTANCE_LOADS (slp_instn), i, node)
     if (node->load_permutation.exists ()
 	&& !vect_transform_slp_perm_load (node, vNULL, NULL, test_vf,
@@ -1743,7 +1743,8 @@ vect_analyze_slp_cost_1 (slp_instance in
 	      gcc_assert (ncopies_for_cost
 			  <= (GROUP_SIZE (stmt_info) - GROUP_GAP (stmt_info)
 			      + nunits - 1) / nunits);
-	      ncopies_for_cost *= SLP_INSTANCE_UNROLLING_FACTOR (instance);
+	      poly_uint64 uf = SLP_INSTANCE_UNROLLING_FACTOR (instance);
+	      ncopies_for_cost *= estimated_poly_value (uf);
 	    }
 	  /* Record the cost for the vector loads.  */
 	  vect_model_load_cost (stmt_info, ncopies_for_cost,
@@ -1847,10 +1848,13 @@ vect_analyze_slp_cost (slp_instance inst
   unsigned group_size = SLP_INSTANCE_GROUP_SIZE (instance);
   slp_tree node = SLP_INSTANCE_TREE (instance);
   stmt_vec_info stmt_info = vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (node)[0]);
-  /* Adjust the group_size by the vectorization factor which is always one
-     for basic-block vectorization.  */
+  /* Get the estimated vectorization factor, which is always one for
+     basic-block vectorization.  */
+  unsigned int assumed_vf;
   if (STMT_VINFO_LOOP_VINFO (stmt_info))
-    group_size *= LOOP_VINFO_VECT_FACTOR (STMT_VINFO_LOOP_VINFO (stmt_info));
+    assumed_vf = vect_vf_for_cost (STMT_VINFO_LOOP_VINFO (stmt_info));
+  else
+    assumed_vf = 1;
   unsigned nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   /* For reductions look at a reduction operand in case the reduction
      operation is widening like DOT_PROD or SAD.  */
@@ -1867,7 +1871,8 @@ vect_analyze_slp_cost (slp_instance inst
 	default:;
 	}
     }
-  ncopies_for_cost = least_common_multiple (nunits, group_size) / nunits;
+  ncopies_for_cost = least_common_multiple (nunits,
+					    group_size * assumed_vf) / nunits;
 
   prologue_cost_vec.create (10);
   body_cost_vec.create (10);
@@ -1957,7 +1962,7 @@ vect_analyze_slp_instance (vec_info *vin
   slp_instance new_instance;
   slp_tree node;
   unsigned int group_size = GROUP_SIZE (vinfo_for_stmt (stmt));
-  unsigned int unrolling_factor = 1, nunits;
+  unsigned int nunits;
   tree vectype, scalar_type = NULL_TREE;
   gimple *next;
   unsigned int i;
@@ -2045,10 +2050,10 @@ vect_analyze_slp_instance (vec_info *vin
   if (node != NULL)
     {
       /* Calculate the unrolling factor based on the smallest type.  */
-      unrolling_factor
+      poly_uint64 unrolling_factor
 	= least_common_multiple (max_nunits, group_size) / group_size;
 
-      if (unrolling_factor != 1
+      if (may_ne (unrolling_factor, 1U)
 	  && is_a <bb_vec_info> (vinfo))
 	{
 
@@ -2101,7 +2106,7 @@ vect_analyze_slp_instance (vec_info *vin
 	      /* The load requires permutation when unrolling exposes
 	         a gap either because the group is larger than the SLP
 		 group-size or because there is a gap between the groups.  */
-	      && (unrolling_factor == 1
+	      && (must_eq (unrolling_factor, 1U)
 		  || (group_size == GROUP_SIZE (vinfo_for_stmt (first_stmt))
 		      && GROUP_GAP (vinfo_for_stmt (first_stmt)) == 0)))
 	    {
@@ -2276,7 +2281,8 @@ vect_analyze_slp (vec_info *vinfo, unsig
 bool
 vect_make_slp_decision (loop_vec_info loop_vinfo)
 {
-  unsigned int i, unrolling_factor = 1;
+  unsigned int i;
+  poly_uint64 unrolling_factor = 1;
   vec<slp_instance> slp_instances = LOOP_VINFO_SLP_INSTANCES (loop_vinfo);
   slp_instance instance;
   int decided_to_slp = 0;
@@ -2288,8 +2294,11 @@ vect_make_slp_decision (loop_vec_info lo
   FOR_EACH_VEC_ELT (slp_instances, i, instance)
     {
       /* FORNOW: SLP if you can.  */
-      if (unrolling_factor < SLP_INSTANCE_UNROLLING_FACTOR (instance))
-	unrolling_factor = SLP_INSTANCE_UNROLLING_FACTOR (instance);
+      /* All unroll factors have the form current_vector_size * X for some
+	 rational X, so they must have a common multiple.  */
+      unrolling_factor
+	= force_common_multiple (unrolling_factor,
+				 SLP_INSTANCE_UNROLLING_FACTOR (instance));
 
       /* Mark all the stmts that belong to INSTANCE as PURE_SLP stmts.  Later we
 	 call vect_detect_hybrid_slp () to find stmts that need hybrid SLP and
@@ -2301,9 +2310,13 @@ vect_make_slp_decision (loop_vec_info lo
   LOOP_VINFO_SLP_UNROLLING_FACTOR (loop_vinfo) = unrolling_factor;
 
   if (decided_to_slp && dump_enabled_p ())
-    dump_printf_loc (MSG_NOTE, vect_location,
-		     "Decided to SLP %d instances. Unrolling factor %d\n",
-		     decided_to_slp, unrolling_factor);
+    {
+      dump_printf_loc (MSG_NOTE, vect_location,
+		       "Decided to SLP %d instances. Unrolling factor ",
+		       decided_to_slp);
+      dump_dec (MSG_NOTE, unrolling_factor);
+      dump_printf (MSG_NOTE, "\n");
+    }
 
   return (decided_to_slp > 0);
 }
@@ -2613,7 +2626,7 @@ vect_slp_analyze_node_operations (vec_in
       = SLP_TREE_NUMBER_OF_VEC_STMTS (SLP_TREE_CHILDREN (node)[0]);
   else
     {
-      int vf;
+      poly_uint64 vf;
       if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
 	vf = loop_vinfo->vectorization_factor;
       else
@@ -2621,7 +2634,7 @@ vect_slp_analyze_node_operations (vec_in
       unsigned int group_size = SLP_INSTANCE_GROUP_SIZE (node_instance);
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
       SLP_TREE_NUMBER_OF_VEC_STMTS (node)
-	= vf * group_size / TYPE_VECTOR_SUBPARTS (vectype);
+	= vect_get_num_vectors (vf * group_size, vectype);
     }
 
   /* Push SLP node def-type to stmt operands.  */
@@ -2827,7 +2840,7 @@ vect_slp_analyze_bb_1 (gimple_stmt_itera
   bb_vec_info bb_vinfo;
   slp_instance instance;
   int i;
-  int min_vf = 2;
+  poly_uint64 min_vf = 2;
 
   /* The first group of checks is independent of the vector size.  */
   fatal = true;
@@ -3529,8 +3542,8 @@ vect_get_slp_defs (vec<tree> ops, slp_tr
 
 bool
 vect_transform_slp_perm_load (slp_tree node, vec<tree> dr_chain,
-                              gimple_stmt_iterator *gsi, int vf,
-                              slp_instance slp_node_instance, bool analyze_only,
+			      gimple_stmt_iterator *gsi, poly_uint64 vf,
+			      slp_instance slp_node_instance, bool analyze_only,
 			      unsigned *n_perms)
 {
   gimple *stmt = SLP_TREE_SCALAR_STMTS (node)[0];
@@ -3541,6 +3554,7 @@ vect_transform_slp_perm_load (slp_tree n
   int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance);
   int mask_element;
   machine_mode mode;
+  unsigned HOST_WIDE_INT const_vf;
 
   if (!STMT_VINFO_GROUPED_ACCESS (stmt_info))
     return false;
@@ -3549,6 +3563,11 @@ vect_transform_slp_perm_load (slp_tree n
 
   mode = TYPE_MODE (vectype);
 
+  /* At the moment, all permutations are represented using per-element
+     indices, so we can't cope with variable vectorization factors.  */
+  if (!vf.is_constant (&const_vf))
+    return false;
+
   /* The generic VEC_PERM_EXPR code always uses an integral type of the
      same size as the vector element being permuted.  */
   mask_element_type = lang_hooks.types.type_for_mode
@@ -3590,7 +3609,7 @@ vect_transform_slp_perm_load (slp_tree n
   bool noop_p = true;
   *n_perms = 0;
 
-  for (int j = 0; j < vf; j++)
+  for (unsigned int j = 0; j < const_vf; j++)
     {
       for (int k = 0; k < group_size; k++)
 	{
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 11:41:25.830146497 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:22:26.574499579 +0100
@@ -3347,6 +3347,16 @@ vectorizable_simd_clone_call (gimple *st
       arginfo.quick_push (thisarginfo);
     }
 
+  unsigned HOST_WIDE_INT vf;
+  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "not considering SIMD clones; not yet supported"
+			 " for variable-width vectors.\n");
+      return NULL;
+    }
+
   unsigned int badness = 0;
   struct cgraph_node *bestn = NULL;
   if (STMT_VINFO_SIMD_CLONE_INFO (stmt_info).exists ())
@@ -3356,13 +3366,11 @@ vectorizable_simd_clone_call (gimple *st
 	 n = n->simdclone->next_clone)
       {
 	unsigned int this_badness = 0;
-	if (n->simdclone->simdlen
-	    > (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo)
+	if (n->simdclone->simdlen > vf
 	    || n->simdclone->nargs != nargs)
 	  continue;
-	if (n->simdclone->simdlen
-	    < (unsigned) LOOP_VINFO_VECT_FACTOR (loop_vinfo))
-	  this_badness += (exact_log2 (LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+	if (n->simdclone->simdlen < vf)
+	  this_badness += (exact_log2 (vf)
 			   - exact_log2 (n->simdclone->simdlen)) * 1024;
 	if (n->simdclone->inbranch)
 	  this_badness += 2048;
@@ -3451,7 +3459,7 @@ vectorizable_simd_clone_call (gimple *st
 
   fndecl = bestn->decl;
   nunits = bestn->simdclone->simdlen;
-  ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits;
+  ncopies = vf / nunits;
 
   /* If the function isn't const, only allow it in simd loops where user
      has asserted that at least nunits consecutive iterations can be
@@ -5661,7 +5669,7 @@ vectorizable_store (gimple *stmt, gimple
   gather_scatter_info gs_info;
   enum vect_def_type scatter_src_dt = vect_unknown_def_type;
   gimple *new_stmt;
-  int vf;
+  poly_uint64 vf;
   vec_load_store_type vls_type;
   tree ref_type;
 
@@ -6636,7 +6644,8 @@ vectorizable_load (gimple *stmt, gimple_
   tree dataref_offset = NULL_TREE;
   gimple *ptr_incr = NULL;
   int ncopies;
-  int i, j, group_size, group_gap_adj;
+  int i, j, group_size;
+  poly_int64 group_gap_adj;
   tree msq = NULL_TREE, lsq;
   tree offset = NULL_TREE;
   tree byte_offset = NULL_TREE;
@@ -6654,7 +6663,7 @@ vectorizable_load (gimple *stmt, gimple_
   bool slp_perm = false;
   enum tree_code code;
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
-  int vf;
+  poly_uint64 vf;
   tree aggr_type;
   gather_scatter_info gs_info;
   vec_info *vinfo = stmt_info->vinfo;
@@ -6724,8 +6733,8 @@ vectorizable_load (gimple *stmt, gimple_
      on the unrolled body effectively re-orders stmts.  */
   if (ncopies > 1
       && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
-      && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	  > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+      && may_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+		 STMT_VINFO_MIN_NEG_DIST (stmt_info)))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -6765,8 +6774,8 @@ vectorizable_load (gimple *stmt, gimple_
 	 on the unrolled body effectively re-orders stmts.  */
       if (!PURE_SLP_STMT (stmt_info)
 	  && STMT_VINFO_MIN_NEG_DIST (stmt_info) != 0
-	  && ((unsigned)LOOP_VINFO_VECT_FACTOR (loop_vinfo)
-	      > STMT_VINFO_MIN_NEG_DIST (stmt_info)))
+	  && may_gt (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+		     STMT_VINFO_MIN_NEG_DIST (stmt_info)))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -7125,7 +7134,10 @@ vectorizable_load (gimple *stmt, gimple_
 	     fits in.  */
 	  if (slp_perm)
 	    {
-	      ncopies = (group_size * vf + nunits - 1) / nunits;
+	      /* We don't yet generate SLP_TREE_LOAD_PERMUTATIONs for
+		 variable VF.  */
+	      unsigned int const_vf = vf.to_constant ();
+	      ncopies = (group_size * const_vf + nunits - 1) / nunits;
 	      dr_chain.create (ncopies);
 	    }
 	  else
@@ -7243,7 +7255,10 @@ vectorizable_load (gimple *stmt, gimple_
 	     fits in.  */
 	  if (slp_perm)
 	    {
-	      vec_num = (group_size * vf + nunits - 1) / nunits;
+	      /* We don't yet generate SLP_TREE_LOAD_PERMUTATIONs for
+		 variable VF.  */
+	      unsigned int const_vf = vf.to_constant ();
+	      vec_num = (group_size * const_vf + nunits - 1) / nunits;
 	      group_gap_adj = vf * group_size - nunits * vec_num;
 	    }
 	  else
@@ -7709,11 +7724,13 @@ vectorizable_load (gimple *stmt, gimple_
 	         we need to skip the gaps after we manage to fully load
 		 all elements.  group_gap_adj is GROUP_SIZE here.  */
 	      group_elt += nunits;
-	      if (group_gap_adj != 0 && ! slp_perm
-		  && group_elt == group_size - group_gap_adj)
+	      if (maybe_nonzero (group_gap_adj)
+		  && !slp_perm
+		  && must_eq (group_elt, group_size - group_gap_adj))
 		{
-		  wide_int bump_val = (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
-				       * group_gap_adj);
+		  poly_wide_int bump_val
+		    = (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
+		       * group_gap_adj);
 		  tree bump = wide_int_to_tree (sizetype, bump_val);
 		  dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
 						 stmt, bump);
@@ -7722,10 +7739,11 @@ vectorizable_load (gimple *stmt, gimple_
 	    }
 	  /* Bump the vector pointer to account for a gap or for excess
 	     elements loaded for a permuted SLP load.  */
-	  if (group_gap_adj != 0 && slp_perm)
+	  if (maybe_nonzero (group_gap_adj) && slp_perm)
 	    {
-	      wide_int bump_val = (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
-				   * group_gap_adj);
+	      poly_wide_int bump_val
+		= (wi::to_wide (TYPE_SIZE_UNIT (elem_type))
+		   * group_gap_adj);
 	      tree bump = wide_int_to_tree (sizetype, bump_val);
 	      dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, gsi,
 					     stmt, bump);
Index: gcc/tree-vectorizer.c
===================================================================
--- gcc/tree-vectorizer.c	2017-08-10 14:36:06.363532454 +0100
+++ gcc/tree-vectorizer.c	2017-10-23 17:22:26.574499579 +0100
@@ -91,7 +91,7 @@ Software Foundation; either version 3, o
 struct simduid_to_vf : free_ptr_hash<simduid_to_vf>
 {
   unsigned int simduid;
-  int vf;
+  poly_uint64 vf;
 
   /* hash_table support.  */
   static inline hashval_t hash (const simduid_to_vf *);
@@ -161,7 +161,7 @@ adjust_simduid_builtins (hash_table<simd
 
       for (i = gsi_start_bb (bb); !gsi_end_p (i); )
 	{
-	  unsigned int vf = 1;
+	  poly_uint64 vf = 1;
 	  enum internal_fn ifn;
 	  gimple *stmt = gsi_stmt (i);
 	  tree t;
@@ -338,7 +338,7 @@ note_simd_array_uses (hash_table<simd_ar
     if ((*iter)->simduid != -1U)
       {
 	tree decl = (*iter)->decl;
-	int vf = 1;
+	poly_uint64 vf = 1;
 	if (simduid_to_vf_htab)
 	  {
 	    simduid_to_vf *p = NULL, data;
Index: gcc/testsuite/gcc.dg/vect-opt-info-1.c
===================================================================
--- /dev/null	2017-10-21 08:51:42.385141415 +0100
+++ gcc/testsuite/gcc.dg/vect-opt-info-1.c	2017-10-23 17:22:26.571498977 +0100
@@ -0,0 +1,11 @@
+/* { dg-options "-std=c99 -fopt-info -O3" } */
+
+void
+vadd (int *dst, int *op1, int *op2, int count)
+{
+  for (int i = 0; i < count; ++i)
+    dst[i] = op1[i] + op2[i];
+}
+
+/* { dg-message "loop vectorized" "" { target *-*-* } 6 } */
+/* { dg-message "loop versioned for vectorization because of possible aliasing" "" { target *-*-* } 6 } */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [062/nnn] poly_int: prune_runtime_alias_test_list
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (61 preceding siblings ...)
  2017-10-23 17:26 ` [063/nnn] poly_int: vectoriser vf and uf Richard Sandiford
@ 2017-10-23 17:26 ` Richard Sandiford
  2017-12-05 17:33   ` Jeff Law
  2017-10-23 17:27 ` [066/nnn] poly_int: omp_max_vf Richard Sandiford
                   ` (44 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:26 UTC (permalink / raw)
  To: gcc-patches

This patch makes prune_runtime_alias_test_list take the iteration
factor as a poly_int and tracks polynomial offsets internally
as well.
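
Since poly_int offsets have no total ordering, the merging loop can
no longer just compare the two DR_INITs.  Roughly (a sketch using
the poly_int helpers added earlier in the series, with the
surrounding checks omitted), the new logic is:

  poly_int64 init_a1, init_a2;
  if (!poly_int_tree_p (DR_INIT (dr_a1->dr), &init_a1)
      || !poly_int_tree_p (DR_INIT (dr_a2->dr), &init_a2)
      /* Give up unless one offset provably precedes the other.  */
      || !ordered_p (init_a1, init_a2))
    continue;

  /* Make sure dr_a1 starts left of dr_a2.  */
  if (may_gt (init_a1, init_a2))
    {
      std::swap (*dr_a1, *dr_a2);
      std::swap (init_a1, init_a2);
    }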


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-data-ref.h (prune_runtime_alias_test_list): Take the
	factor as a poly_uint64 rather than an unsigned HOST_WIDE_INT.
	* tree-data-ref.c (prune_runtime_alias_test_list): Likewise.
	Track polynomial offsets.

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h	2017-10-13 10:23:39.775145588 +0100
+++ gcc/tree-data-ref.h	2017-10-23 17:22:25.492282436 +0100
@@ -472,7 +472,7 @@ extern bool dr_equal_offsets_p (struct d
 extern bool runtime_alias_check_p (ddr_p, struct loop *, bool);
 extern int data_ref_compare_tree (tree, tree);
 extern void prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *,
-					   unsigned HOST_WIDE_INT);
+					   poly_uint64);
 extern void create_runtime_alias_checks (struct loop *,
 					 vec<dr_with_seg_len_pair_t> *, tree*);
 /* Return true when the base objects of data references A and B are
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2017-10-23 17:22:18.231825655 +0100
+++ gcc/tree-data-ref.c	2017-10-23 17:22:25.492282436 +0100
@@ -1417,7 +1417,7 @@ comp_dr_with_seg_len_pair (const void *p
 
 void
 prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *alias_pairs,
-			       unsigned HOST_WIDE_INT factor)
+			       poly_uint64 factor)
 {
   /* Sort the collected data ref pairs so that we can scan them once to
      combine all possible aliasing checks.  */
@@ -1462,51 +1462,63 @@ prune_runtime_alias_test_list (vec<dr_wi
 	      std::swap (dr_a2, dr_b2);
 	    }
 
+	  poly_int64 init_a1, init_a2;
 	  if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1->dr),
 				DR_BASE_ADDRESS (dr_a2->dr), 0)
 	      || !operand_equal_p (DR_OFFSET (dr_a1->dr),
 				   DR_OFFSET (dr_a2->dr), 0)
-	      || !tree_fits_shwi_p (DR_INIT (dr_a1->dr))
-	      || !tree_fits_shwi_p (DR_INIT (dr_a2->dr)))
+	      || !poly_int_tree_p (DR_INIT (dr_a1->dr), &init_a1)
+	      || !poly_int_tree_p (DR_INIT (dr_a2->dr), &init_a2))
 	    continue;
 
+	  /* Don't combine if we can't tell which one comes first.  */
+	  if (!ordered_p (init_a1, init_a2))
+	    continue;
+
+	  /* Make sure dr_a1 starts left of dr_a2.  */
+	  if (may_gt (init_a1, init_a2))
+	    {
+	      std::swap (*dr_a1, *dr_a2);
+	      std::swap (init_a1, init_a2);
+	    }
+
 	  /* Only merge const step data references.  */
-	  if (TREE_CODE (DR_STEP (dr_a1->dr)) != INTEGER_CST
-	      || TREE_CODE (DR_STEP (dr_a2->dr)) != INTEGER_CST)
+	  poly_int64 step_a1, step_a2;
+	  if (!poly_int_tree_p (DR_STEP (dr_a1->dr), &step_a1)
+	      || !poly_int_tree_p (DR_STEP (dr_a2->dr), &step_a2))
 	    continue;
 
-	  /* DR_A1 and DR_A2 must goes in the same direction.  */
-	  if (tree_int_cst_compare (DR_STEP (dr_a1->dr), size_zero_node)
-	      != tree_int_cst_compare (DR_STEP (dr_a2->dr), size_zero_node))
+	  bool neg_step = may_lt (step_a1, 0) || may_lt (step_a2, 0);
+
+	  /* DR_A1 and DR_A2 must go in the same direction.  */
+	  if (neg_step && (may_gt (step_a1, 0) || may_gt (step_a2, 0)))
 	    continue;
 
-	  bool neg_step
-	    = (tree_int_cst_compare (DR_STEP (dr_a1->dr), size_zero_node) < 0);
+	  poly_uint64 seg_len_a1 = 0, seg_len_a2 = 0;
+	  bool const_seg_len_a1 = poly_int_tree_p (dr_a1->seg_len,
+						   &seg_len_a1);
+	  bool const_seg_len_a2 = poly_int_tree_p (dr_a2->seg_len,
+						   &seg_len_a2);
 
 	  /* We need to compute merged segment length at compilation time for
 	     dr_a1 and dr_a2, which is impossible if either one has non-const
 	     segment length.  */
-	  if ((!tree_fits_uhwi_p (dr_a1->seg_len)
-	       || !tree_fits_uhwi_p (dr_a2->seg_len))
-	      && tree_int_cst_compare (DR_STEP (dr_a1->dr),
-				       DR_STEP (dr_a2->dr)) != 0)
+	  if ((!const_seg_len_a1 || !const_seg_len_a2)
+	      && may_ne (step_a1, step_a2))
 	    continue;
 
-	  /* Make sure dr_a1 starts left of dr_a2.  */
-	  if (tree_int_cst_lt (DR_INIT (dr_a2->dr), DR_INIT (dr_a1->dr)))
-	    std::swap (*dr_a1, *dr_a2);
-
 	  bool do_remove = false;
-	  wide_int diff = (wi::to_wide (DR_INIT (dr_a2->dr))
-			   - wi::to_wide (DR_INIT (dr_a1->dr)));
-	  wide_int min_seg_len_b;
+	  poly_uint64 diff = init_a2 - init_a1;
+	  poly_uint64 min_seg_len_b;
 	  tree new_seg_len;
 
-	  if (TREE_CODE (dr_b1->seg_len) == INTEGER_CST)
-	    min_seg_len_b = wi::abs (wi::to_wide (dr_b1->seg_len));
-	  else
-	    min_seg_len_b
-	      = factor * wi::abs (wi::to_wide (DR_STEP (dr_b1->dr)));
+	  if (!poly_int_tree_p (dr_b1->seg_len, &min_seg_len_b))
+	    {
+	      tree step_b = DR_STEP (dr_b1->dr);
+	      if (!tree_fits_shwi_p (step_b))
+		continue;
+	      min_seg_len_b = factor * abs_hwi (tree_to_shwi (step_b));
+	    }
 
 	  /* Now we try to merge alias check dr_a1 & dr_b and dr_a2 & dr_b.
 
@@ -1543,26 +1555,24 @@ prune_runtime_alias_test_list (vec<dr_wi
 	  if (neg_step)
 	    {
 	      /* Adjust diff according to access size of both references.  */
-	      tree size_a1 = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a1->dr)));
-	      tree size_a2 = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a2->dr)));
-	      diff += wi::to_wide (size_a2) - wi::to_wide (size_a1);
+	      diff += tree_to_poly_uint64
+		(TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a2->dr))));
+	      diff -= tree_to_poly_uint64
+		(TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr_a1->dr))));
 	      /* Case A.1.  */
-	      if (wi::leu_p (diff, min_seg_len_b)
+	      if (must_le (diff, min_seg_len_b)
 		  /* Case A.2 and B combined.  */
-		  || (tree_fits_uhwi_p (dr_a2->seg_len)))
+		  || const_seg_len_a2)
 		{
-		  if (tree_fits_uhwi_p (dr_a1->seg_len)
-		      && tree_fits_uhwi_p (dr_a2->seg_len))
-		    {
-		      wide_int min_len
-			= wi::umin (wi::to_wide (dr_a1->seg_len) - diff,
-				    wi::to_wide (dr_a2->seg_len));
-		      new_seg_len = wide_int_to_tree (sizetype, min_len);
-		    }
+		  if (const_seg_len_a1 || const_seg_len_a2)
+		    new_seg_len
+		      = build_int_cstu (sizetype,
+					lower_bound (seg_len_a1 - diff,
+						     seg_len_a2));
 		  else
 		    new_seg_len
 		      = size_binop (MINUS_EXPR, dr_a2->seg_len,
-				    wide_int_to_tree (sizetype, diff));
+				    build_int_cstu (sizetype, diff));
 
 		  dr_a2->seg_len = new_seg_len;
 		  do_remove = true;
@@ -1571,22 +1581,19 @@ prune_runtime_alias_test_list (vec<dr_wi
 	  else
 	    {
 	      /* Case A.1.  */
-	      if (wi::leu_p (diff, min_seg_len_b)
+	      if (must_le (diff, min_seg_len_b)
 		  /* Case A.2 and B combined.  */
-		  || (tree_fits_uhwi_p (dr_a1->seg_len)))
+		  || const_seg_len_a1)
 		{
-		  if (tree_fits_uhwi_p (dr_a1->seg_len)
-		      && tree_fits_uhwi_p (dr_a2->seg_len))
-		    {
-		      wide_int max_len
-			= wi::umax (wi::to_wide (dr_a2->seg_len) + diff,
-				    wi::to_wide (dr_a1->seg_len));
-		      new_seg_len = wide_int_to_tree (sizetype, max_len);
-		    }
+		  if (const_seg_len_a1 && const_seg_len_a2)
+		    new_seg_len
+		      = build_int_cstu (sizetype,
+					upper_bound (seg_len_a2 + diff,
+						     seg_len_a1));
 		  else
 		    new_seg_len
 		      = size_binop (PLUS_EXPR, dr_a2->seg_len,
-				    wide_int_to_tree (sizetype, diff));
+				    build_int_cstu (sizetype, diff));
 
 		  dr_a1->seg_len = new_seg_len;
 		  do_remove = true;

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [065/nnn] poly_int: vect_nunits_for_cost
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (63 preceding siblings ...)
  2017-10-23 17:27 ` [066/nnn] poly_int: omp_max_vf Richard Sandiford
@ 2017-10-23 17:27 ` Richard Sandiford
  2017-12-05 17:35   ` Jeff Law
  2017-10-23 17:27 ` [064/nnn] poly_int: SLP max_units Richard Sandiford
                   ` (42 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:27 UTC (permalink / raw)
  To: gcc-patches

This patch adds a function for getting the number of elements in
a vector for cost purposes; the result is always a compile-time
constant, falling back to a reasonable estimate when the exact
number isn't known.  It makes it possible for a later patch to
change GET_MODE_NUNITS and TYPE_VECTOR_SUBPARTS to a poly_int.
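
For example (a sketch of the vect_model_store_cost change below,
using the names from the patch), cost code that used to multiply
by TYPE_VECTOR_SUBPARTS directly now multiplies by the estimate:

  /* N scalar stores plus extracting the elements.  */
  unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
  inside_cost += record_stmt_cost (body_cost_vec,
                                   ncopies * assumed_nunits,
                                   scalar_store, stmt_info, 0, vect_body);

For fixed-length vectors the estimate is just TYPE_VECTOR_SUBPARTS,
so existing targets should see no change in the costs.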


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vectorizer.h (vect_nunits_for_cost): New function.
	* tree-vect-loop.c (vect_model_reduction_cost): Use it.
	* tree-vect-slp.c (vect_analyze_slp_cost_1): Likewise.
	(vect_analyze_slp_cost): Likewise.
	* tree-vect-stmts.c (vect_model_store_cost): Likewise.
	(vect_model_load_cost): Likewise.

Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-10-23 17:22:26.575499779 +0100
+++ gcc/tree-vectorizer.h	2017-10-23 17:22:28.837953732 +0100
@@ -1154,6 +1154,16 @@ vect_vf_for_cost (loop_vec_info loop_vin
   return estimated_poly_value (LOOP_VINFO_VECT_FACTOR (loop_vinfo));
 }
 
+/* Estimate the number of elements in VEC_TYPE for costing purposes.
+   Pick a reasonable estimate if the exact number isn't known at
+   compile time.  */
+
+static inline unsigned int
+vect_nunits_for_cost (tree vec_type)
+{
+  return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vec_type));
+}
+
 /* Return the size of the value accessed by unvectorized data reference DR.
    This is only valid once STMT_VINFO_VECTYPE has been calculated for the
    associated gimple statement, since that guarantees that DR accesses
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:22:26.573499378 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:22:28.835953330 +0100
@@ -3844,13 +3844,15 @@ vect_model_reduction_cost (stmt_vec_info
 	}
       else if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
 	{
-	  unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+	  unsigned estimated_nunits = vect_nunits_for_cost (vectype);
 	  /* Extraction of scalar elements.  */
-	  epilogue_cost += add_stmt_cost (target_cost_data, 2 * nunits,
+	  epilogue_cost += add_stmt_cost (target_cost_data,
+					  2 * estimated_nunits,
 					  vec_to_scalar, stmt_info, 0,
 					  vect_epilogue);
 	  /* Scalar max reductions via COND_EXPR / MAX_EXPR.  */
-	  epilogue_cost += add_stmt_cost (target_cost_data, 2 * nunits - 3,
+	  epilogue_cost += add_stmt_cost (target_cost_data,
+					  2 * estimated_nunits - 3,
 					  scalar_stmt, stmt_info, 0,
 					  vect_epilogue);
 	}
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-10-23 17:22:27.793744215 +0100
+++ gcc/tree-vect-slp.c	2017-10-23 17:22:28.836953531 +0100
@@ -1718,8 +1718,8 @@ vect_analyze_slp_cost_1 (slp_instance in
 					    &n_perms);
 	      record_stmt_cost (body_cost_vec, n_perms, vec_perm,
 				stmt_info, 0, vect_body);
-	      unsigned nunits
-		= TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+	      unsigned assumed_nunits
+		= vect_nunits_for_cost (STMT_VINFO_VECTYPE (stmt_info));
 	      /* And adjust the number of loads performed.  This handles
 	         redundancies as well as loads that are later dead.  */
 	      auto_sbitmap perm (GROUP_SIZE (stmt_info));
@@ -1730,7 +1730,7 @@ vect_analyze_slp_cost_1 (slp_instance in
 	      bool load_seen = false;
 	      for (i = 0; i < GROUP_SIZE (stmt_info); ++i)
 		{
-		  if (i % nunits == 0)
+		  if (i % assumed_nunits == 0)
 		    {
 		      if (load_seen)
 			ncopies_for_cost++;
@@ -1743,7 +1743,7 @@ vect_analyze_slp_cost_1 (slp_instance in
 		ncopies_for_cost++;
 	      gcc_assert (ncopies_for_cost
 			  <= (GROUP_SIZE (stmt_info) - GROUP_GAP (stmt_info)
-			      + nunits - 1) / nunits);
+			      + assumed_nunits - 1) / assumed_nunits);
 	      poly_uint64 uf = SLP_INSTANCE_UNROLLING_FACTOR (instance);
 	      ncopies_for_cost *= estimated_poly_value (uf);
 	    }
@@ -1856,9 +1856,9 @@ vect_analyze_slp_cost (slp_instance inst
     assumed_vf = vect_vf_for_cost (STMT_VINFO_LOOP_VINFO (stmt_info));
   else
     assumed_vf = 1;
-  unsigned nunits = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
   /* For reductions look at a reduction operand in case the reduction
      operation is widening like DOT_PROD or SAD.  */
+  tree vectype_for_cost = STMT_VINFO_VECTYPE (stmt_info);
   if (!STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
       gimple *stmt = SLP_TREE_SCALAR_STMTS (node)[0];
@@ -1866,14 +1866,16 @@ vect_analyze_slp_cost (slp_instance inst
 	{
 	case DOT_PROD_EXPR:
 	case SAD_EXPR:
-	  nunits = TYPE_VECTOR_SUBPARTS (get_vectype_for_scalar_type
-				(TREE_TYPE (gimple_assign_rhs1 (stmt))));
+	  vectype_for_cost = get_vectype_for_scalar_type
+	    (TREE_TYPE (gimple_assign_rhs1 (stmt)));
 	  break;
 	default:;
 	}
     }
-  ncopies_for_cost = least_common_multiple (nunits,
-					    group_size * assumed_vf) / nunits;
+  unsigned int assumed_nunits = vect_nunits_for_cost (vectype_for_cost);
+  ncopies_for_cost = (least_common_multiple (assumed_nunits,
+					     group_size * assumed_vf)
+		      / assumed_nunits);
 
   prologue_cost_vec.create (10);
   body_cost_vec.create (10);
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:22:26.574499579 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:22:28.837953732 +0100
@@ -950,18 +950,25 @@ vect_model_store_cost (stmt_vec_info stm
   /* Costs of the stores.  */
   if (memory_access_type == VMAT_ELEMENTWISE
       || memory_access_type == VMAT_GATHER_SCATTER)
-    /* N scalar stores plus extracting the elements.  */
-    inside_cost += record_stmt_cost (body_cost_vec,
-				     ncopies * TYPE_VECTOR_SUBPARTS (vectype),
-				     scalar_store, stmt_info, 0, vect_body);
+    {
+      /* N scalar stores plus extracting the elements.  */
+      unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+      inside_cost += record_stmt_cost (body_cost_vec,
+				       ncopies * assumed_nunits,
+				       scalar_store, stmt_info, 0, vect_body);
+    }
   else
     vect_get_store_cost (dr, ncopies, &inside_cost, body_cost_vec);
 
   if (memory_access_type == VMAT_ELEMENTWISE
       || memory_access_type == VMAT_STRIDED_SLP)
-    inside_cost += record_stmt_cost (body_cost_vec,
-				     ncopies * TYPE_VECTOR_SUBPARTS (vectype),
-				     vec_to_scalar, stmt_info, 0, vect_body);
+    {
+      /* N scalar stores plus extracting the elements.  */
+      unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
+      inside_cost += record_stmt_cost (body_cost_vec,
+				       ncopies * assumed_nunits,
+				       vec_to_scalar, stmt_info, 0, vect_body);
+    }
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -1081,8 +1088,9 @@ vect_model_load_cost (stmt_vec_info stmt
     {
       /* N scalar loads plus gathering them into a vector.  */
       tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+      unsigned int assumed_nunits = vect_nunits_for_cost (vectype);
       inside_cost += record_stmt_cost (body_cost_vec,
-				       ncopies * TYPE_VECTOR_SUBPARTS (vectype),
+				       ncopies * assumed_nunits,
 				       scalar_load, stmt_info, 0, vect_body);
     }
   else

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [066/nnn] poly_int: omp_max_vf
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (62 preceding siblings ...)
  2017-10-23 17:26 ` [062/nnn] poly_int: prune_runtime_alias_test_list Richard Sandiford
@ 2017-10-23 17:27 ` Richard Sandiford
  2017-12-05 17:40   ` Jeff Law
  2017-10-23 17:27 ` [065/nnn] poly_int: vect_nunits_for_cost Richard Sandiford
                   ` (43 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:27 UTC (permalink / raw)
  To: gcc-patches

This patch makes omp_max_vf return a polynomial vectorization factor.
We then need to be able to stash a polynomial value in
OMP_CLAUSE_SAFELEN_EXPR too:

   /* If max_vf is non-zero, then we can use only a vectorization factor
      up to the max_vf we chose.  So stick it into the safelen clause.  */

For now the cfgloop safelen is still constant though.
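
As a sketch of what this means for expand_omp_simd (using the names
from the patch below), the clause value is now read as a poly_int
and the constant cfgloop field is taken from its lower bound:

  poly_uint64 val;
  safelen = OMP_CLAUSE_SAFELEN_EXPR (safelen);
  if (!poly_int_tree_p (safelen, &val))
    safelen_int = 0;
  else
    safelen_int = MIN (constant_lower_bound (val), INT_MAX);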


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* omp-general.h (omp_max_vf): Return a poly_uint64 instead of an int.
	* omp-general.c (omp_max_vf): Likewise.
	* omp-expand.c (omp_adjust_chunk_size): Update call to omp_max_vf.
	(expand_omp_simd): Handle polynomial safelen.
	* omp-low.c (omplow_simd_context): Add a default constructor.
	(omplow_simd_context::max_vf): Change from int to poly_uint64.
	(lower_rec_simd_input_clauses): Update accordingly.
	(lower_rec_input_clauses): Likewise.

Index: gcc/omp-general.h
===================================================================
--- gcc/omp-general.h	2017-05-18 07:51:12.357753671 +0100
+++ gcc/omp-general.h	2017-10-23 17:22:29.881163047 +0100
@@ -78,7 +78,7 @@ extern tree omp_get_for_step_from_incr (
 extern void omp_extract_for_data (gomp_for *for_stmt, struct omp_for_data *fd,
 				  struct omp_for_data_loop *loops);
 extern gimple *omp_build_barrier (tree lhs);
-extern int omp_max_vf (void);
+extern poly_uint64 omp_max_vf (void);
 extern int omp_max_simt_vf (void);
 extern tree oacc_launch_pack (unsigned code, tree device, unsigned op);
 extern void oacc_replace_fn_attrib (tree fn, tree dims);
Index: gcc/omp-general.c
===================================================================
--- gcc/omp-general.c	2017-08-10 14:36:08.449457108 +0100
+++ gcc/omp-general.c	2017-10-23 17:22:29.881163047 +0100
@@ -423,7 +423,7 @@ omp_build_barrier (tree lhs)
 
 /* Return maximum possible vectorization factor for the target.  */
 
-int
+poly_uint64
 omp_max_vf (void)
 {
   if (!optimize
Index: gcc/omp-expand.c
===================================================================
--- gcc/omp-expand.c	2017-10-02 09:10:57.525659817 +0100
+++ gcc/omp-expand.c	2017-10-23 17:22:29.881163047 +0100
@@ -206,8 +206,8 @@ omp_adjust_chunk_size (tree chunk_size,
   if (!simd_schedule)
     return chunk_size;
 
-  int vf = omp_max_vf ();
-  if (vf == 1)
+  poly_uint64 vf = omp_max_vf ();
+  if (must_eq (vf, 1U))
     return chunk_size;
 
   tree type = TREE_TYPE (chunk_size);
@@ -4609,11 +4609,12 @@ expand_omp_simd (struct omp_region *regi
 
   if (safelen)
     {
+      poly_uint64 val;
       safelen = OMP_CLAUSE_SAFELEN_EXPR (safelen);
-      if (TREE_CODE (safelen) != INTEGER_CST)
+      if (!poly_int_tree_p (safelen, &val))
 	safelen_int = 0;
-      else if (tree_fits_uhwi_p (safelen) && tree_to_uhwi (safelen) < INT_MAX)
-	safelen_int = tree_to_uhwi (safelen);
+      else
+	safelen_int = MIN (constant_lower_bound (val), INT_MAX);
       if (safelen_int == 1)
 	safelen_int = 0;
     }
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	2017-10-23 17:17:01.432034493 +0100
+++ gcc/omp-low.c	2017-10-23 17:22:29.882163248 +0100
@@ -3487,11 +3487,12 @@ omp_clause_aligned_alignment (tree claus
    and lower_rec_input_clauses.  */
 
 struct omplow_simd_context {
+  omplow_simd_context () { memset (this, 0, sizeof (*this)); }
   tree idx;
   tree lane;
   vec<tree, va_heap> simt_eargs;
   gimple_seq simt_dlist;
-  int max_vf;
+  poly_uint64_pod max_vf;
   bool is_simt;
 };
 
@@ -3502,28 +3503,30 @@ struct omplow_simd_context {
 lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 			      omplow_simd_context *sctx, tree &ivar, tree &lvar)
 {
-  if (sctx->max_vf == 0)
+  if (known_zero (sctx->max_vf))
     {
       sctx->max_vf = sctx->is_simt ? omp_max_simt_vf () : omp_max_vf ();
-      if (sctx->max_vf > 1)
+      if (may_gt (sctx->max_vf, 1U))
 	{
 	  tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
 				    OMP_CLAUSE_SAFELEN);
-	  if (c
-	      && (TREE_CODE (OMP_CLAUSE_SAFELEN_EXPR (c)) != INTEGER_CST
-		  || tree_int_cst_sgn (OMP_CLAUSE_SAFELEN_EXPR (c)) != 1))
-	    sctx->max_vf = 1;
-	  else if (c && compare_tree_int (OMP_CLAUSE_SAFELEN_EXPR (c),
-					  sctx->max_vf) == -1)
-	    sctx->max_vf = tree_to_shwi (OMP_CLAUSE_SAFELEN_EXPR (c));
+	  if (c)
+	    {
+	      poly_uint64 safe_len;
+	      if (!poly_int_tree_p (OMP_CLAUSE_SAFELEN_EXPR (c), &safe_len)
+		  || may_lt (safe_len, 1U))
+		sctx->max_vf = 1;
+	      else
+		sctx->max_vf = lower_bound (sctx->max_vf, safe_len);
+	    }
 	}
-      if (sctx->max_vf > 1)
+      if (may_gt (sctx->max_vf, 1U))
 	{
 	  sctx->idx = create_tmp_var (unsigned_type_node);
 	  sctx->lane = create_tmp_var (unsigned_type_node);
 	}
     }
-  if (sctx->max_vf == 1)
+  if (must_eq (sctx->max_vf, 1U))
     return false;
 
   if (sctx->is_simt)
@@ -3637,7 +3640,7 @@ lower_rec_input_clauses (tree clauses, g
 	}
 
   /* Add a placeholder for simduid.  */
-  if (sctx.is_simt && sctx.max_vf != 1)
+  if (sctx.is_simt && may_ne (sctx.max_vf, 1U))
     sctx.simt_eargs.safe_push (NULL_TREE);
 
   /* Do all the fixed sized types in the first pass, and the variable sized
@@ -4527,7 +4530,7 @@ lower_rec_input_clauses (tree clauses, g
 	}
     }
 
-  if (sctx.max_vf == 1)
+  if (must_eq (sctx.max_vf, 1U))
     sctx.is_simt = false;
 
   if (sctx.lane || sctx.is_simt)
@@ -4664,14 +4667,14 @@ lower_rec_input_clauses (tree clauses, g
 
   /* If max_vf is non-zero, then we can use only a vectorization factor
      up to the max_vf we chose.  So stick it into the safelen clause.  */
-  if (sctx.max_vf)
+  if (maybe_nonzero (sctx.max_vf))
     {
       tree c = omp_find_clause (gimple_omp_for_clauses (ctx->stmt),
 				OMP_CLAUSE_SAFELEN);
+      poly_uint64 safe_len;
       if (c == NULL_TREE
-	  || (TREE_CODE (OMP_CLAUSE_SAFELEN_EXPR (c)) == INTEGER_CST
-	      && compare_tree_int (OMP_CLAUSE_SAFELEN_EXPR (c),
-				   sctx.max_vf) == 1))
+	  || (poly_int_tree_p (OMP_CLAUSE_SAFELEN_EXPR (c), &safe_len)
+	      && may_gt (safe_len, sctx.max_vf)))
 	{
 	  c = build_omp_clause (UNKNOWN_LOCATION, OMP_CLAUSE_SAFELEN);
 	  OMP_CLAUSE_SAFELEN_EXPR (c) = build_int_cst (integer_type_node,

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [064/nnn] poly_int: SLP max_units
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (64 preceding siblings ...)
  2017-10-23 17:27 ` [065/nnn] poly_int: vect_nunits_for_cost Richard Sandiford
@ 2017-10-23 17:27 ` Richard Sandiford
  2017-12-05 17:41   ` Jeff Law
  2017-10-23 17:28 ` [067/nnn] poly_int: get_mask_mode Richard Sandiford
                   ` (41 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:27 UTC (permalink / raw)
  To: gcc-patches

This patch makes tree-vect-slp.c track the maximum number of vector
units as a poly_uint64 rather than an unsigned int.
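
Where a genuinely constant value is still required (for example in
basic-block SLP, which cannot unroll), the patch uses the
is_constant pattern, roughly:

  unsigned HOST_WIDE_INT const_max_nunits;
  if (!max_nunits.is_constant (&const_max_nunits)
      || const_max_nunits > group_size)
    {
      /* Variable-length vectors (or vectors wider than the group)
         would need unrolling, which BB SLP doesn't support.  */
      vect_free_slp_tree (node);
      loads.release ();
      return false;
    }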


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-slp.c (vect_record_max_nunits, vect_build_slp_tree_1)
	(vect_build_slp_tree_2, vect_build_slp_tree): Change max_nunits
	from an unsigned int * to a poly_uint64_pod *.
	(calculate_unrolling_factor): New function.
	(vect_analyze_slp_instance): Use it.  Track polynomial max_nunits.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-10-23 17:22:26.573499378 +0100
+++ gcc/tree-vect-slp.c	2017-10-23 17:22:27.793744215 +0100
@@ -489,7 +489,7 @@ vect_get_and_check_slp_defs (vec_info *v
 
 static bool
 vect_record_max_nunits (vec_info *vinfo, gimple *stmt, unsigned int group_size,
-			tree vectype, unsigned int *max_nunits)
+			tree vectype, poly_uint64 *max_nunits)
 {
   if (!vectype)
     {
@@ -506,8 +506,11 @@ vect_record_max_nunits (vec_info *vinfo,
 
   /* If populating the vector type requires unrolling then fail
      before adjusting *max_nunits for basic-block vectorization.  */
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned HOST_WIDE_INT const_nunits;
   if (is_a <bb_vec_info> (vinfo)
-      && TYPE_VECTOR_SUBPARTS (vectype) > group_size)
+      && (!nunits.is_constant (&const_nunits)
+	  || const_nunits > group_size))
     {
       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 		       "Build SLP failed: unrolling required "
@@ -517,9 +520,7 @@ vect_record_max_nunits (vec_info *vinfo,
     }
 
   /* In case of multiple types we need to detect the smallest type.  */
-  if (*max_nunits < TYPE_VECTOR_SUBPARTS (vectype))
-    *max_nunits = TYPE_VECTOR_SUBPARTS (vectype);
-
+  vect_update_max_nunits (max_nunits, vectype);
   return true;
 }
 
@@ -540,7 +541,7 @@ vect_record_max_nunits (vec_info *vinfo,
 static bool
 vect_build_slp_tree_1 (vec_info *vinfo, unsigned char *swap,
 		       vec<gimple *> stmts, unsigned int group_size,
-		       unsigned nops, unsigned int *max_nunits,
+		       unsigned nops, poly_uint64 *max_nunits,
 		       bool *matches, bool *two_operators)
 {
   unsigned int i;
@@ -966,16 +967,15 @@ bst_traits::equal (value_type existing,
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
 		       vec<gimple *> stmts, unsigned int group_size,
-		       unsigned int *max_nunits,
+		       poly_uint64 *max_nunits,
 		       vec<slp_tree> *loads,
 		       bool *matches, unsigned *npermutes, unsigned *tree_size,
 		       unsigned max_tree_size);
 
 static slp_tree
 vect_build_slp_tree (vec_info *vinfo,
-                     vec<gimple *> stmts, unsigned int group_size,
-                     unsigned int *max_nunits,
-                     vec<slp_tree> *loads,
+		     vec<gimple *> stmts, unsigned int group_size,
+		     poly_uint64 *max_nunits, vec<slp_tree> *loads,
 		     bool *matches, unsigned *npermutes, unsigned *tree_size,
 		     unsigned max_tree_size)
 {
@@ -1007,12 +1007,13 @@ vect_build_slp_tree (vec_info *vinfo,
 static slp_tree
 vect_build_slp_tree_2 (vec_info *vinfo,
 		       vec<gimple *> stmts, unsigned int group_size,
-		       unsigned int *max_nunits,
+		       poly_uint64 *max_nunits,
 		       vec<slp_tree> *loads,
 		       bool *matches, unsigned *npermutes, unsigned *tree_size,
 		       unsigned max_tree_size)
 {
-  unsigned nops, i, this_tree_size = 0, this_max_nunits = *max_nunits;
+  unsigned nops, i, this_tree_size = 0;
+  poly_uint64 this_max_nunits = *max_nunits;
   gimple *stmt;
   slp_tree node;
 
@@ -1951,6 +1952,15 @@ vect_split_slp_store_group (gimple *firs
   return group2;
 }
 
+/* Calculate the unrolling factor for an SLP instance with GROUP_SIZE
+   statements and a vector of NUNITS elements.  */
+
+static poly_uint64
+calculate_unrolling_factor (poly_uint64 nunits, unsigned int group_size)
+{
+  return exact_div (common_multiple (nunits, group_size), group_size);
+}
+
 /* Analyze an SLP instance starting from a group of grouped stores.  Call
    vect_build_slp_tree to build a tree of packed stmts if possible.
    Return FALSE if it's impossible to SLP any stmt in the loop.  */
@@ -1962,11 +1972,9 @@ vect_analyze_slp_instance (vec_info *vin
   slp_instance new_instance;
   slp_tree node;
   unsigned int group_size = GROUP_SIZE (vinfo_for_stmt (stmt));
-  unsigned int nunits;
   tree vectype, scalar_type = NULL_TREE;
   gimple *next;
   unsigned int i;
-  unsigned int max_nunits = 0;
   vec<slp_tree> loads;
   struct data_reference *dr = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
   vec<gimple *> scalar_stmts;
@@ -2005,7 +2013,7 @@ vect_analyze_slp_instance (vec_info *vin
 
       return false;
     }
-  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
   /* Create a node (a root of the SLP tree) for the packed grouped stores.  */
   scalar_stmts.create (group_size);
@@ -2043,32 +2051,35 @@ vect_analyze_slp_instance (vec_info *vin
   bool *matches = XALLOCAVEC (bool, group_size);
   unsigned npermutes = 0;
   bst_fail = new hash_set <vec <gimple *>, bst_traits> ();
+  poly_uint64 max_nunits = nunits;
   node = vect_build_slp_tree (vinfo, scalar_stmts, group_size,
-				   &max_nunits, &loads, matches, &npermutes,
+			      &max_nunits, &loads, matches, &npermutes,
 			      NULL, max_tree_size);
   delete bst_fail;
   if (node != NULL)
     {
       /* Calculate the unrolling factor based on the smallest type.  */
       poly_uint64 unrolling_factor
-	= least_common_multiple (max_nunits, group_size) / group_size;
+	= calculate_unrolling_factor (max_nunits, group_size);
 
       if (may_ne (unrolling_factor, 1U)
 	  && is_a <bb_vec_info> (vinfo))
 	{
-
-	  if (max_nunits > group_size)
-        {
-            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-			       "Build SLP failed: store group "
-			       "size not a multiple of the vector size "
-			       "in basic block SLP\n");
-	  vect_free_slp_tree (node);
-	  loads.release ();
-          return false;
-        }
+	  unsigned HOST_WIDE_INT const_max_nunits;
+	  if (!max_nunits.is_constant (&const_max_nunits)
+	      || const_max_nunits > group_size)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				 "Build SLP failed: store group "
+				 "size not a multiple of the vector size "
+				 "in basic block SLP\n");
+	      vect_free_slp_tree (node);
+	      loads.release ();
+	      return false;
+	    }
 	  /* Fatal mismatch.  */
-	  matches[group_size/max_nunits * max_nunits] = false;
+	  matches[group_size / const_max_nunits * const_max_nunits] = false;
 	  vect_free_slp_tree (node);
 	  loads.release ();
 	}
@@ -2187,20 +2198,22 @@ vect_analyze_slp_instance (vec_info *vin
 
   /* For basic block SLP, try to break the group up into multiples of the
      vector size.  */
+  unsigned HOST_WIDE_INT const_nunits;
   if (is_a <bb_vec_info> (vinfo)
       && GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))
-      && STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (stmt)))
+      && STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (stmt))
+      && nunits.is_constant (&const_nunits))
     {
       /* We consider breaking the group only on VF boundaries from the existing
 	 start.  */
       for (i = 0; i < group_size; i++)
 	if (!matches[i]) break;
 
-      if (i >= nunits && i < group_size)
+      if (i >= const_nunits && i < group_size)
 	{
 	  /* Split into two groups at the first vector boundary before i.  */
-	  gcc_assert ((nunits & (nunits - 1)) == 0);
-	  unsigned group1_size = i & ~(nunits - 1);
+	  gcc_assert ((const_nunits & (const_nunits - 1)) == 0);
+	  unsigned group1_size = i & ~(const_nunits - 1);
 
 	  gimple *rest = vect_split_slp_store_group (stmt, group1_size);
 	  bool res = vect_analyze_slp_instance (vinfo, stmt, max_tree_size);
@@ -2208,9 +2221,9 @@ vect_analyze_slp_instance (vec_info *vin
 	     skip the rest of that vector.  */
 	  if (group1_size < i)
 	    {
-	      i = group1_size + nunits;
+	      i = group1_size + const_nunits;
 	      if (i < group_size)
-		rest = vect_split_slp_store_group (rest, nunits);
+		rest = vect_split_slp_store_group (rest, const_nunits);
 	    }
 	  if (i < group_size)
 	    res |= vect_analyze_slp_instance (vinfo, rest, max_tree_size);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [068/nnn] poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZES
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (66 preceding siblings ...)
  2017-10-23 17:28 ` [067/nnn] poly_int: get_mask_mode Richard Sandiford
@ 2017-10-23 17:28 ` Richard Sandiford
  2017-12-06  1:52   ` Jeff Law
  2017-10-23 17:29 ` [070/nnn] poly_int: vectorizable_reduction Richard Sandiford
                   ` (39 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:28 UTC (permalink / raw)
  To: gcc-patches

This patch changes the type of current_vector_size to poly_uint64.
It also changes TARGET_AUTOVECTORIZE_VECTOR_SIZES so that it fills
in a vector of possible sizes (as poly_uint64s) instead of returning
a bitmask.  The documentation claimed that the hook didn't need to
include the default vector size (returned by preferred_simd_mode),
but that wasn't consistent with the omp-low.c usage.
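
Consumers iterate over the vector instead of decomposing a bitmask;
for example (a sketch of the new omp_max_vf code below), the maximum
vectorization factor becomes the ordered maximum of the entries:

  auto_vector_sizes sizes;
  targetm.vectorize.autovectorize_vector_sizes (&sizes);
  poly_uint64 vf = 0;
  for (unsigned int i = 0; i < sizes.length (); ++i)
    vf = ordered_max (vf, sizes[i]);

Ports are expected to push their supported sizes in order of
decreasing preference, as in the aarch64 and arm changes below.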


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.h (vector_sizes, auto_vector_sizes): New typedefs.
	* target.def (autovectorize_vector_sizes): Return the vector sizes
	by pointer, using vector_sizes rather than a bitmask.
	* targhooks.h (default_autovectorize_vector_sizes): Update accordingly.
	* targhooks.c (default_autovectorize_vector_sizes): Likewise.
	* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
	Likewise.
	* config/arc/arc.c (arc_autovectorize_vector_sizes): Likewise.
	* config/arm/arm.c (arm_autovectorize_vector_sizes): Likewise.
	* config/i386/i386.c (ix86_autovectorize_vector_sizes): Likewise.
	* config/mips/mips.c (mips_autovectorize_vector_sizes): Likewise.
	* omp-general.c (omp_max_vf): Likewise.
	* omp-low.c (omp_clause_aligned_alignment): Likewise.
	* optabs-query.c (can_vec_mask_load_store_p): Likewise.
	* tree-vect-loop.c (vect_analyze_loop): Likewise.
	* tree-vect-slp.c (vect_slp_bb): Likewise.
	* doc/tm.texi: Regenerate.
	* tree-vectorizer.h (current_vector_size): Change from an unsigned int
	to a poly_uint64.
	* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Take
	the vector size as a poly_uint64 rather than an unsigned int.
	(current_vector_size): Change from an unsigned int to a poly_uint64.
	(get_vectype_for_scalar_type): Update accordingly.
	* tree.h (build_truth_vector_type): Take the size and number of
	units as a poly_uint64 rather than an unsigned int.
	(build_vector_type): Add a temporary overload that takes
	the number of units as a poly_uint64 rather than an unsigned int.
	* tree.c (make_vector_type): Likewise.
	(build_truth_vector_type): Take the number of units as a poly_uint64
	rather than an unsigned int.

Index: gcc/target.h
===================================================================
--- gcc/target.h	2017-10-23 17:11:40.126719272 +0100
+++ gcc/target.h	2017-10-23 17:22:32.724227435 +0100
@@ -199,6 +199,13 @@ typedef vec<unsigned short> vec_perm_ind
    automatically freed.  */
 typedef auto_vec<unsigned short, 32> auto_vec_perm_indices;
 
+/* The type to use for lists of vector sizes.  */
+typedef vec<poly_uint64> vector_sizes;
+
+/* Same, but can be used to construct local lists that are
+   automatically freed.  */
+typedef auto_vec<poly_uint64, 8> auto_vector_sizes;
+
 /* The target structure.  This holds all the backend hooks.  */
 #define DEFHOOKPOD(NAME, DOC, TYPE, INIT) TYPE NAME;
 #define DEFHOOK(NAME, DOC, TYPE, PARAMS, INIT) TYPE (* NAME) PARAMS;
Index: gcc/target.def
===================================================================
--- gcc/target.def	2017-10-23 17:22:30.980383601 +0100
+++ gcc/target.def	2017-10-23 17:22:32.724227435 +0100
@@ -1880,12 +1880,16 @@ transformations even in absence of speci
    after processing the preferred one derived from preferred_simd_mode.  */
 DEFHOOK
 (autovectorize_vector_sizes,
- "This hook should return a mask of sizes that should be iterated over\n\
-after trying to autovectorize using the vector size derived from the\n\
-mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
-The default is zero which means to not iterate over other vector sizes.",
- unsigned int,
- (void),
+ "If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not\n\
+the only one that is worth considering, this hook should add all suitable\n\
+vector sizes to @var{sizes}, in order of decreasing preference.  The first\n\
+one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.\n\
+\n\
+The hook does not need to do anything if the vector returned by\n\
+@code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is the only one relevant\n\
+for autovectorization.  The default implementation does nothing.",
+ void,
+ (vector_sizes *sizes),
  default_autovectorize_vector_sizes)
 
 /* Function to get a target mode for a vector mask.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	2017-10-23 17:22:30.980383601 +0100
+++ gcc/targhooks.h	2017-10-23 17:22:32.725227332 +0100
@@ -106,7 +106,7 @@ default_builtin_support_vector_misalignm
 					     const_tree,
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
-extern unsigned int default_autovectorize_vector_sizes (void);
+extern void default_autovectorize_vector_sizes (vector_sizes *);
 extern opt_machine_mode default_get_mask_mode (poly_uint64, poly_uint64);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 17:22:30.980383601 +0100
+++ gcc/targhooks.c	2017-10-23 17:22:32.725227332 +0100
@@ -1248,10 +1248,9 @@ default_preferred_simd_mode (scalar_mode
 /* By default only the size derived from the preferred vector mode
    is tried.  */
 
-unsigned int
-default_autovectorize_vector_sizes (void)
+void
+default_autovectorize_vector_sizes (vector_sizes *)
 {
-  return 0;
 }
 
 /* By default a vector of integers is used as a mask.  */
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2017-10-23 17:11:40.139744163 +0100
+++ gcc/config/aarch64/aarch64.c	2017-10-23 17:22:32.709228991 +0100
@@ -11310,12 +11310,13 @@ aarch64_preferred_simd_mode (scalar_mode
   return aarch64_simd_container_mode (mode, 128);
 }
 
-/* Return the bitmask of possible vector sizes for the vectorizer
+/* Return a list of possible vector sizes for the vectorizer
    to iterate over.  */
-static unsigned int
-aarch64_autovectorize_vector_sizes (void)
+static void
+aarch64_autovectorize_vector_sizes (vector_sizes *sizes)
 {
-  return (16 | 8);
+  sizes->safe_push (16);
+  sizes->safe_push (8);
 }
 
 /* Implement TARGET_MANGLE_TYPE.  */
Index: gcc/config/arc/arc.c
===================================================================
--- gcc/config/arc/arc.c	2017-10-23 17:11:40.141747992 +0100
+++ gcc/config/arc/arc.c	2017-10-23 17:22:32.710228887 +0100
@@ -404,10 +404,14 @@ arc_preferred_simd_mode (scalar_mode mod
 /* Implements target hook
    TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
 
-static unsigned int
-arc_autovectorize_vector_sizes (void)
+static void
+arc_autovectorize_vector_sizes (vector_sizes *sizes)
 {
-  return TARGET_PLUS_QMACW ? (8 | 4) : 0;
+  if (TARGET_PLUS_QMACW)
+    {
+      sizes->quick_push (8);
+      sizes->quick_push (4);
+    }
 }
 
 /* TARGET_PRESERVE_RELOAD_P is still awaiting patch re-evaluation / review.  */
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2017-10-23 17:19:01.398170131 +0100
+++ gcc/config/arm/arm.c	2017-10-23 17:22:32.713228576 +0100
@@ -283,7 +283,7 @@ static bool arm_builtin_support_vector_m
 static void arm_conditional_register_usage (void);
 static enum flt_eval_method arm_excess_precision (enum excess_precision_type);
 static reg_class_t arm_preferred_rename_class (reg_class_t rclass);
-static unsigned int arm_autovectorize_vector_sizes (void);
+static void arm_autovectorize_vector_sizes (vector_sizes *);
 static int arm_default_branch_cost (bool, bool);
 static int arm_cortex_a5_branch_cost (bool, bool);
 static int arm_cortex_m_branch_cost (bool, bool);
@@ -27947,10 +27947,14 @@ arm_vector_alignment (const_tree type)
   return align;
 }
 
-static unsigned int
-arm_autovectorize_vector_sizes (void)
+static void
+arm_autovectorize_vector_sizes (vector_sizes *sizes)
 {
-  return TARGET_NEON_VECTORIZE_DOUBLE ? 0 : (16 | 8);
+  if (!TARGET_NEON_VECTORIZE_DOUBLE)
+    {
+      sizes->safe_push (16);
+      sizes->safe_push (8);
+    }
 }
 
 static bool
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2017-10-23 17:22:30.978383200 +0100
+++ gcc/config/i386/i386.c	2017-10-23 17:22:32.719227954 +0100
@@ -48105,17 +48105,20 @@ ix86_preferred_simd_mode (scalar_mode mo
    vectors.  If AVX512F is enabled then try vectorizing with 512bit,
    256bit and 128bit vectors.  */
 
-static unsigned int
-ix86_autovectorize_vector_sizes (void)
+static void
+ix86_autovectorize_vector_sizes (vector_sizes *sizes)
 {
-  unsigned int bytesizes = 0;
-
   if (TARGET_AVX512F && !TARGET_PREFER_AVX256)
-    bytesizes |= (64 | 32 | 16);
+    {
+      sizes->safe_push (64);
+      sizes->safe_push (32);
+      sizes->safe_push (16);
+    }
   else if (TARGET_AVX && !TARGET_PREFER_AVX128)
-    bytesizes |= (32 | 16);
-
-  return bytesizes;
+    {
+      sizes->safe_push (32);
+      sizes->safe_push (16);
+    }
 }
 
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2017-10-23 17:18:47.656057887 +0100
+++ gcc/config/mips/mips.c	2017-10-23 17:22:32.721227746 +0100
@@ -13401,10 +13401,11 @@ mips_preferred_simd_mode (scalar_mode mo
 
 /* Implement TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES.  */
 
-static unsigned int
-mips_autovectorize_vector_sizes (void)
+static void
+mips_autovectorize_vector_sizes (vector_sizes *sizes)
 {
-  return ISA_HAS_MSA ? 16 : 0;
+  if (ISA_HAS_MSA)
+    sizes->safe_push (16);
 }
 
 /* Implement TARGET_INIT_LIBFUNCS.  */
Index: gcc/omp-general.c
===================================================================
--- gcc/omp-general.c	2017-10-23 17:22:29.881163047 +0100
+++ gcc/omp-general.c	2017-10-23 17:22:32.722227643 +0100
@@ -433,17 +433,21 @@ omp_max_vf (void)
 	  && global_options_set.x_flag_tree_loop_vectorize))
     return 1;
 
-  int vf = 1;
-  int vs = targetm.vectorize.autovectorize_vector_sizes ();
-  if (vs)
-    vf = 1 << floor_log2 (vs);
-  else
+  auto_vector_sizes sizes;
+  targetm.vectorize.autovectorize_vector_sizes (&sizes);
+  if (!sizes.is_empty ())
     {
-      machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
-      if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
-	vf = GET_MODE_NUNITS (vqimode);
+      poly_uint64 vf = 0;
+      for (unsigned int i = 0; i < sizes.length (); ++i)
+	vf = ordered_max (vf, sizes[i]);
+      return vf;
     }
-  return vf;
+
+  machine_mode vqimode = targetm.vectorize.preferred_simd_mode (QImode);
+  if (GET_MODE_CLASS (vqimode) == MODE_VECTOR_INT)
+    return GET_MODE_NUNITS (vqimode);
+
+  return 1;
 }
 
 /* Return maximum SIMT width if offloading may target SIMT hardware.  */
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	2017-10-23 17:22:29.882163248 +0100
+++ gcc/omp-low.c	2017-10-23 17:22:32.723227539 +0100
@@ -3451,9 +3451,11 @@ omp_clause_aligned_alignment (tree claus
   /* Otherwise return implementation defined alignment.  */
   unsigned int al = 1;
   opt_scalar_mode mode_iter;
-  int vs = targetm.vectorize.autovectorize_vector_sizes ();
-  if (vs)
-    vs = 1 << floor_log2 (vs);
+  auto_vector_sizes sizes;
+  targetm.vectorize.autovectorize_vector_sizes (&sizes);
+  poly_uint64 vs = 0;
+  for (unsigned int i = 0; i < sizes.length (); ++i)
+    vs = ordered_max (vs, sizes[i]);
   static enum mode_class classes[]
     = { MODE_INT, MODE_VECTOR_INT, MODE_FLOAT, MODE_VECTOR_FLOAT };
   for (int i = 0; i < 4; i += 2)
@@ -3464,16 +3466,16 @@ omp_clause_aligned_alignment (tree claus
 	machine_mode vmode = targetm.vectorize.preferred_simd_mode (mode);
 	if (GET_MODE_CLASS (vmode) != classes[i + 1])
 	  continue;
-	while (vs
-	       && GET_MODE_SIZE (vmode) < vs
+	while (maybe_nonzero (vs)
+	       && must_lt (GET_MODE_SIZE (vmode), vs)
 	       && GET_MODE_2XWIDER_MODE (vmode).exists ())
 	  vmode = GET_MODE_2XWIDER_MODE (vmode).require ();
 
 	tree type = lang_hooks.types.type_for_mode (mode, 1);
 	if (type == NULL_TREE || TYPE_MODE (type) != mode)
 	  continue;
-	type = build_vector_type (type, GET_MODE_SIZE (vmode)
-					/ GET_MODE_SIZE (mode));
+	unsigned int nelts = GET_MODE_SIZE (vmode) / GET_MODE_SIZE (mode);
+	type = build_vector_type (type, nelts);
 	if (TYPE_MODE (type) != vmode)
 	  continue;
 	if (TYPE_ALIGN_UNIT (type) > al)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-10-23 17:11:39.995468444 +0100
+++ gcc/optabs-query.c	2017-10-23 17:22:32.723227539 +0100
@@ -489,7 +489,6 @@ can_vec_mask_load_store_p (machine_mode
 {
   optab op = is_load ? maskload_optab : maskstore_optab;
   machine_mode vmode;
-  unsigned int vector_sizes;
 
   /* If mode is vector mode, check it directly.  */
   if (VECTOR_MODE_P (mode))
@@ -513,14 +512,14 @@ can_vec_mask_load_store_p (machine_mode
       && convert_optab_handler (op, vmode, mask_mode) != CODE_FOR_nothing)
     return true;
 
-  vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
-  while (vector_sizes != 0)
+  auto_vector_sizes vector_sizes;
+  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes);
+  for (unsigned int i = 0; i < vector_sizes.length (); ++i)
     {
-      unsigned int cur = 1 << floor_log2 (vector_sizes);
-      vector_sizes &= ~cur;
-      if (cur <= GET_MODE_SIZE (smode))
+      poly_uint64 cur = vector_sizes[i];
+      poly_uint64 nunits;
+      if (!multiple_p (cur, GET_MODE_SIZE (smode), &nunits))
 	continue;
-      unsigned int nunits = cur / GET_MODE_SIZE (smode);
       if (mode_for_vector (smode, nunits).exists (&vmode)
 	  && VECTOR_MODE_P (vmode)
 	  && targetm.vectorize.get_mask_mode (nunits, cur).exists (&mask_mode)
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:22:28.835953330 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:22:32.727227124 +0100
@@ -2327,11 +2327,12 @@ vect_analyze_loop_2 (loop_vec_info loop_
 vect_analyze_loop (struct loop *loop, loop_vec_info orig_loop_vinfo)
 {
   loop_vec_info loop_vinfo;
-  unsigned int vector_sizes;
+  auto_vector_sizes vector_sizes;
 
   /* Autodetect first vector size we try.  */
   current_vector_size = 0;
-  vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
+  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes);
+  unsigned int next_size = 0;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -2347,6 +2348,7 @@ vect_analyze_loop (struct loop *loop, lo
       return NULL;
     }
 
+  poly_uint64 autodetected_vector_size = 0;
   while (1)
     {
       /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
@@ -2373,18 +2375,28 @@ vect_analyze_loop (struct loop *loop, lo
 
       delete loop_vinfo;
 
-      vector_sizes &= ~current_vector_size;
+      if (next_size == 0)
+	autodetected_vector_size = current_vector_size;
+
+      if (next_size < vector_sizes.length ()
+	  && must_eq (vector_sizes[next_size], autodetected_vector_size))
+	next_size += 1;
+
       if (fatal
-	  || vector_sizes == 0
-	  || current_vector_size == 0)
+	  || next_size == vector_sizes.length ()
+	  || known_zero (current_vector_size))
 	return NULL;
 
       /* Try the next biggest vector size.  */
-      current_vector_size = 1 << floor_log2 (vector_sizes);
+      current_vector_size = vector_sizes[next_size++];
       if (dump_enabled_p ())
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "***** Re-trying analysis with "
-			 "vector size %d\n", current_vector_size);
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "***** Re-trying analysis with "
+			   "vector size ");
+	  dump_dec (MSG_NOTE, current_vector_size);
+	  dump_printf (MSG_NOTE, "\n");
+	}
     }
 }
 
@@ -7686,9 +7698,12 @@ vect_transform_loop (loop_vec_info loop_
 	  dump_printf (MSG_NOTE, "\n");
 	}
       else
-	dump_printf_loc (MSG_NOTE, vect_location,
-			 "LOOP EPILOGUE VECTORIZED (VS=%d)\n",
-			 current_vector_size);
+	{
+	  dump_printf_loc (MSG_NOTE, vect_location,
+			   "LOOP EPILOGUE VECTORIZED (VS=");
+	  dump_dec (MSG_NOTE, current_vector_size);
+	  dump_printf (MSG_NOTE, ")\n");
+	}
     }
 
   /* Free SLP instances here because otherwise stmt reference counting
@@ -7705,31 +7720,39 @@ vect_transform_loop (loop_vec_info loop_
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
     epilogue = NULL;
 
+  if (!PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK))
+    epilogue = NULL;
+
   if (epilogue)
     {
-	unsigned int vector_sizes
-	  = targetm.vectorize.autovectorize_vector_sizes ();
-	vector_sizes &= current_vector_size - 1;
-
-	if (!PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK))
-	  epilogue = NULL;
-	else if (!vector_sizes)
-	  epilogue = NULL;
-	else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-		 && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0
-		 && must_eq (vf, lowest_vf))
-	  {
-	    int smallest_vec_size = 1 << ctz_hwi (vector_sizes);
-	    int ratio = current_vector_size / smallest_vec_size;
-	    unsigned HOST_WIDE_INT eiters = LOOP_VINFO_INT_NITERS (loop_vinfo)
-	      - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
-	    eiters = eiters % lowest_vf;
-
-	    epilogue->nb_iterations_upper_bound = eiters - 1;
+      auto_vector_sizes vector_sizes;
+      targetm.vectorize.autovectorize_vector_sizes (&vector_sizes);
+      unsigned int next_size = 0;
+
+      if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+	  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0
+	  && must_eq (vf, lowest_vf))
+	{
+	  unsigned int eiters
+	    = (LOOP_VINFO_INT_NITERS (loop_vinfo)
+	       - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo));
+	  eiters = eiters % lowest_vf;
+	  epilogue->nb_iterations_upper_bound = eiters - 1;
+
+	  unsigned int ratio;
+	  while (next_size < vector_sizes.length ()
+		 && !(constant_multiple_p (current_vector_size,
+					   vector_sizes[next_size], &ratio)
+		      && eiters >= lowest_vf / ratio))
+	    next_size += 1;
+	}
+      else
+	while (next_size < vector_sizes.length ()
+	       && may_lt (current_vector_size, vector_sizes[next_size]))
+	  next_size += 1;
 
-	    if (eiters < lowest_vf / ratio)
-	      epilogue = NULL;
-	    }
+      if (next_size == vector_sizes.length ())
+	epilogue = NULL;
     }
 
   if (epilogue)
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-10-23 17:22:28.836953531 +0100
+++ gcc/tree-vect-slp.c	2017-10-23 17:22:32.728227020 +0100
@@ -3018,18 +3018,20 @@ vect_slp_bb (basic_block bb)
 {
   bb_vec_info bb_vinfo;
   gimple_stmt_iterator gsi;
-  unsigned int vector_sizes;
   bool any_vectorized = false;
+  auto_vector_sizes vector_sizes;
 
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
 
   /* Autodetect first vector size we try.  */
   current_vector_size = 0;
-  vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
+  targetm.vectorize.autovectorize_vector_sizes (&vector_sizes);
+  unsigned int next_size = 0;
 
   gsi = gsi_start_bb (bb);
 
+  poly_uint64 autodetected_vector_size = 0;
   while (1)
     {
       if (gsi_end_p (gsi))
@@ -3084,10 +3086,16 @@ vect_slp_bb (basic_block bb)
 
       any_vectorized |= vectorized;
 
-      vector_sizes &= ~current_vector_size;
+      if (next_size == 0)
+	autodetected_vector_size = current_vector_size;
+
+      if (next_size < vector_sizes.length ()
+	  && must_eq (vector_sizes[next_size], autodetected_vector_size))
+	next_size += 1;
+
       if (vectorized
-	  || vector_sizes == 0
-	  || current_vector_size == 0
+	  || next_size == vector_sizes.length ()
+	  || known_zero (current_vector_size)
 	  /* If vect_slp_analyze_bb_1 signaled that analysis for all
 	     vector sizes will fail do not bother iterating.  */
 	  || fatal)
@@ -3100,16 +3108,20 @@ vect_slp_bb (basic_block bb)
 
 	  /* And reset vector sizes.  */
 	  current_vector_size = 0;
-	  vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
+	  next_size = 0;
 	}
       else
 	{
 	  /* Try the next biggest vector size.  */
-	  current_vector_size = 1 << floor_log2 (vector_sizes);
+	  current_vector_size = vector_sizes[next_size++];
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, vect_location,
-			     "***** Re-trying analysis with "
-			     "vector size %d\n", current_vector_size);
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location,
+			       "***** Re-trying analysis with "
+			       "vector size ");
+	      dump_dec (MSG_NOTE, current_vector_size);
+	      dump_printf (MSG_NOTE, "\n");
+	    }
 
 	  /* Start over.  */
 	  gsi = region_begin;
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2017-10-23 17:22:30.979383401 +0100
+++ gcc/doc/tm.texi	2017-10-23 17:22:32.722227643 +0100
@@ -5839,11 +5839,15 @@ equal to @code{word_mode}, because the v
 transformations even in absence of specialized @acronym{SIMD} hardware.
 @end deftypefn
 
-@deftypefn {Target Hook} {unsigned int} TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES (void)
-This hook should return a mask of sizes that should be iterated over
-after trying to autovectorize using the vector size derived from the
-mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
-The default is zero which means to not iterate over other vector sizes.
+@deftypefn {Target Hook} void TARGET_VECTORIZE_AUTOVECTORIZE_VECTOR_SIZES (vector_sizes *@var{sizes})
+If the mode returned by @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is not
+the only one that is worth considering, this hook should add all suitable
+vector sizes to @var{sizes}, in order of decreasing preference.  The first
+one should be the size of @code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE}.
+
+The hook does not need to do anything if the vector returned by
+@code{TARGET_VECTORIZE_PREFERRED_SIMD_MODE} is the only one relevant
+for autovectorization.  The default implementation does nothing.
 @end deftypefn
 
 @deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (poly_uint64 @var{nunits}, poly_uint64 @var{length})
Index: gcc/tree-vectorizer.h
===================================================================
--- gcc/tree-vectorizer.h	2017-10-23 17:22:28.837953732 +0100
+++ gcc/tree-vectorizer.h	2017-10-23 17:22:32.731226709 +0100
@@ -1199,7 +1199,7 @@ extern source_location find_loop_locatio
 extern bool vect_can_advance_ivs_p (loop_vec_info);
 
 /* In tree-vect-stmts.c.  */
-extern unsigned int current_vector_size;
+extern poly_uint64 current_vector_size;
 extern tree get_vectype_for_scalar_type (tree);
 extern tree get_mask_type_for_scalar_type (tree);
 extern tree get_same_sized_vectype (tree, tree);
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:22:28.837953732 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:22:32.730226813 +0100
@@ -9084,12 +9084,12 @@ free_stmt_vec_info (gimple *stmt)
    by the target.  */
 
 static tree
-get_vectype_for_scalar_type_and_size (tree scalar_type, unsigned size)
+get_vectype_for_scalar_type_and_size (tree scalar_type, poly_uint64 size)
 {
   tree orig_scalar_type = scalar_type;
   scalar_mode inner_mode;
   machine_mode simd_mode;
-  int nunits;
+  poly_uint64 nunits;
   tree vectype;
 
   if (!is_int_mode (TYPE_MODE (scalar_type), &inner_mode)
@@ -9131,13 +9131,13 @@ get_vectype_for_scalar_type_and_size (tr
 
   /* If no size was supplied use the mode the target prefers.   Otherwise
      lookup a vector mode of the specified size.  */
-  if (size == 0)
+  if (known_zero (size))
     simd_mode = targetm.vectorize.preferred_simd_mode (inner_mode);
-  else if (!mode_for_vector (inner_mode, size / nbytes).exists (&simd_mode))
+  else if (!multiple_p (size, nbytes, &nunits)
+	   || !mode_for_vector (inner_mode, nunits).exists (&simd_mode))
     return NULL_TREE;
-  nunits = GET_MODE_SIZE (simd_mode) / nbytes;
   /* NOTE: nunits == 1 is allowed to support single element vector types.  */
-  if (nunits < 1)
+  if (!multiple_p (GET_MODE_SIZE (simd_mode), nbytes, &nunits))
     return NULL_TREE;
 
   vectype = build_vector_type (scalar_type, nunits);
@@ -9155,7 +9155,7 @@ get_vectype_for_scalar_type_and_size (tr
   return vectype;
 }
 
-unsigned int current_vector_size;
+poly_uint64 current_vector_size;
 
 /* Function get_vectype_for_scalar_type.
 
@@ -9169,7 +9169,7 @@ get_vectype_for_scalar_type (tree scalar
   vectype = get_vectype_for_scalar_type_and_size (scalar_type,
 						  current_vector_size);
   if (vectype
-      && current_vector_size == 0)
+      && known_zero (current_vector_size))
     current_vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
   return vectype;
 }
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 17:22:21.308442966 +0100
+++ gcc/tree.h	2017-10-23 17:22:32.736226191 +0100
@@ -4108,7 +4108,13 @@ extern tree build_reference_type_for_mod
 extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
 extern tree build_vector_type (tree innertype, int nunits);
-extern tree build_truth_vector_type (unsigned, unsigned);
+/* Temporary.  */
+inline tree
+build_vector_type (tree innertype, poly_uint64 nunits)
+{
+  return build_vector_type (innertype, (int) nunits.to_constant ());
+}
+extern tree build_truth_vector_type (poly_uint64, poly_uint64);
 extern tree build_same_sized_truth_vector_type (tree vectype);
 extern tree build_opaque_vector_type (tree innertype, int nunits);
 extern tree build_index_type (tree);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 17:22:21.307442765 +0100
+++ gcc/tree.c	2017-10-23 17:22:32.734226398 +0100
@@ -9662,6 +9662,13 @@ make_vector_type (tree innertype, int nu
   return t;
 }
 
+/* Temporary.  */
+static tree
+make_vector_type (tree innertype, poly_uint64 nunits, machine_mode mode)
+{
+  return make_vector_type (innertype, (int) nunits.to_constant (), mode);
+}
+
 static tree
 make_or_reuse_type (unsigned size, int unsignedp)
 {
@@ -10559,19 +10566,18 @@ build_vector_type (tree innertype, int n
 /* Build truth vector with specified length and number of units.  */
 
 tree
-build_truth_vector_type (unsigned nunits, unsigned vector_size)
+build_truth_vector_type (poly_uint64 nunits, poly_uint64 vector_size)
 {
   machine_mode mask_mode
     = targetm.vectorize.get_mask_mode (nunits, vector_size).else_blk ();
 
-  unsigned HOST_WIDE_INT vsize;
+  poly_uint64 vsize;
   if (mask_mode == BLKmode)
     vsize = vector_size * BITS_PER_UNIT;
   else
     vsize = GET_MODE_BITSIZE (mask_mode);
 
-  unsigned HOST_WIDE_INT esize = vsize / nunits;
-  gcc_assert (esize * nunits == vsize);
+  unsigned HOST_WIDE_INT esize = vector_element_size (vsize, nunits);
 
   tree bool_type = build_nonstandard_boolean_type (esize);
 


* [067/nnn] poly_int: get_mask_mode
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (65 preceding siblings ...)
  2017-10-23 17:27 ` [064/nnn] poly_int: SLP max_units Richard Sandiford
@ 2017-10-23 17:28 ` Richard Sandiford
  2017-11-28 16:48   ` Jeff Law
  2017-10-23 17:28 ` [068/nnn] poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZES Richard Sandiford
                   ` (40 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:28 UTC (permalink / raw)
  To: gcc-patches

This patch makes TARGET_VECTORIZE_GET_MASK_MODE take polynomial nunits and
vector_size arguments.  The gcc_assert in default_get_mask_mode
is now handled by the exact_div call in vector_element_size.
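
A sketch of a caller after this change (this assumes the poly_uint64
accessors introduced earlier in the series; the helper names are real,
the surrounding fragment is only illustrative):

  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
  poly_uint64 vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
  machine_mode mask_mode;
  bool has_mask_mode
    = (targetm.vectorize.get_mask_mode (nunits, vector_size)
       .exists (&mask_mode));

vector_element_size (vector_size, nunits) then gives the element size
in bytes, with the exact_div inside it asserting that the division is
exact for all runtime vector lengths.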


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.def (get_mask_mode): Take the number of units and length
	as poly_uint64s rather than unsigned ints.
	* targhooks.h (default_get_mask_mode): Update accordingly.
	* targhooks.c (default_get_mask_mode): Likewise.
	* config/i386/i386.c (ix86_get_mask_mode): Likewise.
	* doc/tm.texi: Regenerate.

Index: gcc/target.def
===================================================================
--- gcc/target.def	2017-10-23 17:19:01.411170305 +0100
+++ gcc/target.def	2017-10-23 17:22:30.980383601 +0100
@@ -1901,7 +1901,7 @@ The default implementation returns the m
 is @var{length} bytes long and that contains @var{nunits} elements,\n\
 if such a mode exists.",
  opt_machine_mode,
- (unsigned nunits, unsigned length),
+ (poly_uint64 nunits, poly_uint64 length),
  default_get_mask_mode)
 
 /* Target builtin that implements vector gather operation.  */
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	2017-10-23 17:19:01.411170305 +0100
+++ gcc/targhooks.h	2017-10-23 17:22:30.980383601 +0100
@@ -107,7 +107,7 @@ default_builtin_support_vector_misalignm
 					     int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
-extern opt_machine_mode default_get_mask_mode (unsigned, unsigned);
+extern opt_machine_mode default_get_mask_mode (poly_uint64, poly_uint64);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
 				       struct _stmt_vec_info *, int,
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 17:19:01.411170305 +0100
+++ gcc/targhooks.c	2017-10-23 17:22:30.980383601 +0100
@@ -1254,17 +1254,17 @@ default_autovectorize_vector_sizes (void
   return 0;
 }
 
-/* By defaults a vector of integers is used as a mask.  */
+/* By default a vector of integers is used as a mask.  */
 
 opt_machine_mode
-default_get_mask_mode (unsigned nunits, unsigned vector_size)
+default_get_mask_mode (poly_uint64 nunits, poly_uint64 vector_size)
 {
-  unsigned elem_size = vector_size / nunits;
+  unsigned int elem_size = vector_element_size (vector_size, nunits);
   scalar_int_mode elem_mode
     = smallest_int_mode_for_size (elem_size * BITS_PER_UNIT);
   machine_mode vector_mode;
 
-  gcc_assert (elem_size * nunits == vector_size);
+  gcc_assert (must_eq (elem_size * nunits, vector_size));
 
   if (mode_for_vector (elem_mode, nunits).exists (&vector_mode)
       && VECTOR_MODE_P (vector_mode)
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2017-10-23 17:19:01.404170211 +0100
+++ gcc/config/i386/i386.c	2017-10-23 17:22:30.978383200 +0100
@@ -48121,7 +48121,7 @@ ix86_autovectorize_vector_sizes (void)
 /* Implemenation of targetm.vectorize.get_mask_mode.  */
 
 static opt_machine_mode
-ix86_get_mask_mode (unsigned nunits, unsigned vector_size)
+ix86_get_mask_mode (poly_uint64 nunits, poly_uint64 vector_size)
 {
   unsigned elem_size = vector_size / nunits;
 
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2017-10-23 17:19:01.408170265 +0100
+++ gcc/doc/tm.texi	2017-10-23 17:22:30.979383401 +0100
@@ -5846,7 +5846,7 @@ mode returned by @code{TARGET_VECTORIZE_
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
-@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned @var{nunits}, unsigned @var{length})
+@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE (poly_uint64 @var{nunits}, poly_uint64 @var{length})
 A vector mask is a value that holds one boolean result for every element
 in a vector.  This hook returns the machine mode that should be used to
 represent such a mask when the vector in question is @var{length} bytes


* [071/nnn] poly_int: vectorizable_induction
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (69 preceding siblings ...)
  2017-10-23 17:29 ` [069/nnn] poly_int: vector_alignment_reachable_p Richard Sandiford
@ 2017-10-23 17:29 ` Richard Sandiford
  2017-12-05 17:44   ` Jeff Law
  2017-10-23 17:30 ` [073/nnn] poly_int: vectorizable_load/store Richard Sandiford
                   ` (36 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:29 UTC (permalink / raw)
  To: gcc-patches

This patch makes vectorizable_induction cope with variable-length
vectors.  For now we punt on SLP inductions, but patches after
the main SVE submission add support for those too.
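
As an illustration (a sketch, not the exact hunk): for an induction
with initial value BASE and step STEP, the variable-length integer
case now builds the first vector directly as a series

  vec_init = VEC_SERIES_EXPR <BASE, STEP>;  /* { BASE, BASE+STEP, ... } */

rather than pushing BASE, BASE+STEP, ..., BASE+(N-1)*STEP one element
at a time, so it no longer needs a compile-time element count N.  The
floating-point case builds the equivalent value as
[BASE, BASE, ...] + (vectype) [0, 1, 2, ...] * [STEP, STEP, ...],
which is why it is restricted to -fassociative-math.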


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-loop.c (vectorizable_induction): Treat the number
	of units as polynomial.  Punt on SLP inductions.  Use an integer
	VEC_SERIES_EXPR for variable-length integer reductions.  Use a
	cast of such a series for variable-length floating-point
	reductions.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:22:35.829905285 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:22:36.904793787 +0100
@@ -6624,7 +6624,7 @@ vectorizable_induction (gimple *phi,
     return false;
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
   if (slp_node)
     ncopies = 1;
@@ -6689,6 +6689,16 @@ vectorizable_induction (gimple *phi,
     iv_loop = loop;
   gcc_assert (iv_loop == (gimple_bb (phi))->loop_father);
 
+  if (slp_node && !nunits.is_constant ())
+    {
+      /* The current SLP code creates the initial value element-by-element.  */
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "SLP induction not supported for variable-length"
+			 " vectors.\n");
+      return false;
+    }
+
   if (!vec_stmt) /* transformation not required.  */
     {
       STMT_VINFO_TYPE (stmt_info) = induc_vec_info_type;
@@ -6737,6 +6747,9 @@ vectorizable_induction (gimple *phi,
      [VF*S, VF*S, VF*S, VF*S] for all.  */
   if (slp_node)
     {
+      /* Enforced above.  */
+      unsigned int const_nunits = nunits.to_constant ();
+
       /* Convert the init to the desired type.  */
       stmts = NULL;
       init_expr = gimple_convert (&stmts, TREE_TYPE (vectype), init_expr);
@@ -6765,19 +6778,20 @@ vectorizable_induction (gimple *phi,
       /* Now generate the IVs.  */
       unsigned group_size = SLP_TREE_SCALAR_STMTS (slp_node).length ();
       unsigned nvects = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-      unsigned elts = nunits * nvects;
-      unsigned nivs = least_common_multiple (group_size, nunits) / nunits;
+      unsigned elts = const_nunits * nvects;
+      unsigned nivs = least_common_multiple (group_size,
+					     const_nunits) / const_nunits;
       gcc_assert (elts % group_size == 0);
       tree elt = init_expr;
       unsigned ivn;
       for (ivn = 0; ivn < nivs; ++ivn)
 	{
-	  auto_vec<tree, 32> elts (nunits);
+	  auto_vec<tree, 32> elts (const_nunits);
 	  stmts = NULL;
-	  for (unsigned eltn = 0; eltn < nunits; ++eltn)
+	  for (unsigned eltn = 0; eltn < const_nunits; ++eltn)
 	    {
-	      if (ivn*nunits + eltn >= group_size
-		  && (ivn*nunits + eltn) % group_size == 0)
+	      if (ivn*const_nunits + eltn >= group_size
+		  && (ivn * const_nunits + eltn) % group_size == 0)
 		elt = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (elt),
 				    elt, step_expr);
 	      elts.quick_push (elt);
@@ -6814,7 +6828,7 @@ vectorizable_induction (gimple *phi,
       if (ivn < nvects)
 	{
 	  unsigned vfp
-	    = least_common_multiple (group_size, nunits) / group_size;
+	    = least_common_multiple (group_size, const_nunits) / group_size;
 	  /* Generate [VF'*S, VF'*S, ... ].  */
 	  if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)))
 	    {
@@ -6889,18 +6903,45 @@ vectorizable_induction (gimple *phi,
       stmts = NULL;
       new_name = gimple_convert (&stmts, TREE_TYPE (vectype), init_expr);
 
-      auto_vec<tree, 32> elts (nunits);
-      elts.quick_push (new_name);
-      for (i = 1; i < nunits; i++)
-	{
-	  /* Create: new_name_i = new_name + step_expr  */
-	  new_name = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (new_name),
-				   new_name, step_expr);
+      unsigned HOST_WIDE_INT const_nunits;
+      if (nunits.is_constant (&const_nunits))
+	{
+	  auto_vec<tree, 32> elts (const_nunits);
 	  elts.quick_push (new_name);
+	  for (i = 1; i < const_nunits; i++)
+	    {
+	      /* Create: new_name_i = new_name + step_expr  */
+	      new_name = gimple_build (&stmts, PLUS_EXPR, TREE_TYPE (new_name),
+				       new_name, step_expr);
+	      elts.quick_push (new_name);
+	    }
+	  /* Create a vector from [new_name_0, new_name_1, ...,
+	     new_name_nunits-1]  */
+	  vec_init = gimple_build_vector (&stmts, vectype, elts);
+	}
+      else if (INTEGRAL_TYPE_P (TREE_TYPE (step_expr)))
+	/* Build the initial value directly from a VEC_SERIES_EXPR.  */
+	vec_init = gimple_build (&stmts, VEC_SERIES_EXPR, vectype,
+				 new_name, step_expr);
+      else
+	{
+	  /* Build:
+	        [base, base, base, ...]
+		+ (vectype) [0, 1, 2, ...] * [step, step, step, ...].  */
+	  gcc_assert (SCALAR_FLOAT_TYPE_P (TREE_TYPE (step_expr)));
+	  gcc_assert (flag_associative_math);
+	  tree index = build_index_vector (vectype, 0, 1);
+	  tree base_vec = gimple_build_vector_from_val (&stmts, vectype,
+							new_name);
+	  tree step_vec = gimple_build_vector_from_val (&stmts, vectype,
+							step_expr);
+	  vec_init = gimple_build (&stmts, FLOAT_EXPR, vectype, index);
+	  vec_init = gimple_build (&stmts, MULT_EXPR, vectype,
+				   vec_init, step_vec);
+	  vec_init = gimple_build (&stmts, PLUS_EXPR, vectype,
+				   vec_init, base_vec);
 	}
-      /* Create a vector from [new_name_0, new_name_1, ...,
-	 new_name_nunits-1]  */
-      vec_init = gimple_build_vector (&stmts, vectype, elts);
+
       if (stmts)
 	{
 	  new_bb = gsi_insert_seq_on_edge_immediate (pe, stmts);


* [069/nnn] poly_int: vector_alignment_reachable_p
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (68 preceding siblings ...)
  2017-10-23 17:29 ` [070/nnn] poly_int: vectorizable_reduction Richard Sandiford
@ 2017-10-23 17:29 ` Richard Sandiford
  2017-11-28 16:48   ` Jeff Law
  2017-10-23 17:29 ` [071/nnn] poly_int: vectorizable_induction Richard Sandiford
                   ` (37 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:29 UTC (permalink / raw)
  To: gcc-patches

This patch makes vector_alignment_reachable_p cope with variable-length
vectors.
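
The divisibility test now uses multiple_p, which is a "must" query:
it only returns true when the first operand is a multiple of the
second for every runtime vector length.  A sketch with SVE-style
values (one runtime indeterminate X, i.e. two coefficients, which is
an assumption of this example):

  poly_uint64 nelements (4, 4);                /* 4 + 4*X elements */
  gcc_assert (multiple_p (nelements, 4));      /* always a multiple */
  gcc_assert (!multiple_p (nelements - 1, 3)); /* 3 + 4*X: only for some X */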


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-data-refs.c (vector_alignment_reachable_p): Treat the
	number of units as polynomial.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:22:26.571498977 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:22:34.681024458 +0100
@@ -1153,16 +1153,17 @@ vector_alignment_reachable_p (struct dat
 	 the prolog loop ({VF - misalignment}), is a multiple of the
 	 number of the interleaved accesses.  */
       int elem_size, mis_in_elements;
-      int nelements = TYPE_VECTOR_SUBPARTS (vectype);
 
       /* FORNOW: handle only known alignment.  */
       if (!known_alignment_for_access_p (dr))
 	return false;
 
-      elem_size = GET_MODE_SIZE (TYPE_MODE (vectype)) / nelements;
+      poly_uint64 nelements = TYPE_VECTOR_SUBPARTS (vectype);
+      poly_uint64 vector_size = GET_MODE_SIZE (TYPE_MODE (vectype));
+      elem_size = vector_element_size (vector_size, nelements);
       mis_in_elements = DR_MISALIGNMENT (dr) / elem_size;
 
-      if ((nelements - mis_in_elements) % GROUP_SIZE (stmt_info))
+      if (!multiple_p (nelements - mis_in_elements, GROUP_SIZE (stmt_info)))
 	return false;
     }
 


* [070/nnn] poly_int: vectorizable_reduction
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (67 preceding siblings ...)
  2017-10-23 17:28 ` [068/nnn] poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZES Richard Sandiford
@ 2017-10-23 17:29 ` Richard Sandiford
  2017-11-22 18:11   ` Richard Sandiford
  2017-10-23 17:29 ` [069/nnn] poly_int: vector_alignment_reachable_p Richard Sandiford
                   ` (38 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:29 UTC (permalink / raw)
  To: gcc-patches

This patch makes vectorizable_reduction cope with variable-length vectors.
We can handle the simple case of an inner loop reduction for which
the target has native support for the epilogue operation.  For now we
punt on other cases, but patches after the main SVE submission allow
SLP and double reductions too.
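
The new build_index_vector gives a length-agnostic way of creating the
{1,2,3,...} vector that COND_REDUCTIONs use to track the index of the
selected element; a sketch of the call made in
vect_create_epilog_for_reduction:

  /* { 1, 2, 3, ... }: a VECTOR_CST when the length is constant,
     a VEC_SERIES_EXPR otherwise.  */
  tree series_vect = build_index_vector (cr_index_vector_type, 1, 1);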


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree.h (build_index_vector): Declare.
	* tree.c (build_index_vector): New function.
	* tree-vect-loop.c (get_initial_def_for_reduction): Treat the number
	of units as polynomial, forcibly converting it to a constant if
	vectorizable_reduction has already enforced the condition.
	(get_initial_defs_for_reduction): Likewise.
	(vect_create_epilog_for_reduction): Likewise.  Use build_index_vector
	to create a {1,2,3,...} vector.
	(vectorizable_reduction): Treat the number of units as polynomial.
	Choose vectype_in based on the largest scalar element size rather
	than the smallest number of units.  Enforce the restrictions
	relied on above.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 17:22:32.736226191 +0100
+++ gcc/tree.h	2017-10-23 17:22:35.831905077 +0100
@@ -4050,6 +4050,7 @@ extern tree build_vector (tree, vec<tree
 extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
 extern tree build_vector_from_val (tree, tree);
 extern tree build_vec_series (tree, tree, tree);
+extern tree build_index_vector (tree, poly_uint64, poly_uint64);
 extern void recompute_constructor_flags (tree);
 extern void verify_constructor_flags (tree);
 extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 17:22:32.734226398 +0100
+++ gcc/tree.c	2017-10-23 17:22:35.830905181 +0100
@@ -1974,6 +1974,37 @@ build_vec_series (tree type, tree base,
   return build2 (VEC_SERIES_EXPR, type, base, step);
 }
 
+/* Return a vector with the same number of units and number of bits
+   as VEC_TYPE, but in which the elements are a linear series of unsigned
+   integers { BASE, BASE + STEP, BASE + STEP * 2, ... }.  */
+
+tree
+build_index_vector (tree vec_type, poly_uint64 base, poly_uint64 step)
+{
+  tree index_vec_type = vec_type;
+  tree index_elt_type = TREE_TYPE (vec_type);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vec_type);
+  if (!INTEGRAL_TYPE_P (index_elt_type) || !TYPE_UNSIGNED (index_elt_type))
+    {
+      index_elt_type = build_nonstandard_integer_type
+	(GET_MODE_BITSIZE (SCALAR_TYPE_MODE (index_elt_type)), true);
+      index_vec_type = build_vector_type (index_elt_type, nunits);
+    }
+
+  unsigned HOST_WIDE_INT count;
+  if (nunits.is_constant (&count))
+    {
+      auto_vec<tree, 32> v (count);
+      for (unsigned int i = 0; i < count; ++i)
+	v.quick_push (build_int_cstu (index_elt_type, base + i * step));
+      return build_vector (index_vec_type, v);
+    }
+
+  return build_vec_series (index_vec_type,
+			   build_int_cstu (index_elt_type, base),
+			   build_int_cstu (index_elt_type, step));
+}
+
 /* Something has messed with the elements of CONSTRUCTOR C after it was built;
    calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
 
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:22:32.727227124 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:22:35.829905285 +0100
@@ -3997,11 +3997,10 @@ get_initial_def_for_reduction (gimple *s
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree scalar_type = TREE_TYPE (init_val);
   tree vectype = get_vectype_for_scalar_type (scalar_type);
-  int nunits;
+  poly_uint64 nunits;
   enum tree_code code = gimple_assign_rhs_code (stmt);
   tree def_for_init;
   tree init_def;
-  int i;
   bool nested_in_vect_loop = false;
   REAL_VALUE_TYPE real_init_val = dconst0;
   int int_init_val = 0;
@@ -4082,9 +4081,13 @@ get_initial_def_for_reduction (gimple *s
 	else
 	  {
 	    /* Option2: the first element is INIT_VAL.  */
-	    auto_vec<tree, 32> elts (nunits);
+
+	    /* Enforced by vectorizable_reduction (which disallows double
+	       reductions with variable-length vectors).  */
+	    unsigned int count = nunits.to_constant ();
+	    auto_vec<tree, 32> elts (count);
 	    elts.quick_push (init_val);
-	    for (i = 1; i < nunits; ++i)
+	    for (unsigned int i = 1; i < count; ++i)
 	      elts.quick_push (def_for_init);
 	    init_def = gimple_build_vector (&stmts, vectype, elts);
 	  }
@@ -4144,6 +4147,8 @@ get_initial_defs_for_reduction (slp_tree
 
   vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
   scalar_type = TREE_TYPE (vector_type);
+  /* vectorizable_reduction has already rejected SLP reductions on
+     variable-length vectors.  */
   nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
   gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def);
@@ -4510,8 +4515,7 @@ vect_create_epilog_for_reduction (vec<tr
   if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
     {
       tree indx_before_incr, indx_after_incr;
-      int nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
-      int k;
+      poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
 
       gimple *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
       gcc_assert (gimple_assign_rhs_code (vec_stmt) == VEC_COND_EXPR);
@@ -4527,10 +4531,7 @@ vect_create_epilog_for_reduction (vec<tr
 	 vector size (STEP).  */
 
       /* Create a {1,2,3,...} vector.  */
-      auto_vec<tree, 32> vtemp (nunits_out);
-      for (k = 0; k < nunits_out; ++k)
-	vtemp.quick_push (build_int_cst (cr_index_scalar_type, k + 1));
-      tree series_vect = build_vector (cr_index_vector_type, vtemp);
+      tree series_vect = build_index_vector (cr_index_vector_type, 1, 1);
 
       /* Create a vector of the step value.  */
       tree step = build_int_cst (cr_index_scalar_type, nunits_out);
@@ -4911,8 +4912,11 @@ vect_create_epilog_for_reduction (vec<tr
       tree data_eltype = TREE_TYPE (TREE_TYPE (new_phi_result));
       tree idx_eltype = TREE_TYPE (TREE_TYPE (induction_index));
       unsigned HOST_WIDE_INT el_size = tree_to_uhwi (TYPE_SIZE (idx_eltype));
-      unsigned HOST_WIDE_INT v_size
-	= el_size * TYPE_VECTOR_SUBPARTS (TREE_TYPE (induction_index));
+      poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (induction_index));
+      /* Enforced by vectorizable_reduction, which ensures we have target
+	 support before allowing a conditional reduction on variable-length
+	 vectors.  */
+      unsigned HOST_WIDE_INT v_size = el_size * nunits.to_constant ();
       tree idx_val = NULL_TREE, val = NULL_TREE;
       for (unsigned HOST_WIDE_INT off = 0; off < v_size; off += el_size)
 	{
@@ -5024,6 +5028,9 @@ vect_create_epilog_for_reduction (vec<tr
     {
       bool reduce_with_shift = have_whole_vector_shift (mode);
       int element_bitsize = tree_to_uhwi (bitsize);
+      /* Enforced by vectorizable_reduction, which disallows SLP reductions
+	 for variable-length vectors and also requires direct target support
+	 for loop reductions.  */
       int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
       tree vec_temp;
 
@@ -5703,10 +5710,10 @@ vectorizable_reduction (gimple *stmt, gi
 	  if (k == 1
 	      && gimple_assign_rhs_code (reduc_stmt) == COND_EXPR)
 	    continue;
-	  tem = get_vectype_for_scalar_type (TREE_TYPE (op));
-	  if (! vectype_in
-	      || TYPE_VECTOR_SUBPARTS (tem) < TYPE_VECTOR_SUBPARTS (vectype_in))
-	    vectype_in = tem;
+	  if (!vectype_in
+	      || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
+		  < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (op)))))
+	    vectype_in = get_vectype_for_scalar_type (TREE_TYPE (op));
 	  break;
 	}
       gcc_assert (vectype_in);
@@ -6016,6 +6023,7 @@ vectorizable_reduction (gimple *stmt, gi
   gcc_assert (ncopies >= 1);
 
   vec_mode = TYPE_MODE (vectype_in);
+  poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
 
   if (code == COND_EXPR)
     {
@@ -6205,8 +6213,8 @@ vectorizable_reduction (gimple *stmt, gi
       int scalar_precision
 	= GET_MODE_PRECISION (SCALAR_TYPE_MODE (scalar_type));
       cr_index_scalar_type = make_unsigned_type (scalar_precision);
-      cr_index_vector_type = build_vector_type
-	(cr_index_scalar_type, TYPE_VECTOR_SUBPARTS (vectype_out));
+      cr_index_vector_type = build_vector_type (cr_index_scalar_type,
+						nunits_out);
 
       optab = optab_for_tree_code (REDUC_MAX_EXPR, cr_index_vector_type,
 				   optab_default);
@@ -6215,6 +6223,15 @@ vectorizable_reduction (gimple *stmt, gi
 	epilog_reduc_code = REDUC_MAX_EXPR;
     }
 
+  if (epilog_reduc_code == ERROR_MARK && !nunits_out.is_constant ())
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "missing target support for reduction on"
+			 " variable-length vectors.\n");
+      return false;
+    }
+
   if ((double_reduc
        || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) != TREE_CODE_REDUCTION)
       && ncopies > 1)
@@ -6226,6 +6243,27 @@ vectorizable_reduction (gimple *stmt, gi
       return false;
     }
 
+  if (double_reduc && !nunits_out.is_constant ())
+    {
+      /* The current double-reduction code creates the initial value
+	 element-by-element.  */
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "double reduction not supported for variable-length"
+			 " vectors.\n");
+      return false;
+    }
+
+  if (slp_node && !nunits_out.is_constant ())
+    {
+      /* The current SLP code creates the initial value element-by-element.  */
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "SLP reduction not supported for variable-length"
+			 " vectors.\n");
+      return false;
+    }
+
   /* In case of widenning multiplication by a constant, we update the type
      of the constant to be the type of the other operand.  We check that the
      constant fits the type in the pattern recognition pass.  */


* [073/nnn] poly_int: vectorizable_load/store
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (70 preceding siblings ...)
  2017-10-23 17:29 ` [071/nnn] poly_int: vectorizable_induction Richard Sandiford
@ 2017-10-23 17:30 ` Richard Sandiford
  2017-12-06  0:51   ` Jeff Law
  2017-10-23 17:30 ` [072/nnn] poly_int: vectorizable_live_operation Richard Sandiford
                   ` (35 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:30 UTC (permalink / raw)
  To: gcc-patches

This patch makes vectorizable_load and vectorizable_store cope with
variable-length vectors.  The reverse and permute cases will be
excluded by the code that checks the permutation mask (although a
patch after the main SVE submission adds support for the reversed
case).  Here we also need to exclude VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP, which split the operation up into a constant
number of constant-sized operations.  We also don't try to extend
the current widening gather/scatter support to variable-length
vectors, since SVE uses a different approach.
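
The pattern used throughout the patch is to reject the variable-length
cases up front and only then convert back to ordinary integers on the
paths that remain; a minimal sketch of that idiom:

  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
  if (!nunits.is_constant ())
    return false;  /* punt on variable-length vectors */
  /* Safe after the check above.  */
  unsigned int const_nunits = nunits.to_constant ();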


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-stmts.c (get_load_store_type): Treat the number of
	units as polynomial.  Reject VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
	for variable-length vectors.
	(vectorizable_mask_load_store): Treat the number of units as
	polynomial, asserting that it is constant if the condition has
	already been enforced.
	(vectorizable_store, vectorizable_load): Likewise.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:22:32.730226813 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:22:38.938582823 +0100
@@ -1955,6 +1955,7 @@ get_load_store_type (gimple *stmt, tree
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   vec_info *vinfo = stmt_info->vinfo;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   if (STMT_VINFO_GATHER_SCATTER_P (stmt_info))
     {
       *memory_access_type = VMAT_GATHER_SCATTER;
@@ -1998,6 +1999,17 @@ get_load_store_type (gimple *stmt, tree
 	*memory_access_type = VMAT_CONTIGUOUS;
     }
 
+  if ((*memory_access_type == VMAT_ELEMENTWISE
+       || *memory_access_type == VMAT_STRIDED_SLP)
+      && !nunits.is_constant ())
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "Not using elementwise accesses due to variable "
+			 "vectorization factor.\n");
+      return false;
+    }
+
   /* FIXME: At the moment the cost model seems to underestimate the
      cost of using elementwise accesses.  This check preserves the
      traditional behavior until that can be fixed.  */
@@ -2038,7 +2050,7 @@ vectorizable_mask_load_store (gimple *st
   tree dummy;
   tree dataref_ptr = NULL_TREE;
   gimple *ptr_incr;
-  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   int i, j;
   bool inv_p;
@@ -2168,7 +2180,8 @@ vectorizable_mask_load_store (gimple *st
       gimple_seq seq;
       basic_block new_bb;
       enum { NARROW, NONE, WIDEN } modifier;
-      int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
+      poly_uint64 gather_off_nunits
+	= TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
 
       rettype = TREE_TYPE (TREE_TYPE (gs_info.decl));
       srctype = TREE_VALUE (arglist); arglist = TREE_CHAIN (arglist);
@@ -2179,32 +2192,37 @@ vectorizable_mask_load_store (gimple *st
       gcc_checking_assert (types_compatible_p (srctype, rettype)
 			   && types_compatible_p (srctype, masktype));
 
-      if (nunits == gather_off_nunits)
+      if (must_eq (nunits, gather_off_nunits))
 	modifier = NONE;
-      else if (nunits == gather_off_nunits / 2)
+      else if (must_eq (nunits * 2, gather_off_nunits))
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
-	  for (i = 0; i < gather_off_nunits; ++i)
-	    sel.quick_push (i | nunits);
+	  /* Currently widening gathers and scatters are only supported for
+	     fixed-length vectors.  */
+	  int count = gather_off_nunits.to_constant ();
+	  auto_vec_perm_indices sel (count);
+	  for (i = 0; i < count; ++i)
+	    sel.quick_push (i | (count / 2));
 
 	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
 	}
-      else if (nunits == gather_off_nunits * 2)
+      else if (must_eq (nunits, gather_off_nunits * 2))
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
-	  sel.quick_grow (nunits);
-	  for (i = 0; i < nunits; ++i)
-	    sel[i] = i < gather_off_nunits
-		     ? i : i + nunits - gather_off_nunits;
+	  /* Currently narrowing gathers and scatters are only supported for
+	     fixed-length vectors.  */
+	  int count = nunits.to_constant ();
+	  auto_vec_perm_indices sel (count);
+	  sel.quick_grow (count);
+	  for (i = 0; i < count; ++i)
+	    sel[i] = i < count / 2 ? i : i + count / 2;
 
 	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  ncopies *= 2;
-	  for (i = 0; i < nunits; ++i)
-	    sel[i] = i | gather_off_nunits;
+	  for (i = 0; i < count; ++i)
+	    sel[i] = i | (count / 2);
 	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);
 	}
       else
@@ -5713,7 +5731,7 @@ vectorizable_store (gimple *stmt, gimple
   gcc_assert (gimple_assign_single_p (stmt));
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info), rhs_vectype = NULL_TREE;
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
   if (loop_vinfo)
     {
@@ -5807,28 +5825,35 @@ vectorizable_store (gimple *stmt, gimple
       gimple_seq seq;
       basic_block new_bb;
       enum { NARROW, NONE, WIDEN } modifier;
-      int scatter_off_nunits = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
+      poly_uint64 scatter_off_nunits
+	= TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
 
-      if (nunits == (unsigned int) scatter_off_nunits)
+      if (must_eq (nunits, scatter_off_nunits))
 	modifier = NONE;
-      else if (nunits == (unsigned int) scatter_off_nunits / 2)
+      else if (must_eq (nunits * 2, scatter_off_nunits))
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (scatter_off_nunits);
-	  for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)
-	    sel.quick_push (i | nunits);
+	  /* Currently gathers and scatters are only supported for
+	     fixed-length vectors.  */
+	  unsigned int count = scatter_off_nunits.to_constant ();
+	  auto_vec_perm_indices sel (count);
+	  for (i = 0; i < (unsigned int) count; ++i)
+	    sel.quick_push (i | (count / 2));
 
 	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
 	  gcc_assert (perm_mask != NULL_TREE);
 	}
-      else if (nunits == (unsigned int) scatter_off_nunits * 2)
+      else if (must_eq (nunits, scatter_off_nunits * 2))
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
-	  for (i = 0; i < (unsigned int) nunits; ++i)
-	    sel.quick_push (i | scatter_off_nunits);
+	  /* Currently gathers and scatters are only supported for
+	     fixed-length vectors.  */
+	  unsigned int count = nunits.to_constant ();
+	  auto_vec_perm_indices sel (count);
+	  for (i = 0; i < (unsigned int) count; ++i)
+	    sel.quick_push (i | (count / 2));
 
 	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  gcc_assert (perm_mask != NULL_TREE);
@@ -6002,6 +6027,8 @@ vectorizable_store (gimple *stmt, gimple
       tree stride_base, stride_step, alias_off;
       tree vec_oprnd;
       unsigned int g;
+      /* Checked by get_load_store_type.  */
+      unsigned int const_nunits = nunits.to_constant ();
 
       gcc_assert (!nested_in_vect_loop_p (loop, stmt));
 
@@ -6031,16 +6058,16 @@ vectorizable_store (gimple *stmt, gimple
 	     ...
          */
 
-      unsigned nstores = nunits;
+      unsigned nstores = const_nunits;
       unsigned lnel = 1;
       tree ltype = elem_type;
       tree lvectype = vectype;
       if (slp)
 	{
-	  if (group_size < nunits
-	      && nunits % group_size == 0)
+	  if (group_size < const_nunits
+	      && const_nunits % group_size == 0)
 	    {
-	      nstores = nunits / group_size;
+	      nstores = const_nunits / group_size;
 	      lnel = group_size;
 	      ltype = build_vector_type (elem_type, group_size);
 	      lvectype = vectype;
@@ -6063,17 +6090,17 @@ vectorizable_store (gimple *stmt, gimple
 		  unsigned lsize
 		    = group_size * GET_MODE_BITSIZE (elmode);
 		  elmode = int_mode_for_size (lsize, 0).require ();
+		  unsigned int lnunits = const_nunits / group_size;
 		  /* If we can't construct such a vector fall back to
 		     element extracts from the original vector type and
 		     element size stores.  */
-		  if (mode_for_vector (elmode,
-				       nunits / group_size).exists (&vmode)
+		  if (mode_for_vector (elmode, lnunits).exists (&vmode)
 		      && VECTOR_MODE_P (vmode)
 		      && (convert_optab_handler (vec_extract_optab,
 						 vmode, elmode)
 			  != CODE_FOR_nothing))
 		    {
-		      nstores = nunits / group_size;
+		      nstores = lnunits;
 		      lnel = group_size;
 		      ltype = build_nonstandard_integer_type (lsize, 1);
 		      lvectype = build_vector_type (ltype, nstores);
@@ -6085,11 +6112,11 @@ vectorizable_store (gimple *stmt, gimple
 		     issue exists here for reasonable archs.  */
 		}
 	    }
-	  else if (group_size >= nunits
-		   && group_size % nunits == 0)
+	  else if (group_size >= const_nunits
+		   && group_size % const_nunits == 0)
 	    {
 	      nstores = 1;
-	      lnel = nunits;
+	      lnel = const_nunits;
 	      ltype = vectype;
 	      lvectype = vectype;
 	    }
@@ -6652,8 +6679,9 @@ vectorizable_load (gimple *stmt, gimple_
   tree dataref_offset = NULL_TREE;
   gimple *ptr_incr = NULL;
   int ncopies;
-  int i, j, group_size;
-  poly_int64 group_gap_adj;
+  int i, j;
+  unsigned int group_size;
+  poly_uint64 group_gap_adj;
   tree msq = NULL_TREE, lsq;
   tree offset = NULL_TREE;
   tree byte_offset = NULL_TREE;
@@ -6707,7 +6735,7 @@ vectorizable_load (gimple *stmt, gimple_
     return false;
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
   if (loop_vinfo)
     {
@@ -6846,28 +6874,34 @@ vectorizable_load (gimple *stmt, gimple_
       gimple_seq seq;
       basic_block new_bb;
       enum { NARROW, NONE, WIDEN } modifier;
-      int gather_off_nunits = TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
+      poly_uint64 gather_off_nunits
+	= TYPE_VECTOR_SUBPARTS (gs_info.offset_vectype);
 
-      if (nunits == gather_off_nunits)
+      if (must_eq (nunits, gather_off_nunits))
 	modifier = NONE;
-      else if (nunits == gather_off_nunits / 2)
+      else if (must_eq (nunits * 2, gather_off_nunits))
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
-	  for (i = 0; i < gather_off_nunits; ++i)
-	    sel.quick_push (i | nunits);
+	  /* Currently widening gathers are only supported for
+	     fixed-length vectors.  */
+	  int count = gather_off_nunits.to_constant ();
+	  auto_vec_perm_indices sel (count);
+	  for (i = 0; i < count; ++i)
+	    sel.quick_push (i | (count / 2));
 
 	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
 	}
-      else if (nunits == gather_off_nunits * 2)
+      else if (must_eq (nunits, gather_off_nunits * 2))
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
-	  for (i = 0; i < nunits; ++i)
-	    sel.quick_push (i < gather_off_nunits
-			    ? i : i + nunits - gather_off_nunits);
+	  /* Currently narrowing gathers are only supported for
+	     fixed-length vectors.  */
+	  int count = nunits.to_constant ();
+	  auto_vec_perm_indices sel (count);
+	  for (i = 0; i < count; ++i)
+	    sel.quick_push (i < count / 2 ? i : i + count / 2);
 
 	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  ncopies *= 2;
@@ -7016,6 +7050,8 @@ vectorizable_load (gimple *stmt, gimple_
       vec<constructor_elt, va_gc> *v = NULL;
       gimple_seq stmts = NULL;
       tree stride_base, stride_step, alias_off;
+      /* Checked by get_load_store_type.  */
+      unsigned int const_nunits = nunits.to_constant ();
 
       gcc_assert (!nested_in_vect_loop);
 
@@ -7077,14 +7113,14 @@ vectorizable_load (gimple *stmt, gimple_
       prev_stmt_info = NULL;
       running_off = offvar;
       alias_off = build_int_cst (ref_type, 0);
-      int nloads = nunits;
+      int nloads = const_nunits;
       int lnel = 1;
       tree ltype = TREE_TYPE (vectype);
       tree lvectype = vectype;
       auto_vec<tree> dr_chain;
       if (memory_access_type == VMAT_STRIDED_SLP)
 	{
-	  if (group_size < nunits)
+	  if (group_size < const_nunits)
 	    {
 	      /* First check if vec_init optab supports construction from
 		 vector elts directly.  */
@@ -7096,7 +7132,7 @@ vectorizable_load (gimple *stmt, gimple_
 					     TYPE_MODE (vectype), vmode)
 		      != CODE_FOR_nothing))
 		{
-		  nloads = nunits / group_size;
+		  nloads = const_nunits / group_size;
 		  lnel = group_size;
 		  ltype = build_vector_type (TREE_TYPE (vectype), group_size);
 		}
@@ -7112,15 +7148,15 @@ vectorizable_load (gimple *stmt, gimple_
 		  unsigned lsize
 		    = group_size * TYPE_PRECISION (TREE_TYPE (vectype));
 		  elmode = int_mode_for_size (lsize, 0).require ();
+		  unsigned int lnunits = const_nunits / group_size;
 		  /* If we can't construct such a vector fall back to
 		     element loads of the original vector type.  */
-		  if (mode_for_vector (elmode,
-				       nunits / group_size).exists (&vmode)
+		  if (mode_for_vector (elmode, lnunits).exists (&vmode)
 		      && VECTOR_MODE_P (vmode)
 		      && (convert_optab_handler (vec_init_optab, vmode, elmode)
 			  != CODE_FOR_nothing))
 		    {
-		      nloads = nunits / group_size;
+		      nloads = lnunits;
 		      lnel = group_size;
 		      ltype = build_nonstandard_integer_type (lsize, 1);
 		      lvectype = build_vector_type (ltype, nloads);
@@ -7130,7 +7166,7 @@ vectorizable_load (gimple *stmt, gimple_
 	  else
 	    {
 	      nloads = 1;
-	      lnel = nunits;
+	      lnel = const_nunits;
 	      ltype = vectype;
 	    }
 	  ltype = build_aligned_type (ltype, TYPE_ALIGN (TREE_TYPE (vectype)));
@@ -7145,13 +7181,13 @@ vectorizable_load (gimple *stmt, gimple_
 	      /* We don't yet generate SLP_TREE_LOAD_PERMUTATIONs for
 		 variable VF.  */
 	      unsigned int const_vf = vf.to_constant ();
-	      ncopies = (group_size * const_vf + nunits - 1) / nunits;
+	      ncopies = CEIL (group_size * const_vf, const_nunits);
 	      dr_chain.create (ncopies);
 	    }
 	  else
 	    ncopies = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
 	}
-      int group_el = 0;
+      unsigned int group_el = 0;
       unsigned HOST_WIDE_INT
 	elsz = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (vectype)));
       for (j = 0; j < ncopies; j++)
@@ -7266,7 +7302,8 @@ vectorizable_load (gimple *stmt, gimple_
 	      /* We don't yet generate SLP_TREE_LOAD_PERMUTATIONs for
 		 variable VF.  */
 	      unsigned int const_vf = vf.to_constant ();
-	      vec_num = (group_size * const_vf + nunits - 1) / nunits;
+	      unsigned int const_nunits = nunits.to_constant ();
+	      vec_num = CEIL (group_size * const_vf, const_nunits);
 	      group_gap_adj = vf * group_size - nunits * vec_num;
 	    }
 	  else
@@ -7434,7 +7471,7 @@ vectorizable_load (gimple *stmt, gimple_
     aggr_type = vectype;
 
   prev_stmt_info = NULL;
-  int group_elt = 0;
+  poly_uint64 group_elt = 0;
   for (j = 0; j < ncopies; j++)
     {
       /* 1. Create the vector or array pointer update chain.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [074/nnn] poly_int: vectorizable_call
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (72 preceding siblings ...)
  2017-10-23 17:30 ` [072/nnn] poly_int: vectorizable_live_operation Richard Sandiford
@ 2017-10-23 17:30 ` Richard Sandiford
  2017-11-28 16:46   ` Jeff Law
  2017-10-23 17:31 ` [076/nnn] poly_int: vectorizable_conversion Richard Sandiford
                   ` (33 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:30 UTC (permalink / raw)
  To: gcc-patches

This patch makes vectorizable_call handle variable-length vectors.
The only substantial change is to use build_index_vector for
IFN_GOMP_SIMD_LANE; this makes no functional difference for
fixed-length vectors.
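
Purely as an illustration (the element counts below are made up, with
X standing for the runtime indeterminate), the rewritten checks behave
like this:

  nunits_in == 2 + 2X, nunits_out == 4 + 4X:
    must_eq (nunits_in * 2, nunits_out)   -> true,  so NARROW
    must_eq (nunits_out, nunits_in)       -> false
    must_eq (nunits_out * 2, nunits_in)   -> false

Only relationships that hold for every X can select a modifier;
anything weaker makes the function return false, just as before.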


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-stmts.c (vectorizable_call): Treat the number of
	vectors as polynomial.  Use build_index_vector for
	IFN_GOMP_SIMD_LANE.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:22:38.938582823 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:22:39.943478586 +0100
@@ -2637,8 +2637,8 @@ vectorizable_call (gimple *gs, gimple_st
   tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE;
   stmt_vec_info stmt_info = vinfo_for_stmt (gs), prev_stmt_info;
   tree vectype_out, vectype_in;
-  int nunits_in;
-  int nunits_out;
+  poly_uint64 nunits_in;
+  poly_uint64 nunits_out;
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   vec_info *vinfo = stmt_info->vinfo;
@@ -2758,11 +2758,11 @@ vectorizable_call (gimple *gs, gimple_st
   /* FORNOW */
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
-  if (nunits_in == nunits_out / 2)
+  if (must_eq (nunits_in * 2, nunits_out))
     modifier = NARROW;
-  else if (nunits_out == nunits_in)
+  else if (must_eq (nunits_out, nunits_in))
     modifier = NONE;
-  else if (nunits_out == nunits_in / 2)
+  else if (must_eq (nunits_out * 2, nunits_in))
     modifier = WIDEN;
   else
     return false;
@@ -2961,11 +2961,7 @@ vectorizable_call (gimple *gs, gimple_st
 	  if (gimple_call_internal_p (stmt)
 	      && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE)
 	    {
-	      auto_vec<tree, 32> v (nunits_out);
-	      for (int k = 0; k < nunits_out; ++k)
-		v.quick_push (build_int_cst (unsigned_type_node,
-					     j * nunits_out + k));
-	      tree cst = build_vector (vectype_out, v);
+	      tree cst = build_index_vector (vectype_out, j * nunits_out, 1);
 	      tree new_var
 		= vect_get_new_ssa_name (vectype_out, vect_simple_var, "cst_");
 	      gimple *init_stmt = gimple_build_assign (new_var, cst);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [072/nnn] poly_int: vectorizable_live_operation
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (71 preceding siblings ...)
  2017-10-23 17:30 ` [073/nnn] poly_int: vectorizable_load/store Richard Sandiford
@ 2017-10-23 17:30 ` Richard Sandiford
  2017-11-28 16:47   ` Jeff Law
  2017-10-23 17:30 ` [074/nnn] poly_int: vectorizable_call Richard Sandiford
                   ` (34 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:30 UTC (permalink / raw)
  To: gcc-patches

This patch makes vectorizable_live_operation cope with variable-length
vectors.  For now we just handle cases in which we can tell at compile
time which vector contains the final result.
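
A sketch of what the new can_div_trunc_p test accepts and rejects,
using made-up values (X is the runtime indeterminate):

  can_div_trunc_p (9 + 8X, 4 + 4X, &vec_entry, &vec_index)
    -> true, with vec_entry == 2 and vec_index == 1, since
       9 + 8X == 2 * (4 + 4X) + 1 for every X

  can_div_trunc_p (6 + 2X, 4 + 4X, &vec_entry, &vec_index)
    -> false, since the quotient would be 1 for X <= 1 but 0 for
       X >= 2; in that case we now punt with the new dump message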


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-loop.c (vectorizable_live_operation): Treat the number
	of units as polynomial.  Punt if we can't tell at compile time
	which vector contains the final result.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:22:36.904793787 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:22:37.879692661 +0100
@@ -7132,10 +7132,12 @@ vectorizable_live_operation (gimple *stm
   imm_use_iterator imm_iter;
   tree lhs, lhs_type, bitsize, vec_bitsize;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   int ncopies;
   gimple *use_stmt;
   auto_vec<tree> vec_oprnds;
+  int vec_entry = 0;
+  poly_uint64 vec_index = 0;
 
   gcc_assert (STMT_VINFO_LIVE_P (stmt_info));
 
@@ -7164,6 +7166,30 @@ vectorizable_live_operation (gimple *stm
   else
     ncopies = vect_get_num_copies (loop_vinfo, vectype);
 
+  if (slp_node)
+    {
+      gcc_assert (slp_index >= 0);
+
+      int num_scalar = SLP_TREE_SCALAR_STMTS (slp_node).length ();
+      int num_vec = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
+
+      /* Get the last occurrence of the scalar index from the concatenation of
+	 all the slp vectors. Calculate which slp vector it is and the index
+	 within.  */
+      poly_uint64 pos = (num_vec * nunits) - num_scalar + slp_index;
+
+      /* Calculate which vector contains the result, and which lane of
+	 that vector we need.  */
+      if (!can_div_trunc_p (pos, nunits, &vec_entry, &vec_index))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "Cannot determine which vector holds the"
+			     " final result.\n");
+	  return false;
+	}
+    }
+
   if (!vec_stmt)
     /* No transformation required.  */
     return true;
@@ -7185,18 +7211,6 @@ vectorizable_live_operation (gimple *stm
   tree vec_lhs, bitstart;
   if (slp_node)
     {
-      gcc_assert (slp_index >= 0);
-
-      int num_scalar = SLP_TREE_SCALAR_STMTS (slp_node).length ();
-      int num_vec = SLP_TREE_NUMBER_OF_VEC_STMTS (slp_node);
-
-      /* Get the last occurrence of the scalar index from the concatenation of
-	 all the slp vectors. Calculate which slp vector it is and the index
-	 within.  */
-      int pos = (num_vec * nunits) - num_scalar + slp_index;
-      int vec_entry = pos / nunits;
-      int vec_index = pos % nunits;
-
       /* Get the correct slp vectorized stmt.  */
       vec_lhs = gimple_get_lhs (SLP_TREE_VEC_STMTS (slp_node)[vec_entry]);
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [077/nnn] poly_int: vect_get_constant_vectors
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (75 preceding siblings ...)
  2017-10-23 17:31 ` [075/nnn] poly_int: vectorizable_simd_clone_call Richard Sandiford
@ 2017-10-23 17:31 ` Richard Sandiford
  2017-11-28 16:43   ` Jeff Law
  2017-10-23 17:32 ` [080/nnn] poly_int: tree-vect-generic.c Richard Sandiford
                   ` (30 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:31 UTC (permalink / raw)
  To: gcc-patches

For now, vect_get_constant_vectors can only cope with constant-length
vectors, although a patch after the main SVE submission relaxes this.
This patch adds an appropriate guard for variable-length vectors.
The TYPE_VECTOR_SUBPARTS use in vect_get_constant_vectors can then
safely gain a to_constant call once TYPE_VECTOR_SUBPARTS becomes a
poly_int.
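
A sketch of the effect (vector sizes made up, X the runtime
indeterminate):

  current_vector_size == 16       -> is_constant () is true, so constant
                                     and extern defs are still accepted
  current_vector_size == 16 + 16X -> is_constant () is false, so the SLP
                                     build fails with the new message

With the invalid defs rejected up front, the nunits assignment can
later become TYPE_VECTOR_SUBPARTS (vector_type).to_constant () without
risking a failed conversion.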


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-slp.c (vect_get_and_check_slp_defs): Reject
	constant and extern definitions for variable-length vectors.
	(vect_get_constant_vectors): Note that the number of units
	is known to be constant.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-10-23 17:22:32.728227020 +0100
+++ gcc/tree-vect-slp.c	2017-10-23 17:22:42.827179461 +0100
@@ -403,6 +403,20 @@ vect_get_and_check_slp_defs (vec_info *v
 	{
 	case vect_constant_def:
 	case vect_external_def:
+	  /* We must already have set a vector size by now.  */
+	  gcc_checking_assert (maybe_nonzero (current_vector_size));
+	  if (!current_vector_size.is_constant ())
+	    {
+	      if (dump_enabled_p ())
+		{
+		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+				   "Build SLP failed: invalid type of def "
+				   "for variable-length SLP ");
+		  dump_generic_expr (MSG_MISSED_OPTIMIZATION, TDF_SLIM, oprnd);
+		  dump_printf (MSG_MISSED_OPTIMIZATION, "\n");
+		}
+	      return -1;
+	    }
 	  break;
 
 	case vect_reduction_def:
@@ -3219,6 +3233,7 @@ vect_get_constant_vectors (tree op, slp_
       = build_same_sized_truth_vector_type (STMT_VINFO_VECTYPE (stmt_vinfo));
   else
     vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
+  /* Enforced by vect_get_and_check_slp_defs.  */
   nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
   if (STMT_VINFO_DATA_REF (stmt_vinfo))

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [076/nnn] poly_int: vectorizable_conversion
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (73 preceding siblings ...)
  2017-10-23 17:30 ` [074/nnn] poly_int: vectorizable_call Richard Sandiford
@ 2017-10-23 17:31 ` Richard Sandiford
  2017-11-28 16:44   ` Jeff Law
  2017-10-23 17:31 ` [075/nnn] poly_int: vectorizable_simd_clone_call Richard Sandiford
                   ` (32 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:31 UTC (permalink / raw)
  To: gcc-patches

This patch makes vectorizable_conversion cope with variable-length
vectors.  We already require the number of elements in one vector
to be a multiple of the number of elements in the other vector,
so the patch uses that to choose between widening and narrowing.
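
For illustration (element counts made up, X the runtime indeterminate):

  nunits_in == 4 + 4X, nunits_out == 4 + 4X:
    must_eq (nunits_out, nunits_in)      -> true, so NONE
  nunits_in == 2 + 2X, nunits_out == 8 + 8X:
    multiple_p (nunits_out, nunits_in)   -> true (factor 4), so NARROW
  nunits_in == 8 + 8X, nunits_out == 2 + 2X:
    multiple_p (nunits_out, nunits_in)   -> false
    multiple_p (nunits_in, nunits_out)   -> true, so WIDEN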


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-stmts.c (vectorizable_conversion): Treat the number
	of units as polynomial.  Choose between WIDEN and NARROW based
	on multiple_p.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:22:40.906378704 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:22:41.879277786 +0100
@@ -4102,8 +4102,8 @@ vectorizable_conversion (gimple *stmt, g
   int ndts = 2;
   gimple *new_stmt = NULL;
   stmt_vec_info prev_stmt_info;
-  int nunits_in;
-  int nunits_out;
+  poly_uint64 nunits_in;
+  poly_uint64 nunits_out;
   tree vectype_out, vectype_in;
   int ncopies, i, j;
   tree lhs_type, rhs_type;
@@ -4238,12 +4238,15 @@ vectorizable_conversion (gimple *stmt, g
 
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype_in);
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
-  if (nunits_in < nunits_out)
-    modifier = NARROW;
-  else if (nunits_out == nunits_in)
+  if (must_eq (nunits_out, nunits_in))
     modifier = NONE;
+  else if (multiple_p (nunits_out, nunits_in))
+    modifier = NARROW;
   else
-    modifier = WIDEN;
+    {
+      gcc_checking_assert (multiple_p (nunits_in, nunits_out));
+      modifier = WIDEN;
+    }
 
   /* Multiple types in SLP are handled by creating the appropriate number of
      vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [075/nnn] poly_int: vectorizable_simd_clone_call
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (74 preceding siblings ...)
  2017-10-23 17:31 ` [076/nnn] poly_int: vectorizable_conversion Richard Sandiford
@ 2017-10-23 17:31 ` Richard Sandiford
  2017-11-28 16:45   ` Jeff Law
  2017-10-23 17:31 ` [077/nnn] poly_int: vect_get_constant_vectors Richard Sandiford
                   ` (31 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:31 UTC (permalink / raw)
  To: gcc-patches

This patch makes vectorizable_simd_clone_call cope with variable-length
vectors.  For now we don't support SIMD clones for variable-length
vectors; this will be post GCC 8 material.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-stmts.c (simd_clone_subparts): New function.
	(vectorizable_simd_clone_call): Use it instead of TYPE_VECTOR_SUBPARTS.

Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:22:39.943478586 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:22:40.906378704 +0100
@@ -3206,6 +3206,16 @@ vect_simd_lane_linear (tree op, struct l
     }
 }
 
+/* Return the number of elements in vector type VECTYPE, which is associated
+   with a SIMD clone.  At present these vectors always have a constant
+   length.  */
+
+static unsigned HOST_WIDE_INT
+simd_clone_subparts (tree vectype)
+{
+  return TYPE_VECTOR_SUBPARTS (vectype);
+}
+
 /* Function vectorizable_simd_clone_call.
 
    Check if STMT performs a function call that can be vectorized
@@ -3474,7 +3484,7 @@ vectorizable_simd_clone_call (gimple *st
 	  = get_vectype_for_scalar_type (TREE_TYPE (gimple_call_arg (stmt,
 								     i)));
 	if (arginfo[i].vectype == NULL
-	    || (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)
+	    || (simd_clone_subparts (arginfo[i].vectype)
 		> bestn->simdclone->simdlen))
 	  return false;
       }
@@ -3561,15 +3571,15 @@ vectorizable_simd_clone_call (gimple *st
 	    {
 	    case SIMD_CLONE_ARG_TYPE_VECTOR:
 	      atype = bestn->simdclone->args[i].vector_type;
-	      o = nunits / TYPE_VECTOR_SUBPARTS (atype);
+	      o = nunits / simd_clone_subparts (atype);
 	      for (m = j * o; m < (j + 1) * o; m++)
 		{
-		  if (TYPE_VECTOR_SUBPARTS (atype)
-		      < TYPE_VECTOR_SUBPARTS (arginfo[i].vectype))
+		  if (simd_clone_subparts (atype)
+		      < simd_clone_subparts (arginfo[i].vectype))
 		    {
 		      unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
-		      k = (TYPE_VECTOR_SUBPARTS (arginfo[i].vectype)
-			   / TYPE_VECTOR_SUBPARTS (atype));
+		      k = (simd_clone_subparts (arginfo[i].vectype)
+			   / simd_clone_subparts (atype));
 		      gcc_assert ((k & (k - 1)) == 0);
 		      if (m == 0)
 			vec_oprnd0
@@ -3595,8 +3605,8 @@ vectorizable_simd_clone_call (gimple *st
 		    }
 		  else
 		    {
-		      k = (TYPE_VECTOR_SUBPARTS (atype)
-			   / TYPE_VECTOR_SUBPARTS (arginfo[i].vectype));
+		      k = (simd_clone_subparts (atype)
+			   / simd_clone_subparts (arginfo[i].vectype));
 		      gcc_assert ((k & (k - 1)) == 0);
 		      vec<constructor_elt, va_gc> *ctor_elts;
 		      if (k != 1)
@@ -3714,11 +3724,11 @@ vectorizable_simd_clone_call (gimple *st
       new_stmt = gimple_build_call_vec (fndecl, vargs);
       if (vec_dest)
 	{
-	  gcc_assert (ratype || TYPE_VECTOR_SUBPARTS (rtype) == nunits);
+	  gcc_assert (ratype || simd_clone_subparts (rtype) == nunits);
 	  if (ratype)
 	    new_temp = create_tmp_var (ratype);
-	  else if (TYPE_VECTOR_SUBPARTS (vectype)
-		   == TYPE_VECTOR_SUBPARTS (rtype))
+	  else if (simd_clone_subparts (vectype)
+		   == simd_clone_subparts (rtype))
 	    new_temp = make_ssa_name (vec_dest, new_stmt);
 	  else
 	    new_temp = make_ssa_name (rtype, new_stmt);
@@ -3728,11 +3738,11 @@ vectorizable_simd_clone_call (gimple *st
 
       if (vec_dest)
 	{
-	  if (TYPE_VECTOR_SUBPARTS (vectype) < nunits)
+	  if (simd_clone_subparts (vectype) < nunits)
 	    {
 	      unsigned int k, l;
 	      unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
-	      k = nunits / TYPE_VECTOR_SUBPARTS (vectype);
+	      k = nunits / simd_clone_subparts (vectype);
 	      gcc_assert ((k & (k - 1)) == 0);
 	      for (l = 0; l < k; l++)
 		{
@@ -3767,16 +3777,16 @@ vectorizable_simd_clone_call (gimple *st
 		}
 	      continue;
 	    }
-	  else if (TYPE_VECTOR_SUBPARTS (vectype) > nunits)
+	  else if (simd_clone_subparts (vectype) > nunits)
 	    {
-	      unsigned int k = (TYPE_VECTOR_SUBPARTS (vectype)
-				/ TYPE_VECTOR_SUBPARTS (rtype));
+	      unsigned int k = (simd_clone_subparts (vectype)
+				/ simd_clone_subparts (rtype));
 	      gcc_assert ((k & (k - 1)) == 0);
 	      if ((j & (k - 1)) == 0)
 		vec_alloc (ret_ctor_elts, k);
 	      if (ratype)
 		{
-		  unsigned int m, o = nunits / TYPE_VECTOR_SUBPARTS (rtype);
+		  unsigned int m, o = nunits / simd_clone_subparts (rtype);
 		  for (m = 0; m < o; m++)
 		    {
 		      tree tem = build4 (ARRAY_REF, rtype, new_temp,

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [079/nnn] poly_int: vect_no_alias_p
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (77 preceding siblings ...)
  2017-10-23 17:32 ` [080/nnn] poly_int: tree-vect-generic.c Richard Sandiford
@ 2017-10-23 17:32 ` Richard Sandiford
  2017-12-05 17:46   ` Jeff Law
  2017-10-23 17:32 ` [078/nnn] poly_int: two-operation SLP Richard Sandiford
                   ` (28 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:32 UTC (permalink / raw)
  To: gcc-patches

This patch replaces the two-state vect_no_alias_p with a three-state
vect_compile_time_alias that handles polynomial segment lengths.
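
The heart of the new function is the usual may/must split; roughly
(byte offsets and lengths made up, X the runtime indeterminate):

  ranges_must_overlap_p (0, 16 + 16X, 8, 16 + 16X)  -> true,  return 1
  ranges_may_overlap_p  (0, 8, 16 + 16X, 8)         -> false, return 0
  ranges_must_overlap_p (0, 16 + 16X, 32, 4)        -> false
  ranges_may_overlap_p  (0, 16 + 16X, 32, 4)        -> true,  return -1

A return of -1 keeps the pair in the runtime alias check list instead
of failing vectorization outright.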


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-data-refs.c (vect_no_alias_p): Replace with...
	(vect_compile_time_alias): ...this new function.  Do the calculation
	on poly_ints rather than trees.
	(vect_prune_runtime_alias_test_list): Update call accordingly.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:22:34.681024458 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:22:44.864968082 +0100
@@ -2989,52 +2989,49 @@ vect_vfa_segment_size (struct data_refer
 
 /* Function vect_no_alias_p.
 
-   Given data references A and B with equal base and offset, the alias
-   relation can be decided at compilation time, return TRUE if they do
-   not alias to each other; return FALSE otherwise.  SEGMENT_LENGTH_A
+   Given data references A and B with equal base and offset, see whether
+   the alias relation can be decided at compilation time.  Return 1 if
+   it can and the references alias, 0 if it can and the references do
+   not alias, and -1 if we cannot decide at compile time.  SEGMENT_LENGTH_A
    and SEGMENT_LENGTH_B are the memory lengths accessed by A and B
    respectively.  */
 
-static bool
-vect_no_alias_p (struct data_reference *a, struct data_reference *b,
-                 tree segment_length_a, tree segment_length_b)
+static int
+vect_compile_time_alias (struct data_reference *a, struct data_reference *b,
+			 tree segment_length_a, tree segment_length_b)
 {
-  gcc_assert (TREE_CODE (DR_INIT (a)) == INTEGER_CST
-	      && TREE_CODE (DR_INIT (b)) == INTEGER_CST);
-  if (tree_int_cst_equal (DR_INIT (a), DR_INIT (b)))
-    return false;
+  poly_offset_int offset_a = wi::to_poly_offset (DR_INIT (a));
+  poly_offset_int offset_b = wi::to_poly_offset (DR_INIT (b));
+  poly_uint64 const_length_a;
+  poly_uint64 const_length_b;
 
-  tree seg_a_min = DR_INIT (a);
-  tree seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_min),
-				seg_a_min, segment_length_a);
   /* For negative step, we need to adjust address range by TYPE_SIZE_UNIT
      bytes, e.g., int a[3] -> a[1] range is [a+4, a+16) instead of
      [a, a+12) */
   if (tree_int_cst_compare (DR_STEP (a), size_zero_node) < 0)
     {
-      tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (a)));
-      seg_a_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_a_max),
-			       seg_a_max, unit_size);
-      seg_a_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (a)),
-			       DR_INIT (a), unit_size);
+      const_length_a = (-wi::to_poly_wide (segment_length_a)).force_uhwi ();
+      offset_a = (offset_a + vect_get_scalar_dr_size (a)) - const_length_a;
     }
-  tree seg_b_min = DR_INIT (b);
-  tree seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_min),
-				seg_b_min, segment_length_b);
+  else
+    const_length_a = tree_to_poly_uint64 (segment_length_a);
   if (tree_int_cst_compare (DR_STEP (b), size_zero_node) < 0)
     {
-      tree unit_size = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (b)));
-      seg_b_min = fold_build2 (PLUS_EXPR, TREE_TYPE (seg_b_max),
-			       seg_b_max, unit_size);
-      seg_b_max = fold_build2 (PLUS_EXPR, TREE_TYPE (DR_INIT (b)),
-			       DR_INIT (b), unit_size);
+      const_length_b = (-wi::to_poly_wide (segment_length_b)).force_uhwi ();
+      offset_b = (offset_b + vect_get_scalar_dr_size (b)) - const_length_b;
     }
+  else
+    const_length_b = tree_to_poly_uint64 (segment_length_b);
 
-  if (tree_int_cst_le (seg_a_max, seg_b_min)
-      || tree_int_cst_le (seg_b_max, seg_a_min))
-    return true;
+  if (ranges_must_overlap_p (offset_a, const_length_a,
+			     offset_b, const_length_b))
+    return 1;
+
+  if (!ranges_may_overlap_p (offset_a, const_length_a,
+			     offset_b, const_length_b))
+    return 0;
 
-  return false;
+  return -1;
 }
 
 /* Return true if the minimum nonzero dependence distance for loop LOOP_DEPTH
@@ -3176,21 +3173,26 @@ vect_prune_runtime_alias_test_list (loop
 	comp_res = data_ref_compare_tree (DR_OFFSET (dr_a),
 					  DR_OFFSET (dr_b));
 
-      /* Alias is known at compilation time.  */
+      /* See whether the alias is known at compilation time.  */
       if (comp_res == 0
 	  && TREE_CODE (DR_STEP (dr_a)) == INTEGER_CST
 	  && TREE_CODE (DR_STEP (dr_b)) == INTEGER_CST
-	  && TREE_CODE (segment_length_a) == INTEGER_CST
-	  && TREE_CODE (segment_length_b) == INTEGER_CST)
+	  && poly_int_tree_p (segment_length_a)
+	  && poly_int_tree_p (segment_length_b))
 	{
-	  if (vect_no_alias_p (dr_a, dr_b, segment_length_a, segment_length_b))
+	  int res = vect_compile_time_alias (dr_a, dr_b,
+					     segment_length_a,
+					     segment_length_b);
+	  if (res == 0)
 	    continue;
 
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, vect_location,
-			     "not vectorized: compilation time alias.\n");
-
-	  return false;
+	  if (res == 1)
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "not vectorized: compilation time alias.\n");
+	      return false;
+	    }
 	}
 
       dr_with_seg_len_pair_t dr_with_seg_len_pair

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [078/nnn] poly_int: two-operation SLP
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (78 preceding siblings ...)
  2017-10-23 17:32 ` [079/nnn] poly_int: vect_no_alias_p Richard Sandiford
@ 2017-10-23 17:32 ` Richard Sandiford
  2017-11-28 16:41   ` Jeff Law
  2017-10-23 17:33 ` [082/nnn] poly_int: omp-simd-clone.c Richard Sandiford
                   ` (27 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:32 UTC (permalink / raw)
  To: gcc-patches

This patch makes two-operation SLP cope with variable-length vectors
by detecting and rejecting them.  Adding real support for them is
post-GCC 8 material.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-slp.c (vect_build_slp_tree_1): Handle polynomial
	numbers of units.
	(vect_schedule_slp_instance): Likewise.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-10-23 17:22:42.827179461 +0100
+++ gcc/tree-vect-slp.c	2017-10-23 17:22:43.865071801 +0100
@@ -903,10 +903,19 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 
   /* If we allowed a two-operation SLP node verify the target can cope
      with the permute we are going to use.  */
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   if (alt_stmt_code != ERROR_MARK
       && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
     {
-      unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);
+      unsigned HOST_WIDE_INT count;
+      if (!nunits.is_constant (&count))
+	{
+	  if (dump_enabled_p ())
+	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			     "Build SLP failed: different operations "
+			     "not allowed with variable-length SLP.\n");
+	  return false;
+	}
       auto_vec_perm_indices sel (count);
       for (i = 0; i < count; ++i)
 	{
@@ -3796,6 +3805,7 @@ vect_schedule_slp_instance (slp_tree nod
 
   /* VECTYPE is the type of the destination.  */
   vectype = STMT_VINFO_VECTYPE (stmt_info);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
   group_size = SLP_INSTANCE_GROUP_SIZE (instance);
 
   if (!SLP_TREE_VEC_STMTS (node).exists ())
@@ -3858,13 +3868,16 @@ vect_schedule_slp_instance (slp_tree nod
 	  unsigned k = 0, l;
 	  for (j = 0; j < v0.length (); ++j)
 	    {
-	      unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
-	      auto_vec<tree, 32> melts (nunits);
-	      for (l = 0; l < nunits; ++l)
+	      /* Enforced by vect_build_slp_tree, which rejects variable-length
+		 vectors for SLP_TREE_TWO_OPERATORS.  */
+	      unsigned int const_nunits = nunits.to_constant ();
+	      auto_vec<tree, 32> melts (const_nunits);
+	      for (l = 0; l < const_nunits; ++l)
 		{
 		  if (k >= group_size)
 		    k = 0;
-		  tree t = build_int_cst (meltype, mask[k++] * nunits + l);
+		  tree t = build_int_cst (meltype,
+					  mask[k++] * const_nunits + l);
 		  melts.quick_push (t);
 		}
 	      tmask = build_vector (mvectype, melts);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [080/nnn] poly_int: tree-vect-generic.c
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (76 preceding siblings ...)
  2017-10-23 17:31 ` [077/nnn] poly_int: vect_get_constant_vectors Richard Sandiford
@ 2017-10-23 17:32 ` Richard Sandiford
  2017-12-05 17:48   ` Jeff Law
  2017-10-23 17:32 ` [079/nnn] poly_int: vect_no_alias_p Richard Sandiford
                   ` (29 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:32 UTC (permalink / raw)
  To: gcc-patches

This patch makes tree-vect-generic.c cope with variable-length vectors.
Decomposition is only supported for constant-length vectors, since we
should never generate unsupported variable-length operations.
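
A sketch of the new comparison (element counts made up, X the runtime
indeterminate; subparts_gt treats scalars as having one element):

  must_gt (4, 1)       -> true:  a 4-element vector compute type wins
                          over a scalar one
  must_gt (2 + 2X, 4)  -> false: this only holds for X >= 2, so a
                          variable-length candidate never displaces a
                          known 4-element compute type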


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-generic.c (nunits_for_known_piecewise_op): New function.
	(expand_vector_piecewise): Use it instead of TYPE_VECTOR_SUBPARTS.
	(expand_vector_addition, add_rshift, expand_vector_divmod): Likewise.
	(expand_vector_condition, vector_element): Likewise.
	(subparts_gt): New function.
	(get_compute_type): Use subparts_gt.
	(count_type_subparts): Delete.
	(expand_vector_operations_1): Use subparts_gt instead of
	count_type_subparts.

Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-10-23 17:11:39.944370794 +0100
+++ gcc/tree-vect-generic.c	2017-10-23 17:22:45.856865193 +0100
@@ -41,6 +41,26 @@ Free Software Foundation; either version
 
 static void expand_vector_operations_1 (gimple_stmt_iterator *);
 
+/* Return the number of elements in a vector type TYPE that we have
+   already decided needs to be expanded piecewise.  We don't support
+   this kind of expansion for variable-length vectors, since we should
+   always check for target support before introducing uses of those.  */
+static unsigned int
+nunits_for_known_piecewise_op (const_tree type)
+{
+  return TYPE_VECTOR_SUBPARTS (type);
+}
+
+/* Return true if TYPE1 has more elements than TYPE2, where either
+   type may be a vector or a scalar.  */
+
+static inline bool
+subparts_gt (tree type1, tree type2)
+{
+  poly_uint64 n1 = VECTOR_TYPE_P (type1) ? TYPE_VECTOR_SUBPARTS (type1) : 1;
+  poly_uint64 n2 = VECTOR_TYPE_P (type2) ? TYPE_VECTOR_SUBPARTS (type2) : 1;
+  return must_gt (n1, n2);
+}
 
 /* Build a constant of type TYPE, made of VALUE's bits replicated
    every TYPE_SIZE (INNER_TYPE) bits to fit TYPE's precision.  */
@@ -254,7 +274,7 @@ expand_vector_piecewise (gimple_stmt_ite
   vec<constructor_elt, va_gc> *v;
   tree part_width = TYPE_SIZE (inner_type);
   tree index = bitsize_int (0);
-  int nunits = TYPE_VECTOR_SUBPARTS (type);
+  int nunits = nunits_for_known_piecewise_op (type);
   int delta = tree_to_uhwi (part_width)
 	      / tree_to_uhwi (TYPE_SIZE (TREE_TYPE (type)));
   int i;
@@ -338,7 +358,7 @@ expand_vector_addition (gimple_stmt_iter
 
   if (INTEGRAL_TYPE_P (TREE_TYPE (type))
       && parts_per_word >= 4
-      && TYPE_VECTOR_SUBPARTS (type) >= 4)
+      && nunits_for_known_piecewise_op (type) >= 4)
     return expand_vector_parallel (gsi, f_parallel,
 				   type, a, b, code);
   else
@@ -373,7 +393,7 @@ expand_vector_comparison (gimple_stmt_it
 add_rshift (gimple_stmt_iterator *gsi, tree type, tree op0, int *shiftcnts)
 {
   optab op;
-  unsigned int i, nunits = TYPE_VECTOR_SUBPARTS (type);
+  unsigned int i, nunits = nunits_for_known_piecewise_op (type);
   bool scalar_shift = true;
 
   for (i = 1; i < nunits; i++)
@@ -418,7 +438,7 @@ expand_vector_divmod (gimple_stmt_iterat
   bool has_vector_shift = true;
   int mode = -1, this_mode;
   int pre_shift = -1, post_shift;
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (type);
+  unsigned int nunits = nunits_for_known_piecewise_op (type);
   int *shifts = XALLOCAVEC (int, nunits * 4);
   int *pre_shifts = shifts + nunits;
   int *post_shifts = pre_shifts + nunits;
@@ -867,7 +887,6 @@ expand_vector_condition (gimple_stmt_ite
   tree index = bitsize_int (0);
   tree comp_width = width;
   tree comp_index = index;
-  int nunits = TYPE_VECTOR_SUBPARTS (type);
   int i;
   location_t loc = gimple_location (gsi_stmt (*gsi));
 
@@ -920,6 +939,7 @@ expand_vector_condition (gimple_stmt_ite
   warning_at (loc, OPT_Wvector_operation_performance,
 	      "vector condition will be expanded piecewise");
 
+  int nunits = nunits_for_known_piecewise_op (type);
   vec_alloc (v, nunits);
   for (i = 0; i < nunits; i++)
     {
@@ -1189,7 +1209,7 @@ vector_element (gimple_stmt_iterator *gs
 
   vect_type = TREE_TYPE (vect);
   vect_elt_type = TREE_TYPE (vect_type);
-  elements = TYPE_VECTOR_SUBPARTS (vect_type);
+  elements = nunits_for_known_piecewise_op (vect_type);
 
   if (TREE_CODE (idx) == INTEGER_CST)
     {
@@ -1446,8 +1466,7 @@ get_compute_type (enum tree_code code, o
       tree vector_compute_type
 	= type_for_widest_vector_mode (TREE_TYPE (type), op);
       if (vector_compute_type != NULL_TREE
-	  && (TYPE_VECTOR_SUBPARTS (vector_compute_type)
-	      < TYPE_VECTOR_SUBPARTS (compute_type))
+	  && subparts_gt (compute_type, vector_compute_type)
 	  && TYPE_VECTOR_SUBPARTS (vector_compute_type) > 1
 	  && (optab_handler (op, TYPE_MODE (vector_compute_type))
 	      != CODE_FOR_nothing))
@@ -1476,15 +1495,6 @@ get_compute_type (enum tree_code code, o
   return compute_type;
 }
 
-/* Helper function of expand_vector_operations_1.  Return number of
-   vector elements for vector types or 1 for other types.  */
-
-static inline int
-count_type_subparts (tree type)
-{
-  return VECTOR_TYPE_P (type) ? TYPE_VECTOR_SUBPARTS (type) : 1;
-}
-
 static tree
 do_cond (gimple_stmt_iterator *gsi, tree inner_type, tree a, tree b,
 	 tree bitpos, tree bitsize, enum tree_code code,
@@ -1704,8 +1714,7 @@ expand_vector_operations_1 (gimple_stmt_
 	  /* The rtl expander will expand vector/scalar as vector/vector
 	     if necessary.  Pick one with wider vector type.  */
 	  tree compute_vtype = get_compute_type (code, opv, type);
-	  if (count_type_subparts (compute_vtype)
-	      > count_type_subparts (compute_type))
+	  if (subparts_gt (compute_vtype, compute_type))
 	    {
 	      compute_type = compute_vtype;
 	      op = opv;
@@ -1735,14 +1744,12 @@ expand_vector_operations_1 (gimple_stmt_
 	      tree compute_rtype = get_compute_type (RSHIFT_EXPR, opr, type);
 	      /* The rtl expander will expand vector/scalar as vector/vector
 		 if necessary.  Pick one with wider vector type.  */
-	      if (count_type_subparts (compute_lvtype)
-		  > count_type_subparts (compute_ltype))
+	      if (subparts_gt (compute_lvtype, compute_ltype))
 		{
 		  compute_ltype = compute_lvtype;
 		  opl = oplv;
 		}
-	      if (count_type_subparts (compute_rvtype)
-		  > count_type_subparts (compute_rtype))
+	      if (subparts_gt (compute_rvtype, compute_rtype))
 		{
 		  compute_rtype = compute_rvtype;
 		  opr = oprv;
@@ -1750,11 +1757,9 @@ expand_vector_operations_1 (gimple_stmt_
 	      /* Pick the narrowest type from LSHIFT_EXPR, RSHIFT_EXPR and
 		 BIT_IOR_EXPR.  */
 	      compute_type = compute_ltype;
-	      if (count_type_subparts (compute_type)
-		  > count_type_subparts (compute_rtype))
+	      if (subparts_gt (compute_type, compute_rtype))
 		compute_type = compute_rtype;
-	      if (count_type_subparts (compute_type)
-		  > count_type_subparts (compute_otype))
+	      if (subparts_gt (compute_type, compute_otype))
 		compute_type = compute_otype;
 	      /* Verify all 3 operations can be performed in that type.  */
 	      if (compute_type != TREE_TYPE (type))

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [082/nnn] poly_int: omp-simd-clone.c
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (79 preceding siblings ...)
  2017-10-23 17:32 ` [078/nnn] poly_int: two-operation SLP Richard Sandiford
@ 2017-10-23 17:33 ` Richard Sandiford
  2017-11-28 16:36   ` Jeff Law
  2017-10-23 17:33 ` [081/nnn] poly_int: brig vector elements Richard Sandiford
                   ` (26 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:33 UTC (permalink / raw)
  To: gcc-patches

This patch adds a wrapper around TYPE_VECTOR_SUBPARTS for omp-simd-clone.c.
Supporting SIMD clones for variable-length vectors is post GCC8 work.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* omp-simd-clone.c (simd_clone_subparts): New function.
	(simd_clone_init_simd_arrays): Use it instead of TYPE_VECTOR_SUBPARTS.
	(ipa_simd_modify_function_body): Likewise.

Index: gcc/omp-simd-clone.c
===================================================================
--- gcc/omp-simd-clone.c	2017-08-30 12:19:19.716220030 +0100
+++ gcc/omp-simd-clone.c	2017-10-23 17:22:47.947648317 +0100
@@ -51,6 +51,15 @@ Software Foundation; either version 3, o
 #include "stringpool.h"
 #include "attribs.h"
 
+/* Return the number of elements in vector type VECTYPE, which is associated
+   with a SIMD clone.  At present these always have a constant length.  */
+
+static unsigned HOST_WIDE_INT
+simd_clone_subparts (tree vectype)
+{
+  return TYPE_VECTOR_SUBPARTS (vectype);
+}
+
 /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
    of arguments to reserve space for.  */
 
@@ -770,7 +779,7 @@ simd_clone_init_simd_arrays (struct cgra
 	    }
 	  continue;
 	}
-      if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg)) == node->simdclone->simdlen)
+      if (simd_clone_subparts (TREE_TYPE (arg)) == node->simdclone->simdlen)
 	{
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
 	  tree ptr = build_fold_addr_expr (array);
@@ -781,7 +790,7 @@ simd_clone_init_simd_arrays (struct cgra
 	}
       else
 	{
-	  unsigned int simdlen = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg));
+	  unsigned int simdlen = simd_clone_subparts (TREE_TYPE (arg));
 	  tree ptype = build_pointer_type (TREE_TYPE (TREE_TYPE (array)));
 	  for (k = 0; k < node->simdclone->simdlen; k += simdlen)
 	    {
@@ -927,8 +936,8 @@ ipa_simd_modify_function_body (struct cg
 		  iter,
 		  NULL_TREE, NULL_TREE);
       if (adjustments[j].op == IPA_PARM_OP_NONE
-	  && TYPE_VECTOR_SUBPARTS (vectype) < node->simdclone->simdlen)
-	j += node->simdclone->simdlen / TYPE_VECTOR_SUBPARTS (vectype) - 1;
+	  && simd_clone_subparts (vectype) < node->simdclone->simdlen)
+	j += node->simdclone->simdlen / simd_clone_subparts (vectype) - 1;
     }
 
   l = adjustments.length ();

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [081/nnn] poly_int: brig vector elements
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (80 preceding siblings ...)
  2017-10-23 17:33 ` [082/nnn] poly_int: omp-simd-clone.c Richard Sandiford
@ 2017-10-23 17:33 ` Richard Sandiford
  2017-10-24  7:10   ` Pekka Jääskeläinen
  2017-10-23 17:34 ` [083/nnn] poly_int: fold_indirect_ref_1 Richard Sandiford
                   ` (25 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:33 UTC (permalink / raw)
  To: gcc-patches

This patch adds a brig-specific wrapper around TYPE_VECTOR_SUBPARTS,
since the BRIG frontend will presumably never need to support variable
vector lengths.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/brig/
	* brigfrontend/brig-util.h (gccbrig_type_vector_subparts): New
	function.
	* brigfrontend/brig-basic-inst-handler.cc
	(brig_basic_inst_handler::build_shuffle): Use it instead of
	TYPE_VECTOR_SUBPARTS.
	(brig_basic_inst_handler::build_unpack): Likewise.
	(brig_basic_inst_handler::build_pack): Likewise.
	(brig_basic_inst_handler::build_unpack_lo_or_hi): Likewise.
	(brig_basic_inst_handler::operator ()): Likewise.
	(brig_basic_inst_handler::build_lower_element_broadcast): Likewise.
	* brigfrontend/brig-code-entry-handler.cc
	(brig_code_entry_handler::get_tree_cst_for_hsa_operand): Likewise.
	(brig_code_entry_handler::get_comparison_result_type): Likewise.
	(brig_code_entry_handler::expand_or_call_builtin): Likewise.

Index: gcc/brig/brigfrontend/brig-util.h
===================================================================
--- gcc/brig/brigfrontend/brig-util.h	2017-10-02 09:10:56.960755788 +0100
+++ gcc/brig/brigfrontend/brig-util.h	2017-10-23 17:22:46.882758777 +0100
@@ -76,4 +76,12 @@ bool gccbrig_might_be_host_defined_var_p
 /* From hsa.h.  */
 bool hsa_type_packed_p (BrigType16_t type);
 
+/* Return the number of elements in a VECTOR_TYPE.  BRIG does not support
+   variable-length vectors.  */
+inline unsigned HOST_WIDE_INT
+gccbrig_type_vector_subparts (const_tree type)
+{
+  return TYPE_VECTOR_SUBPARTS (type);
+}
+
 #endif
Index: gcc/brig/brigfrontend/brig-basic-inst-handler.cc
===================================================================
--- gcc/brig/brigfrontend/brig-basic-inst-handler.cc	2017-08-10 14:36:07.092506123 +0100
+++ gcc/brig/brigfrontend/brig-basic-inst-handler.cc	2017-10-23 17:22:46.882758777 +0100
@@ -97,9 +97,10 @@ brig_basic_inst_handler::build_shuffle (
      output elements can originate from any input element.  */
   vec<constructor_elt, va_gc> *mask_offset_vals = NULL;
 
+  unsigned int element_count = gccbrig_type_vector_subparts (arith_type);
+
   vec<constructor_elt, va_gc> *input_mask_vals = NULL;
-  size_t input_mask_element_size
-    = exact_log2 (TYPE_VECTOR_SUBPARTS (arith_type));
+  size_t input_mask_element_size = exact_log2 (element_count);
 
   /* Unpack the tightly packed mask elements to BIT_FIELD_REFs
      from which to construct the mask vector as understood by
@@ -109,7 +110,7 @@ brig_basic_inst_handler::build_shuffle (
   tree mask_element_type
     = build_nonstandard_integer_type (input_mask_element_size, true);
 
-  for (size_t i = 0; i < TYPE_VECTOR_SUBPARTS (arith_type); ++i)
+  for (size_t i = 0; i < element_count; ++i)
     {
       tree mask_element
 	= build3 (BIT_FIELD_REF, mask_element_type, mask_operand,
@@ -119,17 +120,15 @@ brig_basic_inst_handler::build_shuffle (
       mask_element = convert (element_type, mask_element);
 
       tree offset;
-      if (i < TYPE_VECTOR_SUBPARTS (arith_type) / 2)
+      if (i < element_count / 2)
 	offset = build_int_cst (element_type, 0);
       else
-	offset
-	  = build_int_cst (element_type, TYPE_VECTOR_SUBPARTS (arith_type));
+	offset = build_int_cst (element_type, element_count);
 
       CONSTRUCTOR_APPEND_ELT (mask_offset_vals, NULL_TREE, offset);
       CONSTRUCTOR_APPEND_ELT (input_mask_vals, NULL_TREE, mask_element);
     }
-  tree mask_vec_type
-    = build_vector_type (element_type, TYPE_VECTOR_SUBPARTS (arith_type));
+  tree mask_vec_type = build_vector_type (element_type, element_count);
 
   tree mask_vec = build_constructor (mask_vec_type, input_mask_vals);
   tree offset_vec = build_constructor (mask_vec_type, mask_offset_vals);
@@ -158,7 +157,8 @@ brig_basic_inst_handler::build_unpack (t
   vec<constructor_elt, va_gc> *input_mask_vals = NULL;
   vec<constructor_elt, va_gc> *and_mask_vals = NULL;
 
-  size_t element_count = TYPE_VECTOR_SUBPARTS (TREE_TYPE (operands[0]));
+  size_t element_count
+    = gccbrig_type_vector_subparts (TREE_TYPE (operands[0]));
   tree vec_type = build_vector_type (element_type, element_count);
 
   for (size_t i = 0; i < element_count; ++i)
@@ -213,7 +213,7 @@ brig_basic_inst_handler::build_pack (tre
      TODO: Reuse this for implementing 'bitinsert'
      without a builtin call.  */
 
-  size_t ecount = TYPE_VECTOR_SUBPARTS (TREE_TYPE (operands[0]));
+  size_t ecount = gccbrig_type_vector_subparts (TREE_TYPE (operands[0]));
   size_t vecsize = int_size_in_bytes (TREE_TYPE (operands[0])) * BITS_PER_UNIT;
   tree wide_type = build_nonstandard_integer_type (vecsize, 1);
 
@@ -275,9 +275,10 @@ brig_basic_inst_handler::build_unpack_lo
 {
   tree element_type = get_unsigned_int_type (TREE_TYPE (arith_type));
   tree mask_vec_type
-    = build_vector_type (element_type, TYPE_VECTOR_SUBPARTS (arith_type));
+    = build_vector_type (element_type,
+			 gccbrig_type_vector_subparts (arith_type));
 
-  size_t element_count = TYPE_VECTOR_SUBPARTS (arith_type);
+  size_t element_count = gccbrig_type_vector_subparts (arith_type);
   vec<constructor_elt, va_gc> *input_mask_vals = NULL;
 
   size_t offset = (brig_opcode == BRIG_OPCODE_UNPACKLO) ? 0 : element_count / 2;
@@ -600,8 +601,8 @@ brig_basic_inst_handler::operator () (co
 	}
 
       size_t promoted_type_size = int_size_in_bytes (promoted_type) * 8;
-
-      for (size_t i = 0; i < TYPE_VECTOR_SUBPARTS (arith_type); ++i)
+      size_t element_count = gccbrig_type_vector_subparts (arith_type);
+      for (size_t i = 0; i < element_count; ++i)
 	{
 	  tree operand0 = convert (promoted_type, operand0_elements.at (i));
 	  tree operand1 = convert (promoted_type, operand1_elements.at (i));
@@ -708,7 +709,8 @@ brig_basic_inst_handler::build_lower_ele
   tree element_type = TREE_TYPE (TREE_TYPE (vec_operand));
   size_t esize = 8 * int_size_in_bytes (element_type);
 
-  size_t element_count = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_operand));
+  size_t element_count
+    = gccbrig_type_vector_subparts (TREE_TYPE (vec_operand));
   tree mask_inner_type = build_nonstandard_integer_type (esize, 1);
   vec<constructor_elt, va_gc> *constructor_vals = NULL;
 
Index: gcc/brig/brigfrontend/brig-code-entry-handler.cc
===================================================================
--- gcc/brig/brigfrontend/brig-code-entry-handler.cc	2017-10-02 09:10:56.960755788 +0100
+++ gcc/brig/brigfrontend/brig-code-entry-handler.cc	2017-10-23 17:22:46.882758777 +0100
@@ -641,7 +641,8 @@ brig_code_entry_handler::get_tree_cst_fo
 	{
 	  /* In case of vector type elements (or sole vectors),
 	     create a vector ctor.  */
-	  size_t element_count = TYPE_VECTOR_SUBPARTS (tree_element_type);
+	  size_t element_count
+	    = gccbrig_type_vector_subparts (tree_element_type);
 	  if (bytes_left < scalar_element_size * element_count)
 	    fatal_error (UNKNOWN_LOCATION,
 			 "Not enough bytes left for the initializer "
@@ -844,7 +845,7 @@ brig_code_entry_handler::get_comparison_
       size_t element_size = int_size_in_bytes (TREE_TYPE (source_type));
       return build_vector_type
 	(build_nonstandard_boolean_type (element_size * BITS_PER_UNIT),
-	 TYPE_VECTOR_SUBPARTS (source_type));
+	 gccbrig_type_vector_subparts (source_type));
     }
   else
     return gccbrig_tree_type_for_hsa_type (BRIG_TYPE_B1);
@@ -949,7 +950,8 @@ brig_code_entry_handler::expand_or_call_
 
       tree_stl_vec result_elements;
 
-      for (size_t i = 0; i < TYPE_VECTOR_SUBPARTS (arith_type); ++i)
+      size_t element_count = gccbrig_type_vector_subparts (arith_type);
+      for (size_t i = 0; i < element_count; ++i)
 	{
 	  tree_stl_vec call_operands;
 	  if (operand0_elements.size () > 0)

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [085/nnn] poly_int: expand_vector_ubsan_overflow
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (82 preceding siblings ...)
  2017-10-23 17:34 ` [083/nnn] poly_int: fold_indirect_ref_1 Richard Sandiford
@ 2017-10-23 17:34 ` Richard Sandiford
  2017-11-28 16:33   ` Jeff Law
  2017-10-23 17:34 ` [084/nnn] poly_int: folding BIT_FIELD_REFs on vectors Richard Sandiford
                   ` (23 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:34 UTC (permalink / raw)
  To: gcc-patches

This patch makes expand_vector_ubsan_overflow cope with a polynomial
number of elements.
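
Concretely (counts made up, X the runtime indeterminate), the unrolled
expansion is now used only when the element count is a small
compile-time constant:

  cnt == 4       -> cnt.is_constant (&const_cnt) succeeds, use_loop_p
                    is false, each element is expanded inline
  cnt == 2 + 2X  -> is_constant fails, use_loop_p is true, we fall
                    back to the counting loop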


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* internal-fn.c (expand_vector_ubsan_overflow): Handle polynomial
	numbers of elements.

Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2017-10-23 17:11:39.913311438 +0100
+++ gcc/internal-fn.c	2017-10-23 17:22:51.056325855 +0100
@@ -1872,7 +1872,7 @@ expand_mul_overflow (location_t loc, tre
 expand_vector_ubsan_overflow (location_t loc, enum tree_code code, tree lhs,
 			      tree arg0, tree arg1)
 {
-  int cnt = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
+  poly_uint64 cnt = TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0));
   rtx_code_label *loop_lab = NULL;
   rtx cntvar = NULL_RTX;
   tree cntv = NULL_TREE;
@@ -1882,6 +1882,8 @@ expand_vector_ubsan_overflow (location_t
   tree resv = NULL_TREE;
   rtx lhsr = NULL_RTX;
   rtx resvr = NULL_RTX;
+  unsigned HOST_WIDE_INT const_cnt = 0;
+  bool use_loop_p = (!cnt.is_constant (&const_cnt) || const_cnt > 4);
 
   if (lhs)
     {
@@ -1902,7 +1904,7 @@ expand_vector_ubsan_overflow (location_t
 	    }
 	}
     }
-  if (cnt > 4)
+  if (use_loop_p)
     {
       do_pending_stack_adjust ();
       loop_lab = gen_label_rtx ();
@@ -1921,10 +1923,10 @@ expand_vector_ubsan_overflow (location_t
       rtx arg1r = expand_normal (arg1);
       arg1 = make_tree (TREE_TYPE (arg1), arg1r);
     }
-  for (int i = 0; i < (cnt > 4 ? 1 : cnt); i++)
+  for (unsigned int i = 0; i < (use_loop_p ? 1 : const_cnt); i++)
     {
       tree op0, op1, res = NULL_TREE;
-      if (cnt > 4)
+      if (use_loop_p)
 	{
 	  tree atype = build_array_type_nelts (eltype, cnt);
 	  op0 = uniform_vector_p (arg0);
@@ -1964,7 +1966,7 @@ expand_vector_ubsan_overflow (location_t
 				  false, false, false, true, &data);
 	  break;
 	case MINUS_EXPR:
-	  if (cnt > 4 ? integer_zerop (arg0) : integer_zerop (op0))
+	  if (use_loop_p ? integer_zerop (arg0) : integer_zerop (op0))
 	    expand_neg_overflow (loc, res, op1, true, &data);
 	  else
 	    expand_addsub_overflow (loc, MINUS_EXPR, res, op0, op1,
@@ -1978,7 +1980,7 @@ expand_vector_ubsan_overflow (location_t
 	  gcc_unreachable ();
 	}
     }
-  if (cnt > 4)
+  if (use_loop_p)
     {
       struct separate_ops ops;
       ops.code = PLUS_EXPR;
@@ -1991,7 +1993,8 @@ expand_vector_ubsan_overflow (location_t
 				    EXPAND_NORMAL);
       if (ret != cntvar)
 	emit_move_insn (cntvar, ret);
-      do_compare_rtx_and_jump (cntvar, GEN_INT (cnt), NE, false,
+      rtx cntrtx = gen_int_mode (cnt, TYPE_MODE (sizetype));
+      do_compare_rtx_and_jump (cntvar, cntrtx, NE, false,
 			       TYPE_MODE (sizetype), NULL_RTX, NULL, loop_lab,
 			       profile_probability::very_likely ());
     }

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [084/nnn] poly_int: folding BIT_FIELD_REFs on vectors
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (83 preceding siblings ...)
  2017-10-23 17:34 ` [085/nnn] poly_int: expand_vector_ubsan_overflow Richard Sandiford
@ 2017-10-23 17:34 ` Richard Sandiford
  2017-11-28 16:33   ` Jeff Law
  2017-10-23 17:35 ` [088/nnn] poly_int: expand_expr_real_2 Richard Sandiford
                   ` (22 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:34 UTC (permalink / raw)
  To: gcc-patches

This patch makes the:

  (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2)

folder cope with polynomial numbers of elements.
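
As an illustration of the three-operand multiple_p used in the
"exact subset" case (values made up, X the runtime indeterminate,
k being the subvector element count):

  multiple_p (0, 2 + 2X, &elt)         -> true, elt == 0
  multiple_p (4 + 4X, 2 + 2X, &count)  -> true, count == 2
  multiple_p (1, 2 + 2X, &elt)         -> false, so we fall through to
                                          the single-element cases,
                                          which also require k to be
                                          constant via is_constant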


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* match.pd: Cope with polynomial numbers of vector elements.

Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2017-10-23 17:22:18.230825454 +0100
+++ gcc/match.pd	2017-10-23 17:22:50.031432167 +0100
@@ -4307,46 +4307,43 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
        idx = idx / width;
        n = n / width;
        /* Constructor elements can be subvectors.  */
-       unsigned HOST_WIDE_INT k = 1;
+       poly_uint64 k = 1;
        if (CONSTRUCTOR_NELTS (ctor) != 0)
          {
            tree cons_elem = TREE_TYPE (CONSTRUCTOR_ELT (ctor, 0)->value);
 	   if (TREE_CODE (cons_elem) == VECTOR_TYPE)
 	     k = TYPE_VECTOR_SUBPARTS (cons_elem);
 	 }
+       unsigned HOST_WIDE_INT elt, count, const_k;
      }
      (switch
       /* We keep an exact subset of the constructor elements.  */
-      (if ((idx % k) == 0 && (n % k) == 0)
+      (if (multiple_p (idx, k, &elt) && multiple_p (n, k, &count))
        (if (CONSTRUCTOR_NELTS (ctor) == 0)
         { build_constructor (type, NULL); }
-	(with
+	(if (count == 1)
+	 (if (elt < CONSTRUCTOR_NELTS (ctor))
+	  { CONSTRUCTOR_ELT (ctor, elt)->value; }
+	  { build_zero_cst (type); })
 	 {
-	   idx /= k;
-	   n /= k;
-	 }
-	 (if (n == 1)
-	  (if (idx < CONSTRUCTOR_NELTS (ctor))
-	   { CONSTRUCTOR_ELT (ctor, idx)->value; }
-	   { build_zero_cst (type); })
-	  {
-	    vec<constructor_elt, va_gc> *vals;
-	    vec_alloc (vals, n);
-	    for (unsigned i = 0;
-	         i < n && idx + i < CONSTRUCTOR_NELTS (ctor); ++i)
-	      CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
-				      CONSTRUCTOR_ELT (ctor, idx + i)->value);
-	    build_constructor (type, vals);
-	  }))))
+	   vec<constructor_elt, va_gc> *vals;
+	   vec_alloc (vals, count);
+	   for (unsigned i = 0;
+		i < count && elt + i < CONSTRUCTOR_NELTS (ctor); ++i)
+	     CONSTRUCTOR_APPEND_ELT (vals, NULL_TREE,
+				     CONSTRUCTOR_ELT (ctor, elt + i)->value);
+	   build_constructor (type, vals);
+	 })))
       /* The bitfield references a single constructor element.  */
-      (if (idx + n <= (idx / k + 1) * k)
+      (if (k.is_constant (&const_k)
+	   && idx + n <= (idx / const_k + 1) * const_k)
        (switch
-        (if (CONSTRUCTOR_NELTS (ctor) <= idx / k)
+	(if (CONSTRUCTOR_NELTS (ctor) <= idx / const_k)
 	 { build_zero_cst (type); })
-	(if (n == k)
-	 { CONSTRUCTOR_ELT (ctor, idx / k)->value; })
-	(BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / k)->value; }
-		       @1 { bitsize_int ((idx % k) * width); })))))))))
+	(if (n == const_k)
+	 { CONSTRUCTOR_ELT (ctor, idx / const_k)->value; })
+	(BIT_FIELD_REF { CONSTRUCTOR_ELT (ctor, idx / const_k)->value; }
+		       @1 { bitsize_int ((idx % const_k) * width); })))))))))
 
 /* Simplify a bit extraction from a bit insertion for the cases with
    the inserted element fully covering the extraction or the insertion

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [083/nnn] poly_int: fold_indirect_ref_1
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (81 preceding siblings ...)
  2017-10-23 17:33 ` [081/nnn] poly_int: brig vector elements Richard Sandiford
@ 2017-10-23 17:34 ` Richard Sandiford
  2017-11-28 16:34   ` Jeff Law
  2017-10-23 17:34 ` [085/nnn] poly_int: expand_vector_ubsan_overflow Richard Sandiford
                   ` (24 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:34 UTC (permalink / raw)
  To: gcc-patches

This patch makes fold_indirect_ref_1 handle polynomial offsets in
a POINTER_PLUS_EXPR.  The specific reason for doing this now is
to handle:

 		  (tree_to_uhwi (part_width) / BITS_PER_UNIT
 		   * TYPE_VECTOR_SUBPARTS (op00type));

when TYPE_VECTOR_SUBPARTS becomes a poly_int.
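
For example (numbers made up, X the runtime indeterminate): for a
vector of 4 + 4X ints, part_width is 32 bits and max_offset becomes
16 + 16X bytes, so:

  must_lt (12, 16 + 16X)       -> true:  the fold to BIT_FIELD_REF is
                                  still valid
  must_lt (16 + 8X, 16 + 16X)  -> false: this only holds for X >= 1,
                                  so we leave the access alone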


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* fold-const.c (fold_indirect_ref_1): Handle polynomial offsets
	in a POINTER_PLUS_EXPR.

Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:20:50.881679906 +0100
+++ gcc/fold-const.c	2017-10-23 17:22:48.984540760 +0100
@@ -14137,6 +14137,7 @@ fold_indirect_ref_1 (location_t loc, tre
 {
   tree sub = op0;
   tree subtype;
+  poly_uint64 const_op01;
 
   STRIP_NOPS (sub);
   subtype = TREE_TYPE (sub);
@@ -14191,7 +14192,7 @@ fold_indirect_ref_1 (location_t loc, tre
     }
 
   if (TREE_CODE (sub) == POINTER_PLUS_EXPR
-      && TREE_CODE (TREE_OPERAND (sub, 1)) == INTEGER_CST)
+      && poly_int_tree_p (TREE_OPERAND (sub, 1), &const_op01))
     {
       tree op00 = TREE_OPERAND (sub, 0);
       tree op01 = TREE_OPERAND (sub, 1);
@@ -14208,15 +14209,12 @@ fold_indirect_ref_1 (location_t loc, tre
 	      && type == TREE_TYPE (op00type))
 	    {
 	      tree part_width = TYPE_SIZE (type);
-	      unsigned HOST_WIDE_INT max_offset
+	      poly_uint64 max_offset
 		= (tree_to_uhwi (part_width) / BITS_PER_UNIT
 		   * TYPE_VECTOR_SUBPARTS (op00type));
-	      if (tree_int_cst_sign_bit (op01) == 0
-		  && compare_tree_int (op01, max_offset) == -1)
+	      if (must_lt (const_op01, max_offset))
 		{
-		  unsigned HOST_WIDE_INT offset = tree_to_uhwi (op01);
-		  unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
-		  tree index = bitsize_int (indexi);
+		  tree index = bitsize_int (const_op01 * BITS_PER_UNIT);
 		  return fold_build3_loc (loc,
 					  BIT_FIELD_REF, type, op00,
 					  part_width, index);
@@ -14226,8 +14224,8 @@ fold_indirect_ref_1 (location_t loc, tre
 	  else if (TREE_CODE (op00type) == COMPLEX_TYPE
 		   && type == TREE_TYPE (op00type))
 	    {
-	      tree size = TYPE_SIZE_UNIT (type);
-	      if (tree_int_cst_equal (size, op01))
+	      if (must_eq (wi::to_poly_offset (TYPE_SIZE_UNIT (type)),
+			   const_op01))
 		return fold_build1_loc (loc, IMAGPART_EXPR, type, op00);
 	    }
 	  /* ((foo *)&fooarray)[1] => fooarray[1] */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [086/nnn] poly_int: REGMODE_NATURAL_SIZE
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (86 preceding siblings ...)
  2017-10-23 17:35 ` [087/nnn] poly_int: subreg_get_info Richard Sandiford
@ 2017-10-23 17:35 ` Richard Sandiford
  2017-12-05 23:33   ` Jeff Law
  2017-10-23 17:36 ` [089/nnn] poly_int: expand_expr_real_1 Richard Sandiford
                   ` (19 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:35 UTC (permalink / raw)
  To: gcc-patches

This patch makes target-independent code that uses REGMODE_NATURAL_SIZE
treat it as a poly_int rather than a constant.
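
Because REGMODE_NATURAL_SIZE can now vary at runtime, several hunks
below also add explicit ordered_p checks before classifying a subreg
as partial, paradoxical or complete.  A standalone sketch of what
"ordered" means here (not GCC code; the coefficients are made up):

  /* Toy model: two sizes are ordered iff one is known to be <= the
     other for every X >= 0.  */
  #include <cstdint>
  #include <cassert>

  struct poly { int64_t a, b; };   /* models a + b*X, X >= 0 */

  static bool
  model_must_le (poly x, poly y)
  {
    return x.a <= y.a && x.b <= y.b;
  }

  static bool
  model_ordered_p (poly x, poly y)
  {
    return model_must_le (x, y) || model_must_le (y, x);
  }

  int
  main ()
  {
    poly fixed16 = { 16, 0 };   /* a fixed 16-byte mode */
    poly vec = { 16, 16 };      /* 16 + 16*X bytes */
    poly half = { 8, 8 };       /* 8 + 8*X bytes */
    assert (model_ordered_p (fixed16, vec));    /* 16 <= 16 + 16*X always */
    assert (!model_ordered_p (fixed16, half));  /* 8 + 8*X straddles 16 */
    return 0;
  }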


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* combine.c (can_change_dest_mode): Handle polynomial
	REGMODE_NATURAL_SIZE.
	* expmed.c (store_bit_field_1): Likewise.
	* expr.c (store_constructor): Likewise.
	* emit-rtl.c (validate_subreg): Operate on polynomial mode sizes
	and polynomial REGMODE_NATURAL_SIZE.
	(gen_lowpart_common): Likewise.
	* reginfo.c (record_subregs_of_mode): Likewise.
	* rtlanal.c (read_modify_subreg_p): Likewise.

Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-10-23 17:25:26.554256722 +0100
+++ gcc/combine.c	2017-10-23 17:25:30.702136080 +0100
@@ -2474,8 +2474,8 @@ can_change_dest_mode (rtx x, int added_s
 
   /* Don't change between modes with different underlying register sizes,
      since this could lead to invalid subregs.  */
-  if (REGMODE_NATURAL_SIZE (mode)
-      != REGMODE_NATURAL_SIZE (GET_MODE (x)))
+  if (may_ne (REGMODE_NATURAL_SIZE (mode),
+	      REGMODE_NATURAL_SIZE (GET_MODE (x))))
     return false;
 
   regno = REGNO (x);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 17:23:00.293367701 +0100
+++ gcc/expmed.c	2017-10-23 17:25:30.703136044 +0100
@@ -778,7 +778,7 @@ store_bit_field_1 (rtx str_rtx, poly_uin
 	 In the latter case, use subreg on the rhs side, not lhs.  */
       rtx sub;
       HOST_WIDE_INT regnum;
-      HOST_WIDE_INT regsize = REGMODE_NATURAL_SIZE (GET_MODE (op0));
+      poly_uint64 regsize = REGMODE_NATURAL_SIZE (GET_MODE (op0));
       if (known_zero (bitnum)
 	  && must_eq (bitsize, GET_MODE_BITSIZE (GET_MODE (op0))))
 	{
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:23:00.293367701 +0100
+++ gcc/expr.c	2017-10-23 17:25:30.704136008 +0100
@@ -6204,8 +6204,8 @@ store_constructor (tree exp, rtx target,
 	   a constant.  But if more than one register is involved,
 	   this probably loses.  */
 	else if (REG_P (target) && TREE_STATIC (exp)
-		 && (GET_MODE_SIZE (GET_MODE (target))
-		     <= REGMODE_NATURAL_SIZE (GET_MODE (target))))
+		 && must_le (GET_MODE_SIZE (GET_MODE (target)),
+			     REGMODE_NATURAL_SIZE (GET_MODE (target))))
 	  {
 	    emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
 	    cleared = 1;
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 17:23:00.293367701 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:25:30.703136044 +0100
@@ -924,8 +924,13 @@ gen_tmp_stack_mem (machine_mode mode, rt
 validate_subreg (machine_mode omode, machine_mode imode,
 		 const_rtx reg, poly_uint64 offset)
 {
-  unsigned int isize = GET_MODE_SIZE (imode);
-  unsigned int osize = GET_MODE_SIZE (omode);
+  poly_uint64 isize = GET_MODE_SIZE (imode);
+  poly_uint64 osize = GET_MODE_SIZE (omode);
+
+  /* The sizes must be ordered, so that we know whether the subreg
+     is partial, paradoxical or complete.  */
+  if (!ordered_p (isize, osize))
+    return false;
 
   /* All subregs must be aligned.  */
   if (!multiple_p (offset, osize))
@@ -935,7 +940,7 @@ validate_subreg (machine_mode omode, mac
   if (may_ge (offset, isize))
     return false;
 
-  unsigned int regsize = REGMODE_NATURAL_SIZE (imode);
+  poly_uint64 regsize = REGMODE_NATURAL_SIZE (imode);
 
   /* ??? This should not be here.  Temporarily continue to allow word_mode
      subregs of anything.  The most common offender is (subreg:SI (reg:DF)).
@@ -945,7 +950,7 @@ validate_subreg (machine_mode omode, mac
     ;
   /* ??? Similarly, e.g. with (subreg:DF (reg:TI)).  Though store_bit_field
      is the culprit here, and not the backends.  */
-  else if (osize >= regsize && isize >= osize)
+  else if (must_ge (osize, regsize) && must_ge (isize, osize))
     ;
   /* Allow component subregs of complex and vector.  Though given the below
      extraction rules, it's not always clear what that means.  */
@@ -964,7 +969,7 @@ validate_subreg (machine_mode omode, mac
      (subreg:SI (reg:DF) 0) isn't.  */
   else if (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))
     {
-      if (! (isize == osize
+      if (! (must_eq (isize, osize)
 	     /* LRA can use subreg to store a floating point value in
 		an integer mode.  Although the floating point and the
 		integer modes need the same number of hard registers,
@@ -976,7 +981,7 @@ validate_subreg (machine_mode omode, mac
     }
 
   /* Paradoxical subregs must have offset zero.  */
-  if (osize > isize)
+  if (may_gt (osize, isize))
     return known_zero (offset);
 
   /* This is a normal subreg.  Verify that the offset is representable.  */
@@ -996,6 +1001,12 @@ validate_subreg (machine_mode omode, mac
       return subreg_offset_representable_p (regno, imode, offset, omode);
     }
 
+  /* The outer size must be ordered wrt the register size, otherwise
+     we wouldn't know at compile time how many registers the outer
+     mode occupies.  */
+  if (!ordered_p (osize, regsize))
+    return false;
+
   /* For pseudo registers, we want most of the same checks.  Namely:
 
      Assume that the pseudo register will be allocated to hard registers
@@ -1006,10 +1017,12 @@ validate_subreg (machine_mode omode, mac
 
      Given that we've already checked the mode and offset alignment,
      we only have to check subblock subregs here.  */
-  if (osize < regsize
+  if (may_lt (osize, regsize)
       && ! (lra_in_progress && (FLOAT_MODE_P (imode) || FLOAT_MODE_P (omode))))
     {
-      poly_uint64 block_size = MIN (isize, regsize);
+      /* It is invalid for the target to pick a register size for a mode
+	 that isn't ordered wrt the size of that mode.  */
+      poly_uint64 block_size = ordered_min (isize, regsize);
       unsigned int start_reg;
       poly_uint64 offset_within_reg;
       if (!can_div_trunc_p (offset, block_size, &start_reg, &offset_within_reg)
@@ -1548,39 +1561,43 @@ maybe_set_max_label_num (rtx_code_label
 rtx
 gen_lowpart_common (machine_mode mode, rtx x)
 {
-  int msize = GET_MODE_SIZE (mode);
-  int xsize;
+  poly_uint64 msize = GET_MODE_SIZE (mode);
   machine_mode innermode;
 
   /* Unfortunately, this routine doesn't take a parameter for the mode of X,
      so we have to make one up.  Yuk.  */
   innermode = GET_MODE (x);
   if (CONST_INT_P (x)
-      && msize * BITS_PER_UNIT <= HOST_BITS_PER_WIDE_INT)
+      && must_le (msize * BITS_PER_UNIT,
+		  (unsigned HOST_WIDE_INT) HOST_BITS_PER_WIDE_INT))
     innermode = int_mode_for_size (HOST_BITS_PER_WIDE_INT, 0).require ();
   else if (innermode == VOIDmode)
     innermode = int_mode_for_size (HOST_BITS_PER_DOUBLE_INT, 0).require ();
 
-  xsize = GET_MODE_SIZE (innermode);
-
   gcc_assert (innermode != VOIDmode && innermode != BLKmode);
 
   if (innermode == mode)
     return x;
 
+  /* The size of the outer and inner modes must be ordered.  */
+  poly_uint64 xsize = GET_MODE_SIZE (innermode);
+  if (!ordered_p (msize, xsize))
+    return 0;
+
   if (SCALAR_FLOAT_MODE_P (mode))
     {
       /* Don't allow paradoxical FLOAT_MODE subregs.  */
-      if (msize > xsize)
+      if (may_gt (msize, xsize))
 	return 0;
     }
   else
     {
       /* MODE must occupy no more of the underlying registers than X.  */
-      unsigned int regsize = REGMODE_NATURAL_SIZE (innermode);
-      unsigned int mregs = CEIL (msize, regsize);
-      unsigned int xregs = CEIL (xsize, regsize);
-      if (mregs > xregs)
+      poly_uint64 regsize = REGMODE_NATURAL_SIZE (innermode);
+      unsigned int mregs, xregs;
+      if (!can_div_away_from_zero_p (msize, regsize, &mregs)
+	  || !can_div_away_from_zero_p (xsize, regsize, &xregs)
+	  || mregs > xregs)
 	return 0;
     }
 
Index: gcc/reginfo.c
===================================================================
--- gcc/reginfo.c	2017-10-23 17:23:00.293367701 +0100
+++ gcc/reginfo.c	2017-10-23 17:25:30.704136008 +0100
@@ -1294,10 +1294,14 @@ record_subregs_of_mode (rtx subreg, bool
 	 subregs will be invalid.
 
 	 This relies on the fact that we've already been passed
-	 SUBREG with PARTIAL_DEF set to false.  */
-      unsigned int size = MAX (REGMODE_NATURAL_SIZE (shape.inner_mode),
-			       GET_MODE_SIZE (shape.outer_mode));
-      gcc_checking_assert (size < GET_MODE_SIZE (shape.inner_mode));
+	 SUBREG with PARTIAL_DEF set to false.
+
+	 The size of the outer mode must be ordered wrt the size of the
+	 inner mode's registers, since otherwise we wouldn't know at
+	 compile time how many registers the outer mode occupies.  */
+      poly_uint64 size = MAX (REGMODE_NATURAL_SIZE (shape.inner_mode),
+			      GET_MODE_SIZE (shape.outer_mode));
+      gcc_checking_assert (must_lt (size, GET_MODE_SIZE (shape.inner_mode)));
       if (must_ge (shape.offset, size))
 	shape.offset -= size;
       else
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:23:00.293367701 +0100
+++ gcc/rtlanal.c	2017-10-23 17:25:30.705135972 +0100
@@ -1395,13 +1395,15 @@ modified_in_p (const_rtx x, const_rtx in
 bool
 read_modify_subreg_p (const_rtx x)
 {
-  unsigned int isize, osize;
   if (GET_CODE (x) != SUBREG)
     return false;
-  isize = GET_MODE_SIZE (GET_MODE (SUBREG_REG (x)));
-  osize = GET_MODE_SIZE (GET_MODE (x));
-  return isize > osize
-	 && isize > REGMODE_NATURAL_SIZE (GET_MODE (SUBREG_REG (x)));
+  poly_uint64 isize = GET_MODE_SIZE (GET_MODE (SUBREG_REG (x)));
+  poly_uint64 osize = GET_MODE_SIZE (GET_MODE (x));
+  poly_uint64 regsize = REGMODE_NATURAL_SIZE (GET_MODE (SUBREG_REG (x)));
+  /* The inner and outer modes of a subreg must be ordered, so that we
+     can tell whether they're paradoxical or partial.  */
+  gcc_checking_assert (ordered_p (isize, osize));
+  return (may_gt (isize, osize) && may_gt (isize, regsize));
 }
 \f
 /* Helper function for set_of.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [088/nnn] poly_int: expand_expr_real_2
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (84 preceding siblings ...)
  2017-10-23 17:34 ` [084/nnn] poly_int: folding BIT_FIELD_REFs on vectors Richard Sandiford
@ 2017-10-23 17:35 ` Richard Sandiford
  2017-11-28  8:49   ` Jeff Law
  2017-10-23 17:35 ` [087/nnn] poly_int: subreg_get_info Richard Sandiford
                   ` (21 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:35 UTC (permalink / raw)
  To: gcc-patches

This patch makes expand_expr_real_2 cope with polynomial mode sizes
when handling conversions involving a union type.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.c (expand_expr_real_2): When handling conversions involving
	unions, apply tree_to_poly_uint64 to the TYPE_SIZE rather than
	multiplying int_size_in_bytes by BITS_PER_UNIT.  Treat
	GET_MODE_BITSIZE as a poly_uint64 too.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:30.704136008 +0100
+++ gcc/expr.c	2017-10-23 17:25:34.105013764 +0100
@@ -8354,11 +8354,14 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
 			  && !TYPE_REVERSE_STORAGE_ORDER (type));
 
 	      /* Store this field into a union of the proper type.  */
+	      poly_uint64 op0_size
+		= tree_to_poly_uint64 (TYPE_SIZE (TREE_TYPE (treeop0)));
+	      poly_uint64 union_size = GET_MODE_BITSIZE (mode);
 	      store_field (target,
-			   MIN ((int_size_in_bytes (TREE_TYPE
-						    (treeop0))
-				 * BITS_PER_UNIT),
-				(HOST_WIDE_INT) GET_MODE_BITSIZE (mode)),
+			   /* The conversion must be constructed so that
+			      we know at compile time how many bits
+			      to preserve.  */
+			   ordered_min (op0_size, union_size),
 			   0, 0, 0, TYPE_MODE (valtype), treeop0, 0,
 			   false, false);
 	    }

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [087/nnn] poly_int: subreg_get_info
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (85 preceding siblings ...)
  2017-10-23 17:35 ` [088/nnn] poly_int: expand_expr_real_2 Richard Sandiford
@ 2017-10-23 17:35 ` Richard Sandiford
  2017-11-28 16:29   ` Jeff Law
  2017-10-23 17:35 ` [086/nnn] poly_int: REGMODE_NATURAL_SIZE Richard Sandiford
                   ` (20 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:35 UTC (permalink / raw)
  To: gcc-patches

This patch makes subreg_get_info handle polynomial sizes.
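
The main change below is that xsize / nregs_xmode and ysize / nregs_ymode
can no longer be computed unconditionally; the patch uses multiple_p to
get the per-register sizes and exact_div where the division is known to
be exact.  A standalone sketch of the multiple_p idea (not GCC code;
the numbers are made up):

  /* Toy model: a + b*X is a compile-time multiple of a constant n iff
     n divides both coefficients; the quotient is (a/n) + (b/n)*X.  */
  #include <cstdint>
  #include <cassert>

  struct poly { int64_t a, b; };   /* models a + b*X, X >= 0 */

  static bool
  model_multiple_p (poly size, int64_t n, poly *quot)
  {
    if (size.a % n != 0 || size.b % n != 0)
      return false;
    quot->a = size.a / n;
    quot->b = size.b / n;
    return true;
  }

  int
  main ()
  {
    poly xsize = { 16, 16 };   /* 16 + 16*X bytes */
    poly per_reg;
    assert (model_multiple_p (xsize, 2, &per_reg)
            && per_reg.a == 8 && per_reg.b == 8);   /* 8 + 8*X each */
    assert (!model_multiple_p (xsize, 3, &per_reg)); /* not divisible */
    return 0;
  }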


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtlanal.c (subreg_get_info): Handle polynomial mode sizes.

Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:25:30.705135972 +0100
+++ gcc/rtlanal.c	2017-10-23 17:25:32.610067499 +0100
@@ -3694,8 +3694,9 @@ subreg_get_info (unsigned int xregno, ma
 
   gcc_assert (xregno < FIRST_PSEUDO_REGISTER);
 
-  unsigned int xsize = GET_MODE_SIZE (xmode);
-  unsigned int ysize = GET_MODE_SIZE (ymode);
+  poly_uint64 xsize = GET_MODE_SIZE (xmode);
+  poly_uint64 ysize = GET_MODE_SIZE (ymode);
+
   bool rknown = false;
 
   /* If the register representation of a non-scalar mode has holes in it,
@@ -3707,6 +3708,7 @@ subreg_get_info (unsigned int xregno, ma
       /* As a consequence, we must be dealing with a constant number of
 	 scalars, and thus a constant offset.  */
       HOST_WIDE_INT coffset = offset.to_constant ();
+      HOST_WIDE_INT cysize = ysize.to_constant ();
       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
       unsigned int nunits = GET_MODE_NUNITS (xmode);
       scalar_mode xmode_unit = GET_MODE_INNER (xmode);
@@ -3727,7 +3729,7 @@ subreg_get_info (unsigned int xregno, ma
 	 of each unit.  */
       if ((coffset / GET_MODE_SIZE (xmode_unit) + 1 < nunits)
 	  && (coffset / GET_MODE_SIZE (xmode_unit)
-	      != ((coffset + ysize - 1) / GET_MODE_SIZE (xmode_unit))))
+	      != ((coffset + cysize - 1) / GET_MODE_SIZE (xmode_unit))))
 	{
 	  info->representable_p = false;
 	  rknown = true;
@@ -3738,8 +3740,12 @@ subreg_get_info (unsigned int xregno, ma
 
   nregs_ymode = hard_regno_nregs (xregno, ymode);
 
+  /* Subreg sizes must be ordered, so that we can tell whether they are
+     partial, paradoxical or complete.  */
+  gcc_checking_assert (ordered_p (xsize, ysize));
+
   /* Paradoxical subregs are otherwise valid.  */
-  if (!rknown && known_zero (offset) && ysize > xsize)
+  if (!rknown && known_zero (offset) && may_gt (ysize, xsize))
     {
       info->representable_p = true;
       /* If this is a big endian paradoxical subreg, which uses more
@@ -3761,20 +3767,19 @@ subreg_get_info (unsigned int xregno, ma
 
   /* If registers store different numbers of bits in the different
      modes, we cannot generally form this subreg.  */
+  poly_uint64 regsize_xmode, regsize_ymode;
   if (!HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode)
       && !HARD_REGNO_NREGS_HAS_PADDING (xregno, ymode)
-      && (xsize % nregs_xmode) == 0
-      && (ysize % nregs_ymode) == 0)
+      && multiple_p (xsize, nregs_xmode, &regsize_xmode)
+      && multiple_p (ysize, nregs_ymode, &regsize_ymode))
     {
-      int regsize_xmode = xsize / nregs_xmode;
-      int regsize_ymode = ysize / nregs_ymode;
       if (!rknown
-	  && ((nregs_ymode > 1 && regsize_xmode > regsize_ymode)
-	      || (nregs_xmode > 1 && regsize_ymode > regsize_xmode)))
+	  && ((nregs_ymode > 1 && may_gt (regsize_xmode, regsize_ymode))
+	      || (nregs_xmode > 1 && may_gt (regsize_ymode, regsize_xmode))))
 	{
 	  info->representable_p = false;
-	  info->nregs = CEIL (ysize, regsize_xmode);
-	  if (!can_div_trunc_p (offset, regsize_xmode, &info->offset))
+	  if (!can_div_away_from_zero_p (ysize, regsize_xmode, &info->nregs)
+	      || !can_div_trunc_p (offset, regsize_xmode, &info->offset))
 	    /* Checked by validate_subreg.  We must know at compile time
 	       which inner registers are being accessed.  */
 	    gcc_unreachable ();
@@ -3800,7 +3805,7 @@ subreg_get_info (unsigned int xregno, ma
       HOST_WIDE_INT count;
       if (!rknown
 	  && WORDS_BIG_ENDIAN == REG_WORDS_BIG_ENDIAN
-	  && regsize_xmode == regsize_ymode
+	  && must_eq (regsize_xmode, regsize_ymode)
 	  && constant_multiple_p (offset, regsize_ymode, &count))
 	{
 	  info->representable_p = true;
@@ -3837,8 +3842,7 @@ subreg_get_info (unsigned int xregno, ma
      be exact, otherwise we don't know how to verify the constraint.
      These conditions may be relaxed but subreg_regno_offset would
      need to be redesigned.  */
-  gcc_assert ((xsize % num_blocks) == 0);
-  poly_uint64 bytes_per_block = xsize / num_blocks;
+  poly_uint64 bytes_per_block = exact_div (xsize, num_blocks);
 
   /* Get the number of the first block that contains the subreg and the byte
      offset of the subreg from the start of that block.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [089/nnn] poly_int: expand_expr_real_1
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (87 preceding siblings ...)
  2017-10-23 17:35 ` [086/nnn] poly_int: REGMODE_NATURAL_SIZE Richard Sandiford
@ 2017-10-23 17:36 ` Richard Sandiford
  2017-11-28  8:41   ` Jeff Law
  2017-10-23 17:36 ` [090/nnn] poly_int: set_inc_state Richard Sandiford
                   ` (18 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:36 UTC (permalink / raw)
  To: gcc-patches

This patch makes the VIEW_CONVERT_EXPR handling in expand_expr_real_1
cope with polynomial type and mode sizes.
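
The stack temporary has to be large enough for both the operand type
and the mode, so the MAX of two HOST_WIDE_INTs becomes upper_bound on
poly values.  A standalone sketch (not GCC code; the numbers are made
up) of why upper_bound is well-defined even when the two sizes are not
ordered:

  /* Toy model: taking the maximum of each coefficient gives a value
     that is >= both inputs for every X >= 0, whether or not the
     inputs themselves are ordered.  */
  #include <algorithm>
  #include <cstdint>
  #include <cassert>

  struct poly { int64_t a, b; };   /* models a + b*X, X >= 0 */

  static poly
  model_upper_bound (poly x, poly y)
  {
    return { std::max (x.a, y.a), std::max (x.b, y.b) };
  }

  int
  main ()
  {
    poly op0_size = { 16, 16 };   /* 16 + 16*X bytes */
    poly mode_size = { 32, 0 };   /* fixed 32 bytes */
    poly temp = model_upper_bound (op0_size, mode_size);
    assert (temp.a == 32 && temp.b == 16);   /* 32 + 16*X covers both */
    return 0;
  }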


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.c (expand_expr_real_1): Use tree_to_poly_uint64
	instead of int_size_in_bytes when handling VIEW_CONVERT_EXPRs
	via stack temporaries.  Treat the mode size as polynomial too.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:34.105013764 +0100
+++ gcc/expr.c	2017-10-23 17:25:35.142976454 +0100
@@ -11116,9 +11116,10 @@ expand_expr_real_1 (tree exp, rtx target
 	      else if (STRICT_ALIGNMENT)
 		{
 		  tree inner_type = TREE_TYPE (treeop0);
-		  HOST_WIDE_INT temp_size
-		    = MAX (int_size_in_bytes (inner_type),
-			   (HOST_WIDE_INT) GET_MODE_SIZE (mode));
+		  poly_uint64 mode_size = GET_MODE_SIZE (mode);
+		  poly_uint64 op0_size
+		    = tree_to_poly_uint64 (TYPE_SIZE_UNIT (inner_type));
+		  poly_int64 temp_size = upper_bound (op0_size, mode_size);
 		  rtx new_rtx
 		    = assign_stack_temp_for_type (mode, temp_size, type);
 		  rtx new_with_op0_mode

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [090/nnn] poly_int: set_inc_state
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (88 preceding siblings ...)
  2017-10-23 17:36 ` [089/nnn] poly_int: expand_expr_real_1 Richard Sandiford
@ 2017-10-23 17:36 ` Richard Sandiford
  2017-11-28  8:35   ` Jeff Law
  2017-10-23 17:37 ` [092/nnn] poly_int: PUSH_ROUNDING Richard Sandiford
                   ` (17 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:36 UTC (permalink / raw)
  To: gcc-patches

This trivial patch makes auto-inc-dec.c:set_inc_state take a poly_int64.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* auto-inc-dec.c (set_inc_state): Take the mode size as a poly_int64
	rather than an int.

Index: gcc/auto-inc-dec.c
===================================================================
--- gcc/auto-inc-dec.c	2017-07-27 10:37:54.907033464 +0100
+++ gcc/auto-inc-dec.c	2017-10-23 17:25:36.142940510 +0100
@@ -152,14 +152,14 @@ enum gen_form
 static rtx mem_tmp;
 
 static enum inc_state
-set_inc_state (HOST_WIDE_INT val, int size)
+set_inc_state (HOST_WIDE_INT val, poly_int64 size)
 {
   if (val == 0)
     return INC_ZERO;
   if (val < 0)
-    return (val == -size) ? INC_NEG_SIZE : INC_NEG_ANY;
+    return must_eq (val, -size) ? INC_NEG_SIZE : INC_NEG_ANY;
   else
-    return (val == size) ? INC_POS_SIZE : INC_POS_ANY;
+    return must_eq (val, size) ? INC_POS_SIZE : INC_POS_ANY;
 }
 
 /* The DECISION_TABLE that describes what form, if any, the increment

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [092/nnn] poly_int: PUSH_ROUNDING
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (89 preceding siblings ...)
  2017-10-23 17:36 ` [090/nnn] poly_int: set_inc_state Richard Sandiford
@ 2017-10-23 17:37 ` Richard Sandiford
  2017-11-28 16:21   ` Jeff Law
  2017-10-23 17:37 ` [093/nnn] poly_int: adjust_mems Richard Sandiford
                   ` (16 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:37 UTC (permalink / raw)
  To: gcc-patches

PUSH_ROUNDING is difficult to convert to a hook since there is still
a lot of conditional code based on it.  It isn't clear that a direct
conversion with checks for null hooks is the right thing to do.

Rather than untangle that, this patch converts all implementations
that do something to out-of-line functions that have the same
interface as a hook would have.  This should at least help towards
any future hook conversion.
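
The bodies being moved are simple rounding formulas, now operating on
poly_int64 instead of plain integers.  Ignoring the poly_int wrapper,
a standalone check of the two idioms involved (not GCC code; the
second helper assumes a power-of-two rounding unit):

  #include <cstdint>
  #include <cassert>

  /* Round up to an even number of bytes, as cr16, m68k (non-ColdFire),
     pdp11 and stormy16 do.  */
  static int64_t
  round_up_even (int64_t bytes)
  {
    return (bytes + 1) & ~(int64_t) 1;
  }

  /* Round up to a multiple of UNIT, as the i386 version does with
     UNITS_PER_WORD; valid for power-of-two UNIT only.  */
  static int64_t
  round_up_to (int64_t bytes, int64_t unit)
  {
    return (bytes + unit - 1) & -unit;
  }

  int
  main ()
  {
    assert (round_up_even (1) == 2 && round_up_even (2) == 2);
    assert (round_up_to (5, 4) == 8 && round_up_to (8, 8) == 8);
    return 0;
  }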


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* config/cr16/cr16-protos.h (cr16_push_rounding): Declare.
	* config/cr16/cr16.h (PUSH_ROUNDING): Move implementation to...
	* config/cr16/cr16.c (cr16_push_rounding): ...this new function.
	* config/h8300/h8300-protos.h (h8300_push_rounding): Declare.
	* config/h8300/h8300.h (PUSH_ROUNDING): Move implementation to...
	* config/h8300/h8300.c (h8300_push_rounding): ...this new function.
	* config/i386/i386-protos.h (ix86_push_rounding): Declare.
	* config/i386/i386.h (PUSH_ROUNDING): Move implementation to...
	* config/i386/i386.c (ix86_push_rounding): ...this new function.
	* config/m32c/m32c-protos.h (m32c_push_rounding): Take and return
	a poly_int64.
	* config/m32c/m32c.c (m32c_push_rounding): Likewise.
	* config/m68k/m68k-protos.h (m68k_push_rounding): Declare.
	* config/m68k/m68k.h (PUSH_ROUNDING): Move implementation to...
	* config/m68k/m68k.c (m68k_push_rounding): ...this new function.
	* config/pdp11/pdp11-protos.h (pdp11_push_rounding): Declare.
	* config/pdp11/pdp11.h (PUSH_ROUNDING): Move implementation to...
	* config/pdp11/pdp11.c (pdp11_push_rounding): ...this new function.
	* config/stormy16/stormy16-protos.h (xstormy16_push_rounding): Declare.
	* config/stormy16/stormy16.h (PUSH_ROUNDING): Move implementation to...
	* config/stormy16/stormy16.c (xstormy16_push_rounding): ...this new
	function.
	* expr.c (emit_move_resolve_push): Treat the input and result
	of PUSH_ROUNDING as a poly_int64.
	(emit_move_complex_push, emit_single_push_insn_1): Likewise.
	(emit_push_insn): Likewise.
	* lra-eliminations.c (mark_not_eliminable): Likewise.
	* recog.c (push_operand): Likewise.
	* reload1.c (elimination_effects): Likewise.
	* rtlanal.c (nonzero_bits1): Likewise.
	* calls.c (store_one_arg): Likewise.  Require the padding to be
	known at compile time.

Index: gcc/config/cr16/cr16-protos.h
===================================================================
--- gcc/config/cr16/cr16-protos.h	2017-09-04 11:49:42.896500726 +0100
+++ gcc/config/cr16/cr16-protos.h	2017-10-23 17:25:38.230865460 +0100
@@ -94,5 +94,6 @@ extern const char *cr16_emit_logical_di
 /* Handling the "interrupt" attribute.  */
 extern int cr16_interrupt_function_p (void);
 extern bool cr16_is_data_model (enum data_model_type);
+extern poly_int64 cr16_push_rounding (poly_int64);
 
 #endif /* Not GCC_CR16_PROTOS_H.  */ 
Index: gcc/config/cr16/cr16.h
===================================================================
--- gcc/config/cr16/cr16.h	2017-10-23 11:41:22.824941066 +0100
+++ gcc/config/cr16/cr16.h	2017-10-23 17:25:38.231865424 +0100
@@ -383,7 +383,7 @@ #define ACCUMULATE_OUTGOING_ARGS 0
 
 #define PUSH_ARGS 1
 
-#define PUSH_ROUNDING(BYTES) (((BYTES) + 1) & ~1)
+#define PUSH_ROUNDING(BYTES) cr16_push_rounding (BYTES)
 
 #ifndef CUMULATIVE_ARGS
 struct cumulative_args
Index: gcc/config/cr16/cr16.c
===================================================================
--- gcc/config/cr16/cr16.c	2017-10-23 17:19:01.400170158 +0100
+++ gcc/config/cr16/cr16.c	2017-10-23 17:25:38.231865424 +0100
@@ -2215,6 +2215,14 @@ cr16_emit_logical_di (rtx *operands, enu
   return "";
 }
 
+/* Implement PUSH_ROUNDING.  */
+
+poly_int64
+cr16_push_rounding (poly_int64 bytes)
+{
+  return (bytes + 1) & ~1;
+}
+
 /* Initialize 'targetm' variable which contains pointers to functions 
    and data relating to the target machine.  */
 
Index: gcc/config/h8300/h8300-protos.h
===================================================================
--- gcc/config/h8300/h8300-protos.h	2017-09-12 14:29:25.231530806 +0100
+++ gcc/config/h8300/h8300-protos.h	2017-10-23 17:25:38.231865424 +0100
@@ -112,5 +112,6 @@ extern bool            h8sx_mergeable_me
 extern bool            h8sx_emit_movmd (rtx, rtx, rtx, HOST_WIDE_INT);
 extern void            h8300_swap_into_er6 (rtx);
 extern void            h8300_swap_out_of_er6 (rtx);
+extern poly_int64      h8300_push_rounding (poly_int64);
 
 #endif /* ! GCC_H8300_PROTOS_H */
Index: gcc/config/h8300/h8300.h
===================================================================
--- gcc/config/h8300/h8300.h	2017-10-23 11:41:22.920697531 +0100
+++ gcc/config/h8300/h8300.h	2017-10-23 17:25:38.232865388 +0100
@@ -359,18 +359,7 @@ #define FRAME_GROWS_DOWNWARD 1
 
 #define STARTING_FRAME_OFFSET 0
 
-/* If we generate an insn to push BYTES bytes,
-   this says how many the stack pointer really advances by.
-
-   On the H8/300, @-sp really pushes a byte if you ask it to - but that's
-   dangerous, so we claim that it always pushes a word, then we catch
-   the mov.b rx,@-sp and turn it into a mov.w rx,@-sp on output.
-
-   On the H8/300H, we simplify TARGET_QUICKCALL by setting this to 4
-   and doing a similar thing.  */
-
-#define PUSH_ROUNDING(BYTES) \
-  (((BYTES) + PARM_BOUNDARY / 8 - 1) & -PARM_BOUNDARY / 8)
+#define PUSH_ROUNDING(BYTES) h8300_push_rounding (BYTES)
 
 /* Offset of first parameter from the argument pointer register value.  */
 /* Is equal to the size of the saved fp + pc, even if an fp isn't
Index: gcc/config/h8300/h8300.c
===================================================================
--- gcc/config/h8300/h8300.c	2017-10-23 17:11:40.151767139 +0100
+++ gcc/config/h8300/h8300.c	2017-10-23 17:25:38.232865388 +0100
@@ -6044,6 +6044,21 @@ h8300_trampoline_init (rtx m_tramp, tree
       emit_move_insn (mem, tem);
     }
 }
+
+/* Implement PUSH_ROUNDING.
+
+   On the H8/300, @-sp really pushes a byte if you ask it to - but that's
+   dangerous, so we claim that it always pushes a word, then we catch
+   the mov.b rx,@-sp and turn it into a mov.w rx,@-sp on output.
+
+   On the H8/300H, we simplify TARGET_QUICKCALL by setting this to 4
+   and doing a similar thing.  */
+
+poly_int64
+h8300_push_rounding (poly_int64 bytes)
+{
+  return ((bytes + PARM_BOUNDARY / 8 - 1) & (-PARM_BOUNDARY / 8));
+}
 \f
 /* Initialize the GCC target structure.  */
 #undef TARGET_ATTRIBUTE_TABLE
Index: gcc/config/i386/i386-protos.h
===================================================================
--- gcc/config/i386/i386-protos.h	2017-10-23 11:41:22.909090687 +0100
+++ gcc/config/i386/i386-protos.h	2017-10-23 17:25:38.232865388 +0100
@@ -328,6 +328,8 @@ extern void ix86_core2i7_init_hooks (voi
 
 extern int ix86_atom_sched_reorder (FILE *, int, rtx_insn **, int *, int);
 
+extern poly_int64 ix86_push_rounding (poly_int64);
+
 #ifdef RTX_CODE
 /* Target data for multipass lookahead scheduling.
    Currently used for Core 2/i7 tuning.  */
Index: gcc/config/i386/i386.h
===================================================================
--- gcc/config/i386/i386.h	2017-10-23 11:41:22.852023702 +0100
+++ gcc/config/i386/i386.h	2017-10-23 17:25:38.237865208 +0100
@@ -1525,15 +1525,7 @@ #define FRAME_GROWS_DOWNWARD 1
    of the first local allocated.  */
 #define STARTING_FRAME_OFFSET 0
 
-/* If we generate an insn to push BYTES bytes, this says how many the stack
-   pointer really advances by.  On 386, we have pushw instruction that
-   decrements by exactly 2 no matter what the position was, there is no pushb.
-
-   But as CIE data alignment factor on this arch is -4 for 32bit targets
-   and -8 for 64bit targets, we need to make sure all stack pointer adjustments
-   are in multiple of 4 for 32bit targets and 8 for 64bit targets.  */
-
-#define PUSH_ROUNDING(BYTES) ROUND_UP ((BYTES), UNITS_PER_WORD)
+#define PUSH_ROUNDING(BYTES) ix86_push_rounding (BYTES)
 
 /* If defined, the maximum amount of space required for outgoing arguments
    will be computed and placed into the variable `crtl->outgoing_args_size'.
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2017-10-23 17:22:32.719227954 +0100
+++ gcc/config/i386/i386.c	2017-10-23 17:25:38.237865208 +0100
@@ -49022,6 +49022,19 @@ ix86_excess_precision (enum excess_preci
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Implement PUSH_ROUNDING.  On 386, we have pushw instruction that
+   decrements by exactly 2 no matter what the position was, there is no pushb.
+
+   But as CIE data alignment factor on this arch is -4 for 32bit targets
+   and -8 for 64bit targets, we need to make sure all stack pointer adjustments
+   are in multiple of 4 for 32bit targets and 8 for 64bit targets.  */
+
+poly_int64
+ix86_push_rounding (poly_int64 bytes)
+{
+  return ROUND_UP (bytes, UNITS_PER_WORD);
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
Index: gcc/config/m32c/m32c-protos.h
===================================================================
--- gcc/config/m32c/m32c-protos.h	2017-09-15 13:56:20.271148742 +0100
+++ gcc/config/m32c/m32c-protos.h	2017-10-23 17:25:38.237865208 +0100
@@ -29,7 +29,7 @@ void m32c_init_expanders (void);
 int  m32c_initial_elimination_offset (int, int);
 void m32c_output_reg_pop (FILE *, int);
 void m32c_output_reg_push (FILE *, int);
-unsigned int  m32c_push_rounding (int);
+poly_int64 m32c_push_rounding (poly_int64);
 void m32c_register_pragmas (void);
 void m32c_note_pragma_address (const char *, unsigned);
 int  m32c_regno_ok_for_base_p (int);
Index: gcc/config/m32c/m32c.c
===================================================================
--- gcc/config/m32c/m32c.c	2017-10-23 17:11:40.159782457 +0100
+++ gcc/config/m32c/m32c.c	2017-10-23 17:25:38.238865172 +0100
@@ -1290,8 +1290,8 @@ m32c_initial_elimination_offset (int fro
 
 /* Implements PUSH_ROUNDING.  The R8C and M16C have byte stacks, the
    M32C has word stacks.  */
-unsigned int
-m32c_push_rounding (int n)
+poly_int64
+m32c_push_rounding (poly_int64 n)
 {
   if (TARGET_R8C || TARGET_M16C)
     return n;
Index: gcc/config/m68k/m68k-protos.h
===================================================================
--- gcc/config/m68k/m68k-protos.h	2017-09-04 11:49:42.908500725 +0100
+++ gcc/config/m68k/m68k-protos.h	2017-10-23 17:25:38.238865172 +0100
@@ -99,3 +99,4 @@ extern void init_68881_table (void);
 extern rtx m68k_legitimize_call_address (rtx);
 extern rtx m68k_legitimize_sibcall_address (rtx);
 extern int m68k_hard_regno_rename_ok(unsigned int, unsigned int);
+extern poly_int64 m68k_push_rounding (poly_int64);
Index: gcc/config/m68k/m68k.h
===================================================================
--- gcc/config/m68k/m68k.h	2017-10-23 11:41:23.045471107 +0100
+++ gcc/config/m68k/m68k.h	2017-10-23 17:25:38.239865136 +0100
@@ -469,9 +469,7 @@ #define STACK_GROWS_DOWNWARD 1
 #define FRAME_GROWS_DOWNWARD 1
 #define STARTING_FRAME_OFFSET 0
 
-/* On the 680x0, sp@- in a byte insn really pushes a word.
-   On the ColdFire, sp@- in a byte insn pushes just a byte.  */
-#define PUSH_ROUNDING(BYTES) (TARGET_COLDFIRE ? BYTES : ((BYTES) + 1) & ~1)
+#define PUSH_ROUNDING(BYTES) m68k_push_rounding (BYTES)
 
 #define FIRST_PARM_OFFSET(FNDECL) 8
 
Index: gcc/config/m68k/m68k.c
===================================================================
--- gcc/config/m68k/m68k.c	2017-10-23 17:19:01.404170211 +0100
+++ gcc/config/m68k/m68k.c	2017-10-23 17:25:38.238865172 +0100
@@ -6612,4 +6612,15 @@ m68k_excess_precision (enum excess_preci
   return FLT_EVAL_METHOD_UNPREDICTABLE;
 }
 
+/* Implement PUSH_ROUNDING.  On the 680x0, sp@- in a byte insn really pushes
+   a word.  On the ColdFire, sp@- in a byte insn pushes just a byte.  */
+
+poly_int64
+m68k_push_rounding (poly_int64 bytes)
+{
+  if (TARGET_COLDFIRE)
+    return bytes;
+  return (bytes + 1) & ~1;
+}
+
 #include "gt-m68k.h"
Index: gcc/config/pdp11/pdp11-protos.h
===================================================================
--- gcc/config/pdp11/pdp11-protos.h	2017-09-15 13:56:20.277148839 +0100
+++ gcc/config/pdp11/pdp11-protos.h	2017-10-23 17:25:38.239865136 +0100
@@ -44,3 +44,4 @@ extern void pdp11_asm_output_var (FILE *
 extern void pdp11_expand_prologue (void);
 extern void pdp11_expand_epilogue (void);
 extern int pdp11_branch_cost (void);
+extern poly_int64 pdp11_push_rounding (poly_int64);
Index: gcc/config/pdp11/pdp11.h
===================================================================
--- gcc/config/pdp11/pdp11.h	2017-10-23 11:41:23.044503870 +0100
+++ gcc/config/pdp11/pdp11.h	2017-10-23 17:25:38.239865136 +0100
@@ -263,10 +263,7 @@ #define FRAME_GROWS_DOWNWARD 1
    of the first local allocated.  */
 #define STARTING_FRAME_OFFSET 0
 
-/* If we generate an insn to push BYTES bytes,
-   this says how many the stack pointer really advances by.
-   On the pdp11, the stack is on an even boundary */
-#define PUSH_ROUNDING(BYTES) ((BYTES + 1) & ~1)
+#define PUSH_ROUNDING(BYTES) pdp11_push_rounding (BYTES)
 
 /* current_first_parm_offset stores the # of registers pushed on the 
    stack */
Index: gcc/config/pdp11/pdp11.c
===================================================================
--- gcc/config/pdp11/pdp11.c	2017-10-23 17:11:40.168799690 +0100
+++ gcc/config/pdp11/pdp11.c	2017-10-23 17:25:38.239865136 +0100
@@ -1977,4 +1977,13 @@ pdp11_modes_tieable_p (machine_mode, mac
   return false;
 }
 
+/* Implement PUSH_ROUNDING.  On the pdp11, the stack is on an even
+   boundary.  */
+
+poly_int64
+pdp11_push_rounding (poly_int64 bytes)
+{
+  return (bytes + 1) & ~1;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
Index: gcc/config/stormy16/stormy16-protos.h
===================================================================
--- gcc/config/stormy16/stormy16-protos.h	2017-02-23 19:54:23.000000000 +0000
+++ gcc/config/stormy16/stormy16-protos.h	2017-10-23 17:25:38.239865136 +0100
@@ -28,6 +28,7 @@ extern int direct_return (void);
 extern int xstormy16_interrupt_function_p (void);
 extern int xstormy16_epilogue_uses (int);
 extern void xstormy16_function_profiler (void);
+extern poly_int64 xstormy16_push_rounding (poly_int64);
 
 #if defined (TREE_CODE)
 extern void xstormy16_asm_output_aligned_common (FILE *, tree, const char *,
Index: gcc/config/stormy16/stormy16.h
===================================================================
--- gcc/config/stormy16/stormy16.h	2017-10-23 11:41:22.789153296 +0100
+++ gcc/config/stormy16/stormy16.h	2017-10-23 17:25:38.240865100 +0100
@@ -257,7 +257,7 @@ #define INITIAL_ELIMINATION_OFFSET(FROM,
 \f
 /* Passing Function Arguments on the Stack.  */
 
-#define PUSH_ROUNDING(BYTES) (((BYTES) + 1) & ~1)
+#define PUSH_ROUNDING(BYTES) xstormy16_push_rounding (BYTES)
 
 \f
 /* Function Arguments in Registers.  */
Index: gcc/config/stormy16/stormy16.c
===================================================================
--- gcc/config/stormy16/stormy16.c	2017-10-23 17:11:40.184830325 +0100
+++ gcc/config/stormy16/stormy16.c	2017-10-23 17:25:38.240865100 +0100
@@ -2635,6 +2635,14 @@ xstormy16_modes_tieable_p (machine_mode
 {
   return mode1 != BImode && mode2 != BImode;
 }
+
+/* Implement PUSH_ROUNDING.  */
+
+poly_int64
+xstormy16_push_rounding (poly_int64 bytes)
+{
+  return (bytes + 1) & ~1;
+}
 \f
 #undef  TARGET_ASM_ALIGNED_HI_OP
 #define TARGET_ASM_ALIGNED_HI_OP "\t.hword\t"
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:37.064907370 +0100
+++ gcc/expr.c	2017-10-23 17:25:38.241865064 +0100
@@ -3344,10 +3344,9 @@ emit_move_via_integer (machine_mode mode
 emit_move_resolve_push (machine_mode mode, rtx x)
 {
   enum rtx_code code = GET_CODE (XEXP (x, 0));
-  HOST_WIDE_INT adjust;
   rtx temp;
 
-  adjust = GET_MODE_SIZE (mode);
+  poly_int64 adjust = GET_MODE_SIZE (mode);
 #ifdef PUSH_ROUNDING
   adjust = PUSH_ROUNDING (adjust);
 #endif
@@ -3356,14 +3355,12 @@ emit_move_resolve_push (machine_mode mod
   else if (code == PRE_MODIFY || code == POST_MODIFY)
     {
       rtx expr = XEXP (XEXP (x, 0), 1);
-      HOST_WIDE_INT val;
 
       gcc_assert (GET_CODE (expr) == PLUS || GET_CODE (expr) == MINUS);
-      gcc_assert (CONST_INT_P (XEXP (expr, 1)));
-      val = INTVAL (XEXP (expr, 1));
+      poly_int64 val = rtx_to_poly_int64 (XEXP (expr, 1));
       if (GET_CODE (expr) == MINUS)
 	val = -val;
-      gcc_assert (adjust == val || adjust == -val);
+      gcc_assert (must_eq (adjust, val) || must_eq (adjust, -val));
       adjust = val;
     }
 
@@ -3405,11 +3402,11 @@ emit_move_complex_push (machine_mode mod
   bool imag_first;
 
 #ifdef PUSH_ROUNDING
-  unsigned int submodesize = GET_MODE_SIZE (submode);
+  poly_int64 submodesize = GET_MODE_SIZE (submode);
 
   /* In case we output to the stack, but the size is smaller than the
      machine can push exactly, we need to use move instructions.  */
-  if (PUSH_ROUNDING (submodesize) != submodesize)
+  if (may_ne (PUSH_ROUNDING (submodesize), submodesize))
     {
       x = emit_move_resolve_push (mode, x);
       return emit_move_insn (x, y);
@@ -4117,7 +4114,7 @@ fixup_args_size_notes (rtx_insn *prev, r
 emit_single_push_insn_1 (machine_mode mode, rtx x, tree type)
 {
   rtx dest_addr;
-  unsigned rounded_size = PUSH_ROUNDING (GET_MODE_SIZE (mode));
+  poly_int64 rounded_size = PUSH_ROUNDING (GET_MODE_SIZE (mode));
   rtx dest;
   enum insn_code icode;
 
@@ -4133,7 +4130,7 @@ emit_single_push_insn_1 (machine_mode mo
       if (maybe_expand_insn (icode, 1, ops))
 	return;
     }
-  if (GET_MODE_SIZE (mode) == rounded_size)
+  if (must_eq (GET_MODE_SIZE (mode), rounded_size))
     dest_addr = gen_rtx_fmt_e (STACK_PUSH_CODE, Pmode, stack_pointer_rtx);
   /* If we are to pad downward, adjust the stack pointer first and
      then store X into the stack location using an offset.  This is
@@ -4353,9 +4350,9 @@ emit_push_insn (rtx x, machine_mode mode
 	     and such small pushes do rounding that causes trouble.  */
 	  && ((!targetm.slow_unaligned_access (word_mode, align))
 	      || align >= BIGGEST_ALIGNMENT
-	      || (PUSH_ROUNDING (align / BITS_PER_UNIT)
-		  == (align / BITS_PER_UNIT)))
-	  && (HOST_WIDE_INT) PUSH_ROUNDING (INTVAL (size)) == INTVAL (size))
+	      || must_eq (PUSH_ROUNDING (align / BITS_PER_UNIT),
+			  align / BITS_PER_UNIT))
+	  && must_eq (PUSH_ROUNDING (INTVAL (size)), INTVAL (size)))
 	{
 	  /* Push padding now if padding above and stack grows down,
 	     or if padding below and stack grows up.
Index: gcc/lra-eliminations.c
===================================================================
--- gcc/lra-eliminations.c	2017-10-23 17:11:40.393228585 +0100
+++ gcc/lra-eliminations.c	2017-10-23 17:25:38.241865064 +0100
@@ -748,7 +748,7 @@ mark_not_eliminable (rtx x, machine_mode
 		  && XEXP (x, 0) == XEXP (XEXP (x, 1), 0)
 		  && poly_int_rtx_p (XEXP (XEXP (x, 1), 1), &offset))))
 	{
-	  int size = GET_MODE_SIZE (mem_mode);
+	  poly_int64 size = GET_MODE_SIZE (mem_mode);
 	  
 #ifdef PUSH_ROUNDING
 	  /* If more bytes than MEM_MODE are pushed, account for
Index: gcc/recog.c
===================================================================
--- gcc/recog.c	2017-10-23 17:18:57.860160878 +0100
+++ gcc/recog.c	2017-10-23 17:25:38.242865029 +0100
@@ -1258,33 +1258,35 @@ nonmemory_operand (rtx op, machine_mode
 int
 push_operand (rtx op, machine_mode mode)
 {
-  unsigned int rounded_size = GET_MODE_SIZE (mode);
-
-#ifdef PUSH_ROUNDING
-  rounded_size = PUSH_ROUNDING (rounded_size);
-#endif
-
   if (!MEM_P (op))
     return 0;
 
   if (mode != VOIDmode && GET_MODE (op) != mode)
     return 0;
 
+  poly_int64 rounded_size = GET_MODE_SIZE (mode);
+
+#ifdef PUSH_ROUNDING
+  rounded_size = PUSH_ROUNDING (MACRO_INT (rounded_size));
+#endif
+
   op = XEXP (op, 0);
 
-  if (rounded_size == GET_MODE_SIZE (mode))
+  if (must_eq (rounded_size, GET_MODE_SIZE (mode)))
     {
       if (GET_CODE (op) != STACK_PUSH_CODE)
 	return 0;
     }
   else
     {
+      poly_int64 offset;
       if (GET_CODE (op) != PRE_MODIFY
 	  || GET_CODE (XEXP (op, 1)) != PLUS
 	  || XEXP (XEXP (op, 1), 0) != XEXP (op, 0)
-	  || !CONST_INT_P (XEXP (XEXP (op, 1), 1))
-	  || INTVAL (XEXP (XEXP (op, 1), 1))
-	     != ((STACK_GROWS_DOWNWARD ? -1 : 1) * (int) rounded_size))
+	  || !poly_int_rtx_p (XEXP (XEXP (op, 1), 1), &offset)
+	  || (STACK_GROWS_DOWNWARD
+	      ? may_ne (offset, -rounded_size)
+	      : may_ne (offset, rounded_size)))
 	return 0;
     }
 
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:18:57.861160790 +0100
+++ gcc/reload1.c	2017-10-23 17:25:38.242865029 +0100
@@ -2996,7 +2996,7 @@ elimination_effects (rtx x, machine_mode
       for (ep = reg_eliminate; ep < &reg_eliminate[NUM_ELIMINABLE_REGS]; ep++)
 	if (ep->to_rtx == XEXP (x, 0))
 	  {
-	    int size = GET_MODE_SIZE (mem_mode);
+	    poly_int64 size = GET_MODE_SIZE (mem_mode);
 
 	    /* If more bytes than MEM_MODE are pushed, account for them.  */
 #ifdef PUSH_ROUNDING
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:25:32.610067499 +0100
+++ gcc/rtlanal.c	2017-10-23 17:25:38.243864993 +0100
@@ -4518,8 +4518,10 @@ nonzero_bits1 (const_rtx x, scalar_int_m
 	     stack to be momentarily aligned only to that amount,
 	     so we pick the least alignment.  */
 	  if (x == stack_pointer_rtx && PUSH_ARGS)
-	    alignment = MIN ((unsigned HOST_WIDE_INT) PUSH_ROUNDING (1),
-			     alignment);
+	    {
+	      poly_uint64 rounded_1 = PUSH_ROUNDING (poly_int64 (1));
+	      alignment = MIN (known_alignment (rounded_1), alignment);
+	    }
 #endif
 
 	  nonzero &= ~(alignment - 1);
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 17:19:01.395170091 +0100
+++ gcc/calls.c	2017-10-23 17:25:38.230865460 +0100
@@ -5449,7 +5449,6 @@ store_one_arg (struct arg_data *arg, rtx
     ;
   else if (arg->mode != BLKmode)
     {
-      int size;
       unsigned int parm_align;
 
       /* Argument is a scalar, not entirely passed in registers.
@@ -5462,7 +5461,7 @@ store_one_arg (struct arg_data *arg, rtx
 	 Note that in C the default argument promotions
 	 will prevent such mismatches.  */
 
-      size = GET_MODE_SIZE (arg->mode);
+      poly_int64 size = GET_MODE_SIZE (arg->mode);
       /* Compute how much space the push instruction will push.
 	 On many machines, pushing a byte will advance the stack
 	 pointer by a halfword.  */
@@ -5475,9 +5474,10 @@ store_one_arg (struct arg_data *arg, rtx
 	 round up to a multiple of the alignment for arguments.  */
       if (targetm.calls.function_arg_padding (arg->mode, TREE_TYPE (pval))
 	  != PAD_NONE)
-	used = (((size + PARM_BOUNDARY / BITS_PER_UNIT - 1)
-		 / (PARM_BOUNDARY / BITS_PER_UNIT))
-		* (PARM_BOUNDARY / BITS_PER_UNIT));
+	/* At the moment we don't (need to) support ABIs for which the
+	   padding isn't known at compile time.  In principle it should
+	   be easy to add though.  */
+	used = force_align_up (size, PARM_BOUNDARY / BITS_PER_UNIT);
 
       /* Compute the alignment of the pushed argument.  */
       parm_align = arg->locate.boundary;

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [091/nnn] poly_int: emit_single_push_insn_1
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (91 preceding siblings ...)
  2017-10-23 17:37 ` [093/nnn] poly_int: adjust_mems Richard Sandiford
@ 2017-10-23 17:37 ` Richard Sandiford
  2017-11-28  8:33   ` Jeff Law
  2017-10-23 17:38 ` [094/nnn] poly_int: expand_ifn_atomic_compare_exchange_into_call Richard Sandiford
                   ` (14 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:37 UTC (permalink / raw)
  To: gcc-patches

This patch makes emit_single_push_insn_1 cope with polynomial mode sizes.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.c (emit_single_push_insn_1): Treat mode sizes as polynomial.
	Use plus_constant instead of gen_rtx_PLUS.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:35.142976454 +0100
+++ gcc/expr.c	2017-10-23 17:25:37.064907370 +0100
@@ -4141,9 +4141,6 @@ emit_single_push_insn_1 (machine_mode mo
      access to type.  */
   else if (targetm.calls.function_arg_padding (mode, type) == PAD_DOWNWARD)
     {
-      unsigned padding_size = rounded_size - GET_MODE_SIZE (mode);
-      HOST_WIDE_INT offset;
-
       emit_move_insn (stack_pointer_rtx,
 		      expand_binop (Pmode,
 				    STACK_GROWS_DOWNWARD ? sub_optab
@@ -4152,31 +4149,27 @@ emit_single_push_insn_1 (machine_mode mo
 				    gen_int_mode (rounded_size, Pmode),
 				    NULL_RTX, 0, OPTAB_LIB_WIDEN));
 
-      offset = (HOST_WIDE_INT) padding_size;
+      poly_int64 offset = rounded_size - GET_MODE_SIZE (mode);
       if (STACK_GROWS_DOWNWARD && STACK_PUSH_CODE == POST_DEC)
 	/* We have already decremented the stack pointer, so get the
 	   previous value.  */
-	offset += (HOST_WIDE_INT) rounded_size;
+	offset += rounded_size;
 
       if (!STACK_GROWS_DOWNWARD && STACK_PUSH_CODE == POST_INC)
 	/* We have already incremented the stack pointer, so get the
 	   previous value.  */
-	offset -= (HOST_WIDE_INT) rounded_size;
+	offset -= rounded_size;
 
-      dest_addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-				gen_int_mode (offset, Pmode));
+      dest_addr = plus_constant (Pmode, stack_pointer_rtx, offset);
     }
   else
     {
       if (STACK_GROWS_DOWNWARD)
 	/* ??? This seems wrong if STACK_PUSH_CODE == POST_DEC.  */
-	dest_addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-				  gen_int_mode (-(HOST_WIDE_INT) rounded_size,
-						Pmode));
+	dest_addr = plus_constant (Pmode, stack_pointer_rtx, -rounded_size);
       else
 	/* ??? This seems wrong if STACK_PUSH_CODE == POST_INC.  */
-	dest_addr = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
-				  gen_int_mode (rounded_size, Pmode));
+	dest_addr = plus_constant (Pmode, stack_pointer_rtx, rounded_size);
 
       dest_addr = gen_rtx_PRE_MODIFY (Pmode, stack_pointer_rtx, dest_addr);
     }

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [093/nnn] poly_int: adjust_mems
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (90 preceding siblings ...)
  2017-10-23 17:37 ` [092/nnn] poly_int: PUSH_ROUNDING Richard Sandiford
@ 2017-10-23 17:37 ` Richard Sandiford
  2017-11-28  8:32   ` Jeff Law
  2017-10-23 17:37 ` [091/nnn] poly_int: emit_single_push_insn_1 Richard Sandiford
                   ` (15 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:37 UTC (permalink / raw)
  To: gcc-patches

This patch makes the var-tracking.c handling of autoinc addresses
cope with polynomial mode sizes.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* var-tracking.c (adjust_mems): Treat mode sizes as polynomial.
	Use plus_constant instead of gen_rtx_PLUS.

Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c	2017-10-23 17:16:59.708267276 +0100
+++ gcc/var-tracking.c	2017-10-23 17:25:40.610779914 +0100
@@ -1016,6 +1016,7 @@ adjust_mems (rtx loc, const_rtx old_rtx,
   machine_mode mem_mode_save;
   bool store_save;
   scalar_int_mode tem_mode, tem_subreg_mode;
+  poly_int64 size;
   switch (GET_CODE (loc))
     {
     case REG:
@@ -1060,11 +1061,9 @@ adjust_mems (rtx loc, const_rtx old_rtx,
       return mem;
     case PRE_INC:
     case PRE_DEC:
-      addr = gen_rtx_PLUS (GET_MODE (loc), XEXP (loc, 0),
-			   gen_int_mode (GET_CODE (loc) == PRE_INC
-					 ? GET_MODE_SIZE (amd->mem_mode)
-					 : -GET_MODE_SIZE (amd->mem_mode),
-					 GET_MODE (loc)));
+      size = GET_MODE_SIZE (amd->mem_mode);
+      addr = plus_constant (GET_MODE (loc), XEXP (loc, 0),
+			    GET_CODE (loc) == PRE_INC ? size : -size);
       /* FALLTHRU */
     case POST_INC:
     case POST_DEC:
@@ -1072,12 +1071,10 @@ adjust_mems (rtx loc, const_rtx old_rtx,
 	addr = XEXP (loc, 0);
       gcc_assert (amd->mem_mode != VOIDmode && amd->mem_mode != BLKmode);
       addr = simplify_replace_fn_rtx (addr, old_rtx, adjust_mems, data);
-      tem = gen_rtx_PLUS (GET_MODE (loc), XEXP (loc, 0),
-			  gen_int_mode ((GET_CODE (loc) == PRE_INC
-					 || GET_CODE (loc) == POST_INC)
-					? GET_MODE_SIZE (amd->mem_mode)
-					: -GET_MODE_SIZE (amd->mem_mode),
-					GET_MODE (loc)));
+      size = GET_MODE_SIZE (amd->mem_mode);
+      tem = plus_constant (GET_MODE (loc), XEXP (loc, 0),
+			   (GET_CODE (loc) == PRE_INC
+			    || GET_CODE (loc) == POST_INC) ? size : -size);
       store_save = amd->store;
       amd->store = false;
       tem = simplify_replace_fn_rtx (tem, old_rtx, adjust_mems, data);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [094/nnn] poly_int: expand_ifn_atomic_compare_exchange_into_call
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (92 preceding siblings ...)
  2017-10-23 17:37 ` [091/nnn] poly_int: emit_single_push_insn_1 Richard Sandiford
@ 2017-10-23 17:38 ` Richard Sandiford
  2017-11-28  8:31   ` Jeff Law
  2017-10-23 17:39 ` [095/nnn] poly_int: process_alt_operands Richard Sandiford
                   ` (13 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:38 UTC (permalink / raw)
  To: gcc-patches

This patch makes the mode size assumptions in
expand_ifn_atomic_compare_exchange_into_call a bit more
explicit, so that a later patch can add a to_constant () call.
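
The assumption being made explicit is that GET_MODE_SIZE (mode) is one
of 1, 2, 4, 8 or 16, so that exact_log2 of it selects one of the
BUILT_IN_ATOMIC_COMPARE_EXCHANGE_{1,2,4,8,16} built-ins.  A standalone
sketch of that mapping (not GCC code; model_exact_log2 stands in for
GCC's exact_log2):

  #include <cassert>

  /* Return log2 (x) if x is a power of two, otherwise -1.  */
  static int
  model_exact_log2 (unsigned int x)
  {
    if (x == 0 || (x & (x - 1)) != 0)
      return -1;
    int log = 0;
    while (x > 1)
      {
        x >>= 1;
        log++;
      }
    return log;
  }

  int
  main ()
  {
    /* Sizes 1, 2, 4, 8, 16 map to indices 0..4 past the _1 built-in.  */
    for (unsigned int size = 1; size <= 16; size *= 2)
      assert (model_exact_log2 (size) >= 0 && model_exact_log2 (size) < 5);
    /* Anything else would trip the new assert.  */
    assert (model_exact_log2 (32) == 5 && model_exact_log2 (12) == -1);
    return 0;
  }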


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Assert
	that the mode size is in the set {1, 2, 4, 8, 16}.

Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	2017-10-23 17:22:18.226824652 +0100
+++ gcc/builtins.c	2017-10-23 17:25:41.647742640 +0100
@@ -5838,9 +5838,12 @@ expand_ifn_atomic_compare_exchange_into_
   /* Skip the boolean weak parameter.  */
   for (z = 4; z < 6; z++)
     vec->quick_push (gimple_call_arg (call, z));
+  /* At present we only have BUILT_IN_ATOMIC_COMPARE_EXCHANGE_{1,2,4,8,16}.  */
+  unsigned int bytes_log2 = exact_log2 (GET_MODE_SIZE (mode));
+  gcc_assert (bytes_log2 < 5);
   built_in_function fncode
     = (built_in_function) ((int) BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1
-			   + exact_log2 (GET_MODE_SIZE (mode)));
+			   + bytes_log2);
   tree fndecl = builtin_decl_explicit (fncode);
   tree fn = build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fndecl)),
 		    fndecl);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [095/nnn] poly_int: process_alt_operands
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (93 preceding siblings ...)
  2017-10-23 17:38 ` [094/nnn] poly_int: expand_ifn_atomic_compare_exchange_into_call Richard Sandiford
@ 2017-10-23 17:39 ` Richard Sandiford
  2017-11-28  8:14   ` Jeff Law
  2017-10-23 17:39 ` [096/nnn] poly_int: reloading complex subregs Richard Sandiford
                   ` (12 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:39 UTC (permalink / raw)
  To: gcc-patches

This patch makes process_alt_operands check that the mode sizes
are ordered, so that match_reload can validly treat them as subregs
of one another.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* lra-constraints.c (process_alt_operands): Reject matched
	operands whose sizes aren't ordered.
	(match_reload): Refer to this check here.

Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2017-10-23 17:20:47.003797985 +0100
+++ gcc/lra-constraints.c	2017-10-23 17:25:42.597708494 +0100
@@ -933,6 +933,8 @@ match_reload (signed char out, signed ch
   push_to_sequence (*before);
   if (inmode != outmode)
     {
+      /* process_alt_operands has already checked that the mode sizes
+	 are ordered.  */
       if (partial_subreg_p (outmode, inmode))
 	{
 	  reg = new_in_reg
@@ -2112,6 +2114,13 @@ process_alt_operands (int only_alternati
 		    len = 0;
 		    lra_assert (nop > m);
 
+		    /* Reject matches if we don't know which operand is
+		       bigger.  This situation would arguably be a bug in
+		       an .md pattern, but could also occur in a user asm.  */
+		    if (!ordered_p (GET_MODE_SIZE (biggest_mode[m]),
+				    GET_MODE_SIZE (biggest_mode[nop])))
+		      break;
+
 		    this_alternative_matches = m;
 		    m_hregno = get_hard_regno (*curr_id->operand_loc[m], false);
 		    /* We are supposed to match a previous operand.

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [096/nnn] poly_int: reloading complex subregs
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (94 preceding siblings ...)
  2017-10-23 17:39 ` [095/nnn] poly_int: process_alt_operands Richard Sandiford
@ 2017-10-23 17:39 ` Richard Sandiford
  2017-11-28  8:09   ` Jeff Law
  2017-10-23 17:40 ` [097/nnn] poly_int: alter_reg Richard Sandiford
                   ` (11 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:39 UTC (permalink / raw)
  To: gcc-patches

This patch splits out a condition that is common to both push_reload
and reload_inner_reg_of_subreg.
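
For illustration (not part of the patch), the condition that
complex_word_subreg_p captures, with concrete numbers: on a 32-bit target
with UNITS_PER_WORD == 4, a DFmode pseudo held in a single floating-point
register spans two words but only one hard register, so a word-sized subreg
of it needs an in-out reload:

  /* outer size 4 <= UNITS_PER_WORD, inner size 8 > UNITS_PER_WORD,
     and 8 / 4 words != 1 register (REG_NREGS).  */
  bool complex_p = (4 <= 4 && 8 > 4 && 8 / 4 != 1);  /* true */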


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* reload.c (complex_word_subreg_p): New function.
	(reload_inner_reg_of_subreg, push_reload): Use it.

Index: gcc/reload.c
===================================================================
--- gcc/reload.c	2017-10-23 17:18:51.485721234 +0100
+++ gcc/reload.c	2017-10-23 17:25:43.543674491 +0100
@@ -811,6 +811,23 @@ find_reusable_reload (rtx *p_in, rtx out
   return n_reloads;
 }
 
+/* Return true if:
+
+   (a) (subreg:OUTER_MODE REG ...) represents a word or subword subreg
+       of a multiword value; and
+
+   (b) the number of *words* in REG does not match the number of *registers*
+       in REG.  */
+
+static bool
+complex_word_subreg_p (machine_mode outer_mode, rtx reg)
+{
+  machine_mode inner_mode = GET_MODE (reg);
+  return (GET_MODE_SIZE (outer_mode) <= UNITS_PER_WORD
+	  && GET_MODE_SIZE (inner_mode) > UNITS_PER_WORD
+	  && GET_MODE_SIZE (inner_mode) / UNITS_PER_WORD != REG_NREGS (reg));
+}
+
 /* Return true if X is a SUBREG that will need reloading of its SUBREG_REG
    expression.  MODE is the mode that X will be used in.  OUTPUT is true if
    the function is invoked for the output part of an enclosing reload.  */
@@ -842,11 +859,7 @@ reload_inner_reg_of_subreg (rtx x, machi
      INNER is larger than a word and the number of registers in INNER is
      not the same as the number of words in INNER, then INNER will need
      reloading (with an in-out reload).  */
-  return (output
-	  && GET_MODE_SIZE (mode) <= UNITS_PER_WORD
-	  && GET_MODE_SIZE (GET_MODE (inner)) > UNITS_PER_WORD
-	  && ((GET_MODE_SIZE (GET_MODE (inner)) / UNITS_PER_WORD)
-	      != REG_NREGS (inner)));
+  return output && complex_word_subreg_p (mode, inner);
 }
 
 /* Return nonzero if IN can be reloaded into REGNO with mode MODE without
@@ -1064,12 +1077,7 @@ push_reload (rtx in, rtx out, rtx *inloc
 	      /* The case where out is nonzero
 		 is handled differently in the following statement.  */
 	      && (out == 0 || subreg_lowpart_p (in))
-	      && ((GET_MODE_SIZE (inmode) <= UNITS_PER_WORD
-		   && (GET_MODE_SIZE (GET_MODE (SUBREG_REG (in)))
-		       > UNITS_PER_WORD)
-		   && ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (in)))
-			/ UNITS_PER_WORD)
-		       != REG_NREGS (SUBREG_REG (in))))
+	      && (complex_word_subreg_p (inmode, SUBREG_REG (in))
 		  || !targetm.hard_regno_mode_ok (subreg_regno (in), inmode)))
 	  || (secondary_reload_class (1, rclass, inmode, in) != NO_REGS
 	      && (secondary_reload_class (1, rclass, GET_MODE (SUBREG_REG (in)),


* [098/nnn] poly_int: load_register_parameters
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (97 preceding siblings ...)
  2017-10-23 17:40 ` [099/nnn] poly_int: struct_value_size Richard Sandiford
@ 2017-10-23 17:40 ` Richard Sandiford
  2017-11-28  8:08   ` Jeff Law
  2017-10-23 17:41 ` [100/nnn] poly_int: memrefs_conflict_p Richard Sandiford
                   ` (8 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:40 UTC (permalink / raw)
  To: gcc-patches

This patch makes load_register_parameters cope with polynomial sizes.
The requirement here is that any register parameters with non-constant
sizes must either have a specific mode (e.g. a variable-length vector
mode) or must be represented with a PARALLEL.  This is in practice
already a requirement for parameters passed in vector registers,
since the default behaviour of splitting parameters into words doesn't
make sense for them.
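
For illustration (not part of the patch), with UNITS_PER_WORD == 8 and two
coefficients, the new ordered_p/may_lt checks give:

  poly_int64 vec_size (16, 16);  /* 16 + 16X bytes, e.g. an SVE vector */
  poly_int64 byte_size = 1;      /* a QImode argument */
  bool ordered = ordered_p (vec_size, 8);   /* true: always >= one word */
  bool shift_vec = may_lt (vec_size, 8);    /* false: no lsb shift needed */
  bool shift_byte = may_lt (byte_size, 8);  /* true: shift within the word */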


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* calls.c (load_register_parameters): Cope with polynomial
	mode sizes.  Require a constant size for BLKmode parameters
	that aren't described by a PARALLEL.  If BLOCK_REG_PADDING
	forces a parameter to be padded at the lsb end in order to
	fill a complete number of words, require the parameter size
	to be ordered wrt UNITS_PER_WORD.

Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 17:25:38.230865460 +0100
+++ gcc/calls.c	2017-10-23 17:25:45.501604113 +0100
@@ -2520,7 +2520,8 @@ load_register_parameters (struct arg_dat
 	{
 	  int partial = args[i].partial;
 	  int nregs;
-	  int size = 0;
+	  poly_int64 size = 0;
+	  HOST_WIDE_INT const_size = 0;
 	  rtx_insn *before_arg = get_last_insn ();
 	  /* Set non-negative if we must move a word at a time, even if
 	     just one word (e.g, partial == 4 && mode == DFmode).  Set
@@ -2536,8 +2537,12 @@ load_register_parameters (struct arg_dat
 	    }
 	  else if (TYPE_MODE (TREE_TYPE (args[i].tree_value)) == BLKmode)
 	    {
-	      size = int_size_in_bytes (TREE_TYPE (args[i].tree_value));
-	      nregs = (size + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD;
+	      /* Variable-sized parameters should be described by a
+		 PARALLEL instead.  */
+	      const_size = int_size_in_bytes (TREE_TYPE (args[i].tree_value));
+	      gcc_assert (const_size >= 0);
+	      nregs = (const_size + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD;
+	      size = const_size;
 	    }
 	  else
 	    size = GET_MODE_SIZE (args[i].mode);
@@ -2559,21 +2564,27 @@ load_register_parameters (struct arg_dat
 	      /* Handle case where we have a value that needs shifting
 		 up to the msb.  eg. a QImode value and we're padding
 		 upward on a BYTES_BIG_ENDIAN machine.  */
-	      if (size < UNITS_PER_WORD
-		  && (args[i].locate.where_pad
-		      == (BYTES_BIG_ENDIAN ? PAD_UPWARD : PAD_DOWNWARD)))
+	      if (args[i].locate.where_pad
+		  == (BYTES_BIG_ENDIAN ? PAD_UPWARD : PAD_DOWNWARD))
 		{
-		  rtx x;
-		  int shift = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
-
-		  /* Assigning REG here rather than a temp makes CALL_FUSAGE
-		     report the whole reg as used.  Strictly speaking, the
-		     call only uses SIZE bytes at the msb end, but it doesn't
-		     seem worth generating rtl to say that.  */
-		  reg = gen_rtx_REG (word_mode, REGNO (reg));
-		  x = expand_shift (LSHIFT_EXPR, word_mode, reg, shift, reg, 1);
-		  if (x != reg)
-		    emit_move_insn (reg, x);
+		  gcc_checking_assert (ordered_p (size, UNITS_PER_WORD));
+		  if (may_lt (size, UNITS_PER_WORD))
+		    {
+		      rtx x;
+		      poly_int64 shift
+			= (UNITS_PER_WORD - size) * BITS_PER_UNIT;
+
+		      /* Assigning REG here rather than a temp makes
+			 CALL_FUSAGE report the whole reg as used.
+			 Strictly speaking, the call only uses SIZE
+			 bytes at the msb end, but it doesn't seem worth
+			 generating rtl to say that.  */
+		      reg = gen_rtx_REG (word_mode, REGNO (reg));
+		      x = expand_shift (LSHIFT_EXPR, word_mode,
+					reg, shift, reg, 1);
+		      if (x != reg)
+			emit_move_insn (reg, x);
+		    }
 		}
 #endif
 	    }
@@ -2588,17 +2599,20 @@ load_register_parameters (struct arg_dat
 
 	  else if (partial == 0 || args[i].pass_on_stack)
 	    {
+	      /* SIZE and CONST_SIZE are 0 for partial arguments and
+		 the size of a BLKmode type otherwise.  */
+	      gcc_checking_assert (must_eq (size, const_size));
 	      rtx mem = validize_mem (copy_rtx (args[i].value));
 
 	      /* Check for overlap with already clobbered argument area,
 	         providing that this has non-zero size.  */
 	      if (is_sibcall
-		  && size != 0
+		  && const_size != 0
 		  && (mem_might_overlap_already_clobbered_arg_p
-		      (XEXP (args[i].value, 0), size)))
+		      (XEXP (args[i].value, 0), const_size)))
 		*sibcall_failure = 1;
 
-	      if (size % UNITS_PER_WORD == 0
+	      if (const_size % UNITS_PER_WORD == 0
 		  || MEM_ALIGN (mem) % BITS_PER_WORD == 0)
 		move_block_to_reg (REGNO (reg), mem, nregs, args[i].mode);
 	      else
@@ -2608,7 +2622,7 @@ load_register_parameters (struct arg_dat
 				       args[i].mode);
 		  rtx dest = gen_rtx_REG (word_mode, REGNO (reg) + nregs - 1);
 		  unsigned int bitoff = (nregs - 1) * BITS_PER_WORD;
-		  unsigned int bitsize = size * BITS_PER_UNIT - bitoff;
+		  unsigned int bitsize = const_size * BITS_PER_UNIT - bitoff;
 		  rtx x = extract_bit_field (mem, bitsize, bitoff, 1, dest,
 					     word_mode, word_mode, false,
 					     NULL);
@@ -2620,7 +2634,7 @@ load_register_parameters (struct arg_dat
 		}
 
 	      /* Handle a BLKmode that needs shifting.  */
-	      if (nregs == 1 && size < UNITS_PER_WORD
+	      if (nregs == 1 && const_size < UNITS_PER_WORD
 #ifdef BLOCK_REG_PADDING
 		  && args[i].locate.where_pad == PAD_DOWNWARD
 #else
@@ -2629,7 +2643,7 @@ load_register_parameters (struct arg_dat
 		  )
 		{
 		  rtx dest = gen_rtx_REG (word_mode, REGNO (reg));
-		  int shift = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
+		  int shift = (UNITS_PER_WORD - const_size) * BITS_PER_UNIT;
 		  enum tree_code dir = (BYTES_BIG_ENDIAN
 					? RSHIFT_EXPR : LSHIFT_EXPR);
 		  rtx x;


* [097/nnn] poly_int: alter_reg
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (95 preceding siblings ...)
  2017-10-23 17:39 ` [096/nnn] poly_int: reloading complex subregs Richard Sandiford
@ 2017-10-23 17:40 ` Richard Sandiford
  2017-11-28  8:08   ` Jeff Law
  2017-10-23 17:40 ` [099/nnn] poly_int: struct_value_size Richard Sandiford
                   ` (10 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:40 UTC (permalink / raw)
  To: gcc-patches

This patch makes alter_reg cope with polynomial mode sizes.
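
For illustration (not part of the patch), ordered_max picks the larger of two
sizes that are known to be ordered for every X; the ordered_p assertions guard
the cases where the ordering would depend on X (two coefficients assumed):

  poly_uint64 a (2, 2);                  /* 2 + 2X */
  poly_uint64 b (4, 2);                  /* 4 + 2X */
  poly_uint64 m = ordered_max (a, b);    /* 4 + 2X */
  bool is_ordered = ordered_p (a, poly_uint64 (4, 0));  /* false: depends on X */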


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* reload1.c (spill_stack_slot_width): Change element type
	from unsigned int to poly_uint64_pod.
	(alter_reg): Treat mode sizes as polynomial.

Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:25:38.242865029 +0100
+++ gcc/reload1.c	2017-10-23 17:25:44.492640380 +0100
@@ -200,7 +200,7 @@ #define spill_indirect_levels			\
 static rtx spill_stack_slot[FIRST_PSEUDO_REGISTER];
 
 /* Width allocated so far for that stack slot.  */
-static unsigned int spill_stack_slot_width[FIRST_PSEUDO_REGISTER];
+static poly_uint64_pod spill_stack_slot_width[FIRST_PSEUDO_REGISTER];
 
 /* Record which pseudos needed to be spilled.  */
 static regset_head spilled_pseudos;
@@ -2142,10 +2142,10 @@ alter_reg (int i, int from_reg, bool don
     {
       rtx x = NULL_RTX;
       machine_mode mode = GET_MODE (regno_reg_rtx[i]);
-      unsigned int inherent_size = PSEUDO_REGNO_BYTES (i);
+      poly_uint64 inherent_size = GET_MODE_SIZE (mode);
       unsigned int inherent_align = GET_MODE_ALIGNMENT (mode);
       machine_mode wider_mode = wider_subreg_mode (mode, reg_max_ref_mode[i]);
-      unsigned int total_size = GET_MODE_SIZE (wider_mode);
+      poly_uint64 total_size = GET_MODE_SIZE (wider_mode);
       unsigned int min_align = GET_MODE_BITSIZE (reg_max_ref_mode[i]);
       poly_int64 adjust = 0;
 
@@ -2174,10 +2174,15 @@ alter_reg (int i, int from_reg, bool don
 	{
 	  rtx stack_slot;
 
+	  /* The sizes are taken from a subreg operation, which guarantees
+	     that they're ordered.  */
+	  gcc_checking_assert (ordered_p (total_size, inherent_size));
+
 	  /* No known place to spill from => no slot to reuse.  */
 	  x = assign_stack_local (mode, total_size,
 				  min_align > inherent_align
-				  || total_size > inherent_size ? -1 : 0);
+				  || may_gt (total_size, inherent_size)
+				  ? -1 : 0);
 
 	  stack_slot = x;
 
@@ -2189,7 +2194,7 @@ alter_reg (int i, int from_reg, bool don
 	      adjust = inherent_size - total_size;
 	      if (maybe_nonzero (adjust))
 		{
-		  unsigned int total_bits = total_size * BITS_PER_UNIT;
+		  poly_uint64 total_bits = total_size * BITS_PER_UNIT;
 		  machine_mode mem_mode
 		    = int_mode_for_size (total_bits, 1).else_blk ();
 		  stack_slot = adjust_address_nv (x, mem_mode, adjust);
@@ -2203,9 +2208,10 @@ alter_reg (int i, int from_reg, bool don
 
       /* Reuse a stack slot if possible.  */
       else if (spill_stack_slot[from_reg] != 0
-	       && spill_stack_slot_width[from_reg] >= total_size
-	       && (GET_MODE_SIZE (GET_MODE (spill_stack_slot[from_reg]))
-		   >= inherent_size)
+	       && must_ge (spill_stack_slot_width[from_reg], total_size)
+	       && must_ge (GET_MODE_SIZE
+			   (GET_MODE (spill_stack_slot[from_reg])),
+			   inherent_size)
 	       && MEM_ALIGN (spill_stack_slot[from_reg]) >= min_align)
 	x = spill_stack_slot[from_reg];
 
@@ -2221,16 +2227,21 @@ alter_reg (int i, int from_reg, bool don
 	      if (partial_subreg_p (mode,
 				    GET_MODE (spill_stack_slot[from_reg])))
 		mode = GET_MODE (spill_stack_slot[from_reg]);
-	      if (spill_stack_slot_width[from_reg] > total_size)
-		total_size = spill_stack_slot_width[from_reg];
+	      total_size = ordered_max (total_size,
+					spill_stack_slot_width[from_reg]);
 	      if (MEM_ALIGN (spill_stack_slot[from_reg]) > min_align)
 		min_align = MEM_ALIGN (spill_stack_slot[from_reg]);
 	    }
 
+	  /* The sizes are taken from a subreg operation, which guarantees
+	     that they're ordered.  */
+	  gcc_checking_assert (ordered_p (total_size, inherent_size));
+
 	  /* Make a slot with that size.  */
 	  x = assign_stack_local (mode, total_size,
 				  min_align > inherent_align
-				  || total_size > inherent_size ? -1 : 0);
+				  || may_gt (total_size, inherent_size)
+				  ? -1 : 0);
 	  stack_slot = x;
 
 	  /* Cancel the  big-endian correction done in assign_stack_local.
@@ -2241,7 +2252,7 @@ alter_reg (int i, int from_reg, bool don
 	      adjust = GET_MODE_SIZE (mode) - total_size;
 	      if (maybe_nonzero (adjust))
 		{
-		  unsigned int total_bits = total_size * BITS_PER_UNIT;
+		  poly_uint64 total_bits = total_size * BITS_PER_UNIT;
 		  machine_mode mem_mode
 		    = int_mode_for_size (total_bits, 1).else_blk ();
 		  stack_slot = adjust_address_nv (x, mem_mode, adjust);


* [099/nnn] poly_int: struct_value_size
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (96 preceding siblings ...)
  2017-10-23 17:40 ` [097/nnn] poly_int: alter_reg Richard Sandiford
@ 2017-10-23 17:40 ` Richard Sandiford
  2017-11-21  8:14   ` Jeff Law
  2017-10-23 17:40 ` [098/nnn] poly_int: load_register_parameters Richard Sandiford
                   ` (9 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:40 UTC (permalink / raw)
  To: gcc-patches

This patch makes calls.c treat struct_value_size (one of the
operands to a call pattern) as polynomial.
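
For illustration (not part of the patch), the size is now read with
poly_int_tree_p, and gen_int_mode replaces GEN_INT, which only takes a
compile-time HOST_WIDE_INT:

  poly_int64 struct_value_size;
  if (!poly_int_tree_p (TYPE_SIZE_UNIT (rettype), &struct_value_size))
    struct_value_size = -1;
  rtx size_rtx = gen_int_mode (struct_value_size, Pmode);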


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* calls.c (emit_call_1, expand_call): Change struct_value_size from
	a HOST_WIDE_INT to a poly_int64.

Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 17:25:45.501604113 +0100
+++ gcc/calls.c	2017-10-23 17:25:46.488568637 +0100
@@ -377,7 +377,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
 	     tree funtype ATTRIBUTE_UNUSED,
 	     poly_int64 stack_size ATTRIBUTE_UNUSED,
 	     poly_int64 rounded_stack_size,
-	     HOST_WIDE_INT struct_value_size ATTRIBUTE_UNUSED,
+	     poly_int64 struct_value_size ATTRIBUTE_UNUSED,
 	     rtx next_arg_reg ATTRIBUTE_UNUSED, rtx valreg,
 	     int old_inhibit_defer_pop, rtx call_fusage, int ecf_flags,
 	     cumulative_args_t args_so_far ATTRIBUTE_UNUSED)
@@ -437,7 +437,8 @@ emit_call_1 (rtx funexp, tree fntree ATT
 					 next_arg_reg, NULL_RTX);
       else
 	pat = targetm.gen_sibcall (funmem, rounded_stack_size_rtx,
-				   next_arg_reg, GEN_INT (struct_value_size));
+				   next_arg_reg,
+				   gen_int_mode (struct_value_size, Pmode));
     }
   /* If the target has "call" or "call_value" insns, then prefer them
      if no arguments are actually popped.  If the target does not have
@@ -470,7 +471,7 @@ emit_call_1 (rtx funexp, tree fntree ATT
 				      next_arg_reg, NULL_RTX);
       else
 	pat = targetm.gen_call (funmem, rounded_stack_size_rtx, next_arg_reg,
-				GEN_INT (struct_value_size));
+				gen_int_mode (struct_value_size, Pmode));
     }
   emit_insn (pat);
 
@@ -3048,7 +3049,7 @@ expand_call (tree exp, rtx target, int i
   /* Size of aggregate value wanted, or zero if none wanted
      or if we are using the non-reentrant PCC calling convention
      or expecting the value in registers.  */
-  HOST_WIDE_INT struct_value_size = 0;
+  poly_int64 struct_value_size = 0;
   /* Nonzero if called function returns an aggregate in memory PCC style,
      by returning the address of where to find it.  */
   int pcc_struct_value = 0;
@@ -3210,7 +3211,8 @@ expand_call (tree exp, rtx target, int i
       }
 #else /* not PCC_STATIC_STRUCT_RETURN */
       {
-	struct_value_size = int_size_in_bytes (rettype);
+	if (!poly_int_tree_p (TYPE_SIZE_UNIT (rettype), &struct_value_size))
+	  struct_value_size = -1;
 
 	/* Even if it is semantically safe to use the target as the return
 	   slot, it may be not sufficiently aligned for the return type.  */


* [100/nnn] poly_int: memrefs_conflict_p
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (98 preceding siblings ...)
  2017-10-23 17:40 ` [098/nnn] poly_int: load_register_parameters Richard Sandiford
@ 2017-10-23 17:41 ` Richard Sandiford
  2017-12-05 23:29   ` Jeff Law
  2017-10-23 17:41 ` [101/nnn] poly_int: GET_MODE_NUNITS Richard Sandiford
                   ` (7 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:41 UTC (permalink / raw)
  To: gcc-patches

The xsize and ysize arguments to memrefs_conflict_p are encoded such
that:

- 0 means the size is unknown
- >0 means the size is known
- <0 means that the negative of the size is a worst-case size after
  alignment

In other words, the sign effectively encodes a boolean; it isn't
meant to be taken literally.  With poly_ints these correspond to:

- known_zero (...)
- may_gt (..., 0)
- may_lt (..., 0)

respectively.
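
For illustration (not part of the patch), a size argument can be classified
under this encoding with those predicates:

  static const char *
  classify_size (poly_int64 size)
  {
    if (known_zero (size))
      return "size unknown";
    if (may_lt (size, 0))
      return "negated worst-case size after alignment";
    return "size known";
  }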


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* alias.c (addr_side_effect_eval): Take the size as a poly_int64
	rather than an int.  Use plus_constant.
	(memrefs_conflict_p): Take the sizes as poly_int64s rather than ints.
	Take the offset "c" as a poly_int64 rather than a HOST_WIDE_INT.

Index: gcc/alias.c
===================================================================
--- gcc/alias.c	2017-10-23 17:16:50.356530167 +0100
+++ gcc/alias.c	2017-10-23 17:25:47.476533124 +0100
@@ -148,7 +148,6 @@ struct GTY(()) alias_set_entry {
 };
 
 static int rtx_equal_for_memref_p (const_rtx, const_rtx);
-static int memrefs_conflict_p (int, rtx, int, rtx, HOST_WIDE_INT);
 static void record_set (rtx, const_rtx, void *);
 static int base_alias_check (rtx, rtx, rtx, rtx, machine_mode,
 			     machine_mode);
@@ -2295,9 +2294,9 @@ get_addr (rtx x)
     is not modified by the memory reference then ADDR is returned.  */
 
 static rtx
-addr_side_effect_eval (rtx addr, int size, int n_refs)
+addr_side_effect_eval (rtx addr, poly_int64 size, int n_refs)
 {
-  int offset = 0;
+  poly_int64 offset = 0;
 
   switch (GET_CODE (addr))
     {
@@ -2318,11 +2317,7 @@ addr_side_effect_eval (rtx addr, int siz
       return addr;
     }
 
-  if (offset)
-    addr = gen_rtx_PLUS (GET_MODE (addr), XEXP (addr, 0),
-			 gen_int_mode (offset, GET_MODE (addr)));
-  else
-    addr = XEXP (addr, 0);
+  addr = plus_constant (GET_MODE (addr), XEXP (addr, 0), offset);
   addr = canon_rtx (addr);
 
   return addr;
@@ -2372,7 +2367,8 @@ offset_overlap_p (poly_int64 c, poly_int
    If that is fixed the TBAA hack for union type-punning can be removed.  */
 
 static int
-memrefs_conflict_p (int xsize, rtx x, int ysize, rtx y, HOST_WIDE_INT c)
+memrefs_conflict_p (poly_int64 xsize, rtx x, poly_int64 ysize, rtx y,
+		    poly_int64 c)
 {
   if (GET_CODE (x) == VALUE)
     {
@@ -2417,13 +2413,13 @@ memrefs_conflict_p (int xsize, rtx x, in
   else if (GET_CODE (x) == LO_SUM)
     x = XEXP (x, 1);
   else
-    x = addr_side_effect_eval (x, abs (xsize), 0);
+    x = addr_side_effect_eval (x, may_lt (xsize, 0) ? -xsize : xsize, 0);
   if (GET_CODE (y) == HIGH)
     y = XEXP (y, 0);
   else if (GET_CODE (y) == LO_SUM)
     y = XEXP (y, 1);
   else
-    y = addr_side_effect_eval (y, abs (ysize), 0);
+    y = addr_side_effect_eval (y, may_lt (ysize, 0) ? -ysize : ysize, 0);
 
   if (GET_CODE (x) == SYMBOL_REF && GET_CODE (y) == SYMBOL_REF)
     {
@@ -2436,7 +2432,7 @@ memrefs_conflict_p (int xsize, rtx x, in
 	 through alignment adjustments (i.e., that have negative
 	 sizes), because we can't know how far they are from each
 	 other.  */
-      if (xsize < 0 || ysize < 0)
+      if (may_lt (xsize, 0) || may_lt (ysize, 0))
 	return -1;
       /* If decls are different or we know by offsets that there is no overlap,
 	 we win.  */
@@ -2467,6 +2463,7 @@ memrefs_conflict_p (int xsize, rtx x, in
       else if (x1 == y)
 	return memrefs_conflict_p (xsize, x0, ysize, const0_rtx, c);
 
+      poly_int64 cx1, cy1;
       if (GET_CODE (y) == PLUS)
 	{
 	  /* The fact that Y is canonicalized means that this
@@ -2483,22 +2480,21 @@ memrefs_conflict_p (int xsize, rtx x, in
 	    return memrefs_conflict_p (xsize, x0, ysize, y0, c);
 	  if (rtx_equal_for_memref_p (x0, y0))
 	    return memrefs_conflict_p (xsize, x1, ysize, y1, c);
-	  if (CONST_INT_P (x1))
+	  if (poly_int_rtx_p (x1, &cx1))
 	    {
-	      if (CONST_INT_P (y1))
+	      if (poly_int_rtx_p (y1, &cy1))
 		return memrefs_conflict_p (xsize, x0, ysize, y0,
-					   c - INTVAL (x1) + INTVAL (y1));
+					   c - cx1 + cy1);
 	      else
-		return memrefs_conflict_p (xsize, x0, ysize, y,
-					   c - INTVAL (x1));
+		return memrefs_conflict_p (xsize, x0, ysize, y, c - cx1);
 	    }
-	  else if (CONST_INT_P (y1))
-	    return memrefs_conflict_p (xsize, x, ysize, y0, c + INTVAL (y1));
+	  else if (poly_int_rtx_p (y1, &cy1))
+	    return memrefs_conflict_p (xsize, x, ysize, y0, c + cy1);
 
 	  return -1;
 	}
-      else if (CONST_INT_P (x1))
-	return memrefs_conflict_p (xsize, x0, ysize, y, c - INTVAL (x1));
+      else if (poly_int_rtx_p (x1, &cx1))
+	return memrefs_conflict_p (xsize, x0, ysize, y, c - cx1);
     }
   else if (GET_CODE (y) == PLUS)
     {
@@ -2512,8 +2508,9 @@ memrefs_conflict_p (int xsize, rtx x, in
       if (x == y1)
 	return memrefs_conflict_p (xsize, const0_rtx, ysize, y0, c);
 
-      if (CONST_INT_P (y1))
-	return memrefs_conflict_p (xsize, x, ysize, y0, c + INTVAL (y1));
+      poly_int64 cy1;
+      if (poly_int_rtx_p (y1, &cy1))
+	return memrefs_conflict_p (xsize, x, ysize, y0, c + cy1);
       else
 	return -1;
     }
@@ -2537,11 +2534,11 @@ memrefs_conflict_p (int xsize, rtx x, in
 	    return offset_overlap_p (c, xsize, ysize);
 
 	  /* Can't properly adjust our sizes.  */
-	  if (!CONST_INT_P (x1))
+	  if (!CONST_INT_P (x1)
+	      || !can_div_trunc_p (xsize, INTVAL (x1), &xsize)
+	      || !can_div_trunc_p (ysize, INTVAL (x1), &ysize)
+	      || !can_div_trunc_p (c, INTVAL (x1), &c))
 	    return -1;
-	  xsize /= INTVAL (x1);
-	  ysize /= INTVAL (x1);
-	  c /= INTVAL (x1);
 	  return memrefs_conflict_p (xsize, x0, ysize, y0, c);
 	}
 
@@ -2562,9 +2559,9 @@ memrefs_conflict_p (int xsize, rtx x, in
       unsigned HOST_WIDE_INT uc = sc;
       if (sc < 0 && pow2_or_zerop (-uc))
 	{
-	  if (xsize > 0)
+	  if (may_gt (xsize, 0))
 	    xsize = -xsize;
-	  if (xsize)
+	  if (maybe_nonzero (xsize))
 	    xsize += sc + 1;
 	  c -= sc + 1;
 	  return memrefs_conflict_p (xsize, canon_rtx (XEXP (x, 0)),
@@ -2577,9 +2574,9 @@ memrefs_conflict_p (int xsize, rtx x, in
       unsigned HOST_WIDE_INT uc = sc;
       if (sc < 0 && pow2_or_zerop (-uc))
 	{
-	  if (ysize > 0)
+	  if (may_gt (ysize, 0))
 	    ysize = -ysize;
-	  if (ysize)
+	  if (maybe_nonzero (ysize))
 	    ysize += sc + 1;
 	  c += sc + 1;
 	  return memrefs_conflict_p (xsize, x,
@@ -2589,9 +2586,10 @@ memrefs_conflict_p (int xsize, rtx x, in
 
   if (CONSTANT_P (x))
     {
-      if (CONST_INT_P (x) && CONST_INT_P (y))
+      poly_int64 cx, cy;
+      if (poly_int_rtx_p (x, &cx) && poly_int_rtx_p (y, &cy))
 	{
-	  c += (INTVAL (y) - INTVAL (x));
+	  c += cy - cx;
 	  return offset_overlap_p (c, xsize, ysize);
 	}
 
@@ -2613,7 +2611,9 @@ memrefs_conflict_p (int xsize, rtx x, in
 	 sizes), because we can't know how far they are from each
 	 other.  */
       if (CONSTANT_P (y))
-	return (xsize < 0 || ysize < 0 || offset_overlap_p (c, xsize, ysize));
+	return (may_lt (xsize, 0)
+		|| may_lt (ysize, 0)
+		|| offset_overlap_p (c, xsize, ysize));
 
       return -1;
     }


* [101/nnn] poly_int: GET_MODE_NUNITS
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (99 preceding siblings ...)
  2017-10-23 17:41 ` [100/nnn] poly_int: memrefs_conflict_p Richard Sandiford
@ 2017-10-23 17:41 ` Richard Sandiford
  2017-12-06  2:05   ` Jeff Law
  2017-10-23 17:42 ` [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS Richard Sandiford
                   ` (6 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:41 UTC (permalink / raw)
  To: gcc-patches

This patch changes GET_MODE_NUNITS from unsigned char
to poly_uint16, although it remains a macro when compiling
target code with NUM_POLY_INT_COEFFS == 1.

If the number of units isn't known at compile time, we use:

  (const:M (vec_duplicate:M X))

to represent a vector in which every element is equal to X.  The code
ensures that there is only a single instance of each constant, so that
pointer equality is enough.  (This is a requirement for the constants
that go in const_tiny_rtx, but we might as well do it for all constants.)

Similarly we use:

  (const:M (vec_series:M A B))

for a linear series starting at A and having step B.

The to_constant call in make_vector_type goes away in a later patch.
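
Code that genuinely needs a compile-time element count now follows the
is_constant pattern; for illustration (mirroring expand_vector_broadcast):

  int nunits;
  if (!GET_MODE_NUNITS (vmode).is_constant (&nunits))
    return NULL;  /* punt for variable-length vectors */
  rtvec v = rtvec_alloc (nunits);
  for (int i = 0; i < nunits; ++i)
    RTVEC_ELT (v, i) = op;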


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* machmode.h (mode_nunits): Change from unsigned char to
	poly_uint16_pod.
	(ONLY_FIXED_SIZE_MODES): New macro.
	(pod_mode::measurement_type, scalar_int_mode::measurement_type)
	(scalar_float_mode::measurement_type, scalar_mode::measurement_type)
	(complex_mode::measurement_type, fixed_size_mode::measurement_type):
	New typedefs.
	(mode_to_nunits): Return a poly_uint16 rather than an unsigned short.
	(GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES,
	or if measurement_type is not polynomial.
	* genmodes.c (ZERO_COEFFS): New macro.
	(emit_mode_nunits_inline): Make mode_nunits_inline return a
	poly_uint16.
	(emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod.
	Use ZERO_COEFFS when emitting initializers.
	* data-streamer.h (bp_pack_poly_value): New function.
	(bp_unpack_poly_value): Likewise.
	* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
	for GET_MODE_NUNITS.
	* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
	for GET_MODE_NUNITS.
	* tree.c (make_vector_type): Remove temporary shim and make
	the real function take the number of units as a poly_uint64
	rather than an int.
	(build_vector_type_for_mode): Handle polynomial nunits.
	* emit-rtl.c (gen_const_vec_duplicate_1): Likewise.
	(gen_const_vec_series, gen_rtx_CONST_VECTOR): Likewise.
	* genrecog.c (validate_pattern): Likewise.
	* optabs-query.c (can_mult_highpart_p): Likewise.
	* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
	* optabs.c (expand_vector_broadcast, expand_binop_directly)
	(shift_amt_for_vec_perm_mask, expand_vec_perm, expand_vec_cond_expr)
	(expand_mult_highpart): Likewise.
	* rtlanal.c (subreg_get_info): Likewise.
	* simplify-rtx.c (simplify_unary_operation_1): Likewise.
	(simplify_const_unary_operation, simplify_binary_operation_1)
	(simplify_const_binary_operation, simplify_ternary_operation)
	(test_vector_ops_duplicate, test_vector_ops): Likewise.
	* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
	(vect_grouped_load_supported): Likewise.
	* tree-vect-generic.c (type_for_widest_vector_mode): Likewise.
	* tree-vect-loop.c (have_whole_vector_shift): Likewise.

gcc/ada/
	* gcc-interface/misc.c (enumerate_modes): Handle polynomial
	GET_MODE_NUNITS.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-10-23 17:11:54.535862371 +0100
+++ gcc/machmode.h	2017-10-23 17:25:48.620492005 +0100
@@ -25,7 +25,7 @@ typedef opt_mode<machine_mode> opt_machi
 extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
 extern const unsigned short mode_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_inner[NUM_MACHINE_MODES];
-extern const unsigned char mode_nunits[NUM_MACHINE_MODES];
+extern const poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
 extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
 extern const unsigned short mode_unit_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_wider[NUM_MACHINE_MODES];
@@ -76,6 +76,14 @@ struct mode_traits<machine_mode>
   typedef machine_mode from_int;
 };
 
+/* Always treat machine modes as fixed-size while compiling code specific
+   to targets that have no variable-size modes.  */
+#if defined (IN_TARGET_CODE) && NUM_POLY_INT_COEFFS == 1
+#define ONLY_FIXED_SIZE_MODES 1
+#else
+#define ONLY_FIXED_SIZE_MODES 0
+#endif
+
 /* Get the name of mode MODE as a string.  */
 
 extern const char * const mode_name[NUM_MACHINE_MODES];
@@ -313,6 +321,7 @@ opt_mode<T>::exists (U *mode) const
 struct pod_mode
 {
   typedef typename mode_traits<T>::from_int from_int;
+  typedef typename T::measurement_type measurement_type;
 
   machine_mode m_mode;
   ALWAYS_INLINE operator machine_mode () const { return m_mode; }
@@ -391,6 +400,7 @@ is_a (machine_mode m, U *result)
 {
 public:
   typedef mode_traits<scalar_int_mode>::from_int from_int;
+  typedef unsigned short measurement_type;
 
   ALWAYS_INLINE scalar_int_mode () {}
   ALWAYS_INLINE scalar_int_mode (from_int m) : m_mode (machine_mode (m)) {}
@@ -415,6 +425,7 @@ scalar_int_mode::includes_p (machine_mod
 {
 public:
   typedef mode_traits<scalar_float_mode>::from_int from_int;
+  typedef unsigned short measurement_type;
 
   ALWAYS_INLINE scalar_float_mode () {}
   ALWAYS_INLINE scalar_float_mode (from_int m) : m_mode (machine_mode (m)) {}
@@ -439,6 +450,7 @@ scalar_float_mode::includes_p (machine_m
 {
 public:
   typedef mode_traits<scalar_mode>::from_int from_int;
+  typedef unsigned short measurement_type;
 
   ALWAYS_INLINE scalar_mode () {}
   ALWAYS_INLINE scalar_mode (from_int m) : m_mode (machine_mode (m)) {}
@@ -480,6 +492,7 @@ scalar_mode::includes_p (machine_mode m)
 {
 public:
   typedef mode_traits<complex_mode>::from_int from_int;
+  typedef unsigned short measurement_type;
 
   ALWAYS_INLINE complex_mode () {}
   ALWAYS_INLINE complex_mode (from_int m) : m_mode (machine_mode (m)) {}
@@ -570,7 +583,7 @@ mode_to_unit_precision (machine_mode mod
 
 /* Return the base GET_MODE_NUNITS value for MODE.  */
 
-ALWAYS_INLINE unsigned short
+ALWAYS_INLINE poly_uint16
 mode_to_nunits (machine_mode mode)
 {
 #if GCC_VERSION >= 4001
@@ -627,7 +640,29 @@ #define GET_MODE_UNIT_PRECISION(MODE) (m
 /* Get the number of units in an object of mode MODE.  This is 2 for
    complex modes and the number of elements for vector modes.  */
 
-#define GET_MODE_NUNITS(MODE) (mode_to_nunits (MODE))
+#if ONLY_FIXED_SIZE_MODES
+#define GET_MODE_NUNITS(MODE) (mode_to_nunits (MODE).coeffs[0])
+#else
+ALWAYS_INLINE poly_uint16
+GET_MODE_NUNITS (machine_mode mode)
+{
+  return mode_to_nunits (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_poly<typename T::measurement_type>::t
+GET_MODE_NUNITS (const T &mode)
+{
+  return mode_to_nunits (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_nonpoly<typename T::measurement_type>::t
+GET_MODE_NUNITS (const T &mode)
+{
+  return mode_to_nunits (mode).coeffs[0];
+}
+#endif
 
 /* Get the next wider natural mode (eg, QI -> HI -> SI -> DI -> TI).  */
 
@@ -660,6 +695,7 @@ #define GET_MODE_COMPLEX_MODE(MODE) ((ma
 {
 public:
   typedef mode_traits<fixed_size_mode>::from_int from_int;
+  typedef unsigned short measurement_type;
 
   ALWAYS_INLINE fixed_size_mode () {}
   ALWAYS_INLINE fixed_size_mode (from_int m) : m_mode (machine_mode (m)) {}
Index: gcc/genmodes.c
===================================================================
--- gcc/genmodes.c	2017-10-23 17:11:40.124715442 +0100
+++ gcc/genmodes.c	2017-10-23 17:25:48.618492077 +0100
@@ -901,6 +901,16 @@ calc_wider_mode (void)
     }
 }
 
+/* Text to add to the constant part of a poly_int_pod initializer in
+   order to fill out the whole structure.  */
+#if NUM_POLY_INT_COEFFS == 1
+#define ZERO_COEFFS ""
+#elif NUM_POLY_INT_COEFFS == 2
+#define ZERO_COEFFS ", 0"
+#else
+#error "Unknown value of NUM_POLY_INT_COEFFS"
+#endif
+
 /* Output routines.  */
 
 #define tagged_printf(FMT, ARG, TAG) do {		\
@@ -1008,11 +1018,10 @@ inline __attribute__((__always_inline__)
 #else\n\
 extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
 #endif\n\
-unsigned char\n\
+poly_uint16\n\
 mode_nunits_inline (machine_mode mode)\n\
 {\n\
-  extern const unsigned char mode_nunits[NUM_MACHINE_MODES];\n\
-  gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
+  extern poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];\n\
   switch (mode)\n\
     {");
 
@@ -1381,10 +1390,10 @@ emit_mode_nunits (void)
   int c;
   struct mode_data *m;
 
-  print_decl ("unsigned char", "mode_nunits", "NUM_MACHINE_MODES");
+  print_decl ("poly_uint16_pod", "mode_nunits", "NUM_MACHINE_MODES");
 
   for_all_modes (c, m)
-    tagged_printf ("%u", m->ncomponents, m->name);
+    tagged_printf ("{ %u" ZERO_COEFFS " }", m->ncomponents, m->name);
 
   print_closer ();
 }
Index: gcc/data-streamer.h
===================================================================
--- gcc/data-streamer.h	2017-02-23 19:54:15.000000000 +0000
+++ gcc/data-streamer.h	2017-10-23 17:25:48.617492113 +0100
@@ -126,6 +126,17 @@ bp_pack_value (struct bitpack_d *bp, bit
   bp->pos = pos;
 }
 
+/* Pack VAL into the bit-packing context BP, using NBITS for each
+   coefficient.  */
+static inline void
+bp_pack_poly_value (struct bitpack_d *bp,
+		    const poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t> &val,
+		    unsigned nbits)
+{
+  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    bp_pack_value (bp, val.coeffs[i], nbits);
+}
+
 /* Finishes bit-packing of BP.  */
 static inline void
 streamer_write_bitpack (struct bitpack_d *bp)
@@ -174,6 +185,17 @@ bp_unpack_value (struct bitpack_d *bp, u
   return val & mask;
 }
 
+/* Unpacks a polynomial value from the bit-packing context BP in which each
+   coefficient has NBITS bits.  */
+static inline poly_int<NUM_POLY_INT_COEFFS, bitpack_word_t>
+bp_unpack_poly_value (struct bitpack_d *bp, unsigned nbits)
+{
+  poly_int_pod<NUM_POLY_INT_COEFFS, bitpack_word_t> x;
+  for (int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    x.coeffs[i] = bp_unpack_value (bp, nbits);
+  return x;
+}
+
 
 /* Write a character to the output block.  */
 
Index: gcc/lto-streamer-in.c
===================================================================
--- gcc/lto-streamer-in.c	2017-10-23 11:41:25.264312838 +0100
+++ gcc/lto-streamer-in.c	2017-10-23 17:25:48.619492041 +0100
@@ -1607,7 +1607,7 @@ lto_input_mode_table (struct lto_file_de
       unsigned int size = bp_unpack_value (&bp, 8);
       unsigned int prec = bp_unpack_value (&bp, 16);
       machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
-      unsigned int nunits = bp_unpack_value (&bp, 8);
+      poly_uint16 nunits = bp_unpack_poly_value (&bp, 16);
       unsigned int ibit = 0, fbit = 0;
       unsigned int real_fmt_len = 0;
       const char *real_fmt_name = NULL;
@@ -1645,7 +1645,7 @@ lto_input_mode_table (struct lto_file_de
 		  : GET_MODE_INNER (mr) != table[(int) inner])
 	      || GET_MODE_IBIT (mr) != ibit
 	      || GET_MODE_FBIT (mr) != fbit
-	      || GET_MODE_NUNITS (mr) != nunits)
+	      || may_ne (GET_MODE_NUNITS (mr), nunits))
 	    continue;
 	  else if ((mclass == MODE_FLOAT || mclass == MODE_DECIMAL_FLOAT)
 		   && strcmp (REAL_MODE_FORMAT (mr)->name, real_fmt_name) != 0)
Index: gcc/lto-streamer-out.c
===================================================================
--- gcc/lto-streamer-out.c	2017-10-23 17:11:40.246949037 +0100
+++ gcc/lto-streamer-out.c	2017-10-23 17:25:48.620492005 +0100
@@ -2775,7 +2775,7 @@ lto_write_mode_table (void)
 	  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
 	  bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
 	  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
-	  bp_pack_value (&bp, GET_MODE_NUNITS (m), 8);
+	  bp_pack_poly_value (&bp, GET_MODE_NUNITS (m), 16);
 	  switch (GET_MODE_CLASS (m))
 	    {
 	    case MODE_FRACT:
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 17:22:35.830905181 +0100
+++ gcc/tree.c	2017-10-23 17:25:48.625491825 +0100
@@ -9654,19 +9654,19 @@ omp_clause_operand_check_failed (int idx
 }
 #endif /* ENABLE_TREE_CHECKING */
 \f
-/* Create a new vector type node holding SUBPARTS units of type INNERTYPE,
+/* Create a new vector type node holding NUNITS units of type INNERTYPE,
    and mapped to the machine mode MODE.  Initialize its fields and build
    the information necessary for debugging output.  */
 
 static tree
-make_vector_type (tree innertype, int nunits, machine_mode mode)
+make_vector_type (tree innertype, poly_int64 nunits, machine_mode mode)
 {
   tree t;
   tree mv_innertype = TYPE_MAIN_VARIANT (innertype);
 
   t = make_node (VECTOR_TYPE);
   TREE_TYPE (t) = mv_innertype;
-  SET_TYPE_VECTOR_SUBPARTS (t, nunits);
+  SET_TYPE_VECTOR_SUBPARTS (t, nunits.to_constant ()); /* Temporary */
   SET_TYPE_MODE (t, mode);
 
   if (TYPE_STRUCTURAL_EQUALITY_P (mv_innertype) || in_lto_p)
@@ -9693,13 +9693,6 @@ make_vector_type (tree innertype, int nu
   return t;
 }
 
-/* Temporary.  */
-static tree
-make_vector_type (tree innertype, poly_uint64 nunits, machine_mode mode)
-{
-  return make_vector_type (innertype, (int) nunits.to_constant (), mode);
-}
-
 static tree
 make_or_reuse_type (unsigned size, int unsignedp)
 {
@@ -10557,7 +10550,7 @@ reconstruct_complex_type (tree type, tre
 tree
 build_vector_type_for_mode (tree innertype, machine_mode mode)
 {
-  int nunits;
+  poly_int64 nunits;
   unsigned int bitsize;
 
   switch (GET_MODE_CLASS (mode))
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 17:25:30.703136044 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:25:48.618492077 +0100
@@ -5926,8 +5926,8 @@ static GTY((deletable)) rtx spare_vec_du
 static rtx
 gen_const_vec_duplicate_1 (machine_mode mode, rtx el)
 {
-  int nunits = GET_MODE_NUNITS (mode);
-  if (1)
+  int nunits;
+  if (GET_MODE_NUNITS (mode).is_constant (&nunits))
     {
       rtvec v = rtvec_alloc (nunits);
 
@@ -6024,8 +6024,8 @@ gen_const_vec_series (machine_mode mode,
 {
   gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
 
-  int nunits = GET_MODE_NUNITS (mode);
-  if (1)
+  int nunits;
+  if (GET_MODE_NUNITS (mode).is_constant (&nunits))
     {
       rtvec v = rtvec_alloc (nunits);
       scalar_mode inner_mode = GET_MODE_INNER (mode);
@@ -6089,7 +6089,7 @@ gen_const_vector (machine_mode mode, int
 rtx
 gen_rtx_CONST_VECTOR (machine_mode mode, rtvec v)
 {
-  gcc_assert (GET_MODE_NUNITS (mode) == GET_NUM_ELEM (v));
+  gcc_assert (must_eq (GET_MODE_NUNITS (mode), GET_NUM_ELEM (v)));
 
   /* If the values are all the same, check to see if we can use one of the
      standard constant vectors.  */
Index: gcc/genrecog.c
===================================================================
--- gcc/genrecog.c	2017-10-23 17:16:50.367528682 +0100
+++ gcc/genrecog.c	2017-10-23 17:25:48.619492041 +0100
@@ -746,14 +746,20 @@ validate_pattern (rtx pattern, md_rtx_in
 	    = VECTOR_MODE_P (mode) ? GET_MODE_INNER (mode) : mode;
 	  if (GET_CODE (XEXP (pattern, 1)) == PARALLEL)
 	    {
-	      int expected = VECTOR_MODE_P (mode) ? GET_MODE_NUNITS (mode) : 1;
-	      if (XVECLEN (XEXP (pattern, 1), 0) != expected)
+	      int expected = 1;
+	      unsigned int nelems;
+	      if (VECTOR_MODE_P (mode)
+		  && !GET_MODE_NUNITS (mode).is_constant (&expected))
+		error_at (info->loc,
+			  "vec_select with variable-sized mode %s",
+			  GET_MODE_NAME (mode));
+	      else if (XVECLEN (XEXP (pattern, 1), 0) != expected)
 		error_at (info->loc,
 			  "vec_select parallel with %d elements, expected %d",
 			  XVECLEN (XEXP (pattern, 1), 0), expected);
-	      else if (VECTOR_MODE_P (imode))
+	      else if (VECTOR_MODE_P (imode)
+		       && GET_MODE_NUNITS (imode).is_constant (&nelems))
 		{
-		  unsigned int nelems = GET_MODE_NUNITS (imode);
 		  int i;
 		  for (i = 0; i < expected; ++i)
 		    if (CONST_INT_P (XVECEXP (XEXP (pattern, 1), 0, i))
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-10-23 17:22:32.723227539 +0100
+++ gcc/optabs-query.c	2017-10-23 17:25:48.620492005 +0100
@@ -445,7 +445,12 @@ can_mult_highpart_p (machine_mode mode,
   if (GET_MODE_CLASS (mode) != MODE_VECTOR_INT)
     return 0;
 
-  nunits = GET_MODE_NUNITS (mode);
+  /* We need a constant number of elements in order to construct
+     the permute mask below.  */
+  /* ??? Maybe we should have specific optabs for these permutations,
+     so that we can use them even for a variable number of units.  */
+  if (!GET_MODE_NUNITS (mode).is_constant (&nunits))
+    return 0;
 
   op = uns_p ? vec_widen_umult_even_optab : vec_widen_smult_even_optab;
   if (optab_handler (op, mode) != CODE_FOR_nothing)
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-10-23 17:11:39.939361220 +0100
+++ gcc/optabs-tree.c	2017-10-23 17:25:48.620492005 +0100
@@ -338,7 +338,7 @@ expand_vec_cond_expr_p (tree value_type,
     return true;
 
   if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
-      || GET_MODE_NUNITS (value_mode) != GET_MODE_NUNITS (cmp_op_mode))
+      || may_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS (cmp_op_mode)))
     return false;
 
   if (get_vcond_icode (TYPE_MODE (value_type), TYPE_MODE (cmp_op_type),
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 17:11:40.272998820 +0100
+++ gcc/optabs.c	2017-10-23 17:25:48.621491969 +0100
@@ -370,17 +370,15 @@ force_expand_binop (machine_mode mode, o
 rtx
 expand_vector_broadcast (machine_mode vmode, rtx op)
 {
-  enum insn_code icode;
+  int n;
   rtvec vec;
-  rtx ret;
-  int i, n;
 
   gcc_checking_assert (VECTOR_MODE_P (vmode));
 
   if (CONSTANT_P (op))
     return gen_const_vec_duplicate (vmode, op);
 
-  icode = optab_handler (vec_duplicate_optab, vmode);
+  insn_code icode = optab_handler (vec_duplicate_optab, vmode);
   if (icode != CODE_FOR_nothing)
     {
       struct expand_operand ops[2];
@@ -390,6 +388,9 @@ expand_vector_broadcast (machine_mode vm
       return ops[0].value;
     }
 
+  if (!GET_MODE_NUNITS (vmode).is_constant (&n))
+    return NULL;
+
   /* ??? If the target doesn't have a vec_init, then we have no easy way
      of performing this operation.  Most of this sort of generic support
      is hidden away in the vector lowering support in gimple.  */
@@ -398,11 +399,10 @@ expand_vector_broadcast (machine_mode vm
   if (icode == CODE_FOR_nothing)
     return NULL;
 
-  n = GET_MODE_NUNITS (vmode);
   vec = rtvec_alloc (n);
-  for (i = 0; i < n; ++i)
+  for (int i = 0; i < n; ++i)
     RTVEC_ELT (vec, i) = op;
-  ret = gen_reg_rtx (vmode);
+  rtx ret = gen_reg_rtx (vmode);
   emit_insn (GEN_FCN (icode) (ret, gen_rtx_PARALLEL (vmode, vec)));
 
   return ret;
@@ -1068,7 +1068,7 @@ expand_binop_directly (enum insn_code ic
 	 arguments.  */
       tmp_mode = insn_data[(int) icode].operand[0].mode;
       if (VECTOR_MODE_P (mode)
-	  && GET_MODE_NUNITS (tmp_mode) != 2 * GET_MODE_NUNITS (mode))
+	  && may_ne (GET_MODE_NUNITS (tmp_mode), 2 * GET_MODE_NUNITS (mode)))
 	{
 	  delete_insns_since (last);
 	  return NULL_RTX;
@@ -5385,16 +5385,15 @@ vector_compare_rtx (machine_mode cmp_mod
 static rtx
 shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
 {
-  unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
-  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
-
   if (GET_CODE (sel) != CONST_VECTOR)
     return NULL_RTX;
 
-  first = INTVAL (CONST_VECTOR_ELT (sel, 0));
+  unsigned int nelt = CONST_VECTOR_NUNITS (sel);
+  unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
+  unsigned int first = INTVAL (CONST_VECTOR_ELT (sel, 0));
   if (first >= nelt)
     return NULL_RTX;
-  for (i = 1; i < nelt; i++)
+  for (unsigned int i = 1; i < nelt; i++)
     {
       int idx = INTVAL (CONST_VECTOR_ELT (sel, i));
       unsigned int expected = i + first;
@@ -5450,7 +5449,7 @@ expand_vec_perm (machine_mode mode, rtx
 {
   enum insn_code icode;
   machine_mode qimode;
-  unsigned int i, w, e, u;
+  unsigned int i, w, u;
   rtx tmp, sel_qi = NULL;
   rtvec vec;
 
@@ -5458,7 +5457,6 @@ expand_vec_perm (machine_mode mode, rtx
     target = gen_reg_rtx (mode);
 
   w = GET_MODE_SIZE (mode);
-  e = GET_MODE_NUNITS (mode);
   u = GET_MODE_UNIT_SIZE (mode);
 
   /* Set QIMODE to a different vector mode with byte elements.
@@ -5474,6 +5472,7 @@ expand_vec_perm (machine_mode mode, rtx
     {
       /* See if this can be handled with a vec_shr.  We only do this if the
 	 second vector is all zeroes.  */
+      unsigned int e = CONST_VECTOR_NUNITS (sel);
       enum insn_code shift_code = optab_handler (vec_shr_optab, mode);
       enum insn_code shift_code_qi = ((qimode != VOIDmode && qimode != mode)
 				      ? optab_handler (vec_shr_optab, qimode)
@@ -5688,7 +5687,8 @@ expand_vec_cond_expr (tree vec_cond_type
 
 
   gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode)
-	      && GET_MODE_NUNITS (mode) == GET_MODE_NUNITS (cmp_op_mode));
+	      && must_eq (GET_MODE_NUNITS (mode),
+			  GET_MODE_NUNITS (cmp_op_mode)));
 
   icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
   if (icode == CODE_FOR_nothing)
@@ -5787,7 +5787,6 @@ expand_mult_highpart (machine_mode mode,
   machine_mode wmode;
   rtx m1, m2, perm;
   optab tab1, tab2;
-  rtvec v;
 
   method = can_mult_highpart_p (mode, uns_p);
   switch (method)
@@ -5813,9 +5812,9 @@ expand_mult_highpart (machine_mode mode,
     }
 
   icode = optab_handler (tab1, mode);
-  nunits = GET_MODE_NUNITS (mode);
   wmode = insn_data[icode].operand[0].mode;
-  gcc_checking_assert (2 * GET_MODE_NUNITS (wmode) == nunits);
+  gcc_checking_assert (must_eq (2 * GET_MODE_NUNITS (wmode),
+				GET_MODE_NUNITS (mode)));
   gcc_checking_assert (GET_MODE_SIZE (wmode) == GET_MODE_SIZE (mode));
 
   create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode);
@@ -5830,9 +5829,13 @@ expand_mult_highpart (machine_mode mode,
   expand_insn (optab_handler (tab2, mode), 3, eops);
   m2 = gen_lowpart (mode, eops[0].value);
 
-  v = rtvec_alloc (nunits);
   if (method == 2)
     {
+      /* ??? Might want a way of representing this with variable-width
+	 vectors.  */
+      if (!GET_MODE_NUNITS (mode).is_constant (&nunits))
+	return NULL_RTX;
+      rtvec v = rtvec_alloc (nunits);
       for (i = 0; i < nunits; ++i)
 	RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
 				    + ((i & 1) ? nunits : 0));
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:25:38.243864993 +0100
+++ gcc/rtlanal.c	2017-10-23 17:25:48.622491933 +0100
@@ -3706,11 +3706,11 @@ subreg_get_info (unsigned int xregno, ma
   if (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode))
     {
       /* As a consequence, we must be dealing with a constant number of
-	 scalars, and thus a constant offset.  */
+	 scalars, and thus a constant offset and number of units.  */
       HOST_WIDE_INT coffset = offset.to_constant ();
       HOST_WIDE_INT cysize = ysize.to_constant ();
       nregs_xmode = HARD_REGNO_NREGS_WITH_PADDING (xregno, xmode);
-      unsigned int nunits = GET_MODE_NUNITS (xmode);
+      unsigned int nunits = GET_MODE_NUNITS (xmode).to_constant ();
       scalar_mode xmode_unit = GET_MODE_INNER (xmode);
       gcc_assert (HARD_REGNO_NREGS_HAS_PADDING (xregno, xmode_unit));
       gcc_assert (nregs_xmode
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 17:18:47.665057096 +0100
+++ gcc/simplify-rtx.c	2017-10-23 17:25:48.622491933 +0100
@@ -1223,7 +1223,7 @@ simplify_unary_operation_1 (enum rtx_cod
 
       /* If we know that the value is already truncated, we can
 	 replace the TRUNCATE with a SUBREG.  */
-      if (GET_MODE_NUNITS (mode) == 1
+      if (must_eq (GET_MODE_NUNITS (mode), 1)
 	  && (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
 	      || truncated_to_mode (mode, op)))
 	{
@@ -1740,9 +1740,10 @@ simplify_const_unary_operation (enum rtx
       }
       if (CONST_SCALAR_INT_P (op) || CONST_DOUBLE_AS_FLOAT_P (op))
 	return gen_const_vec_duplicate (mode, op);
-      if (GET_CODE (op) == CONST_VECTOR)
+      unsigned int n_elts;
+      if (GET_CODE (op) == CONST_VECTOR
+	  && GET_MODE_NUNITS (mode).is_constant (&n_elts))
 	{
-	  unsigned int n_elts = GET_MODE_NUNITS (mode);
 	  unsigned int in_n_elts = CONST_VECTOR_NUNITS (op);
 	  gcc_assert (in_n_elts < n_elts);
 	  gcc_assert ((n_elts % in_n_elts) == 0);
@@ -1755,15 +1756,14 @@ simplify_const_unary_operation (enum rtx
 
   if (VECTOR_MODE_P (mode) && GET_CODE (op) == CONST_VECTOR)
     {
-      int elt_size = GET_MODE_UNIT_SIZE (mode);
-      unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
       machine_mode opmode = GET_MODE (op);
-      int op_elt_size = GET_MODE_UNIT_SIZE (opmode);
-      unsigned op_n_elts = (GET_MODE_SIZE (opmode) / op_elt_size);
+      unsigned int n_elts = CONST_VECTOR_NUNITS (op);
+      gcc_assert (must_eq (GET_MODE_NUNITS (mode), n_elts));
+      gcc_assert (must_eq (GET_MODE_NUNITS (opmode), n_elts));
+
       rtvec v = rtvec_alloc (n_elts);
       unsigned int i;
 
-      gcc_assert (op_n_elts == n_elts);
       for (i = 0; i < n_elts; i++)
 	{
 	  rtx x = simplify_unary_operation (code, GET_MODE_INNER (mode),
@@ -3614,13 +3614,14 @@ simplify_binary_operation_1 (enum rtx_co
 	     nested VEC_SELECT expressions.  When input operand is a memory
 	     operand, this operation can be simplified to a simple scalar
 	     load from an offseted memory address.  */
-	  if (GET_CODE (trueop0) == VEC_SELECT)
+	  int n_elts;
+	  if (GET_CODE (trueop0) == VEC_SELECT
+	      && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))
+		  .is_constant (&n_elts)))
 	    {
 	      rtx op0 = XEXP (trueop0, 0);
 	      rtx op1 = XEXP (trueop0, 1);
 
-	      int n_elts = GET_MODE_NUNITS (GET_MODE (op0));
-
 	      int i = INTVAL (XVECEXP (trueop1, 0, 0));
 	      int elem;
 
@@ -3645,9 +3646,11 @@ simplify_binary_operation_1 (enum rtx_co
 		  mode00 = GET_MODE (op00);
 		  mode01 = GET_MODE (op01);
 
-		  /* Find out number of elements of each operand.  */
-		  n_elts00 = GET_MODE_NUNITS (mode00);
-		  n_elts01 = GET_MODE_NUNITS (mode01);
+		  /* Find out the number of elements of each operand.
+		     Since the concatenated result has a constant number
+		     of elements, the operands must too.  */
+		  n_elts00 = GET_MODE_NUNITS (mode00).to_constant ();
+		  n_elts01 = GET_MODE_NUNITS (mode01).to_constant ();
 
 		  gcc_assert (n_elts == n_elts00 + n_elts01);
 
@@ -3686,12 +3689,11 @@ simplify_binary_operation_1 (enum rtx_co
 
 	  if (GET_CODE (trueop0) == CONST_VECTOR)
 	    {
-	      int elt_size = GET_MODE_UNIT_SIZE (mode);
-	      unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
+	      unsigned n_elts = XVECLEN (trueop1, 0);
 	      rtvec v = rtvec_alloc (n_elts);
 	      unsigned int i;
 
-	      gcc_assert (XVECLEN (trueop1, 0) == (int) n_elts);
+	      gcc_assert (must_eq (n_elts, GET_MODE_NUNITS (mode)));
 	      for (i = 0; i < n_elts; i++)
 		{
 		  rtx x = XVECEXP (trueop1, 0, i);
@@ -3760,15 +3762,18 @@ simplify_binary_operation_1 (enum rtx_co
 	    }
 
 	  /* If we select one half of a vec_concat, return that.  */
+	  int l0, l1;
 	  if (GET_CODE (trueop0) == VEC_CONCAT
+	      && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 0)))
+		  .is_constant (&l0))
+	      && (GET_MODE_NUNITS (GET_MODE (XEXP (trueop0, 1)))
+		  .is_constant (&l1))
 	      && CONST_INT_P (XVECEXP (trueop1, 0, 0)))
 	    {
 	      rtx subop0 = XEXP (trueop0, 0);
 	      rtx subop1 = XEXP (trueop0, 1);
 	      machine_mode mode0 = GET_MODE (subop0);
 	      machine_mode mode1 = GET_MODE (subop1);
-	      int l0 = GET_MODE_NUNITS (mode0);
-	      int l1 = GET_MODE_NUNITS (mode1);
 	      int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
 	      if (i0 == 0 && !side_effects_p (op1) && mode == mode0)
 		{
@@ -3875,7 +3880,7 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  rtx op0_subop1 = XEXP (trueop0, 1);
 	  gcc_assert (GET_CODE (op0_subop1) == PARALLEL);
-	  gcc_assert (XVECLEN (trueop1, 0) == GET_MODE_NUNITS (mode));
+	  gcc_assert (must_eq (XVECLEN (trueop1, 0), GET_MODE_NUNITS (mode)));
 
 	  /* Apply the outer ordering vector to the inner one.  (The inner
 	     ordering vector is expressly permitted to be of a different
@@ -3919,15 +3924,16 @@ simplify_binary_operation_1 (enum rtx_co
 	else
 	  gcc_assert (GET_MODE_INNER (mode) == op1_mode);
 
+	unsigned int n_elts, in_n_elts;
 	if ((GET_CODE (trueop0) == CONST_VECTOR
 	     || CONST_SCALAR_INT_P (trueop0) 
 	     || CONST_DOUBLE_AS_FLOAT_P (trueop0))
 	    && (GET_CODE (trueop1) == CONST_VECTOR
 		|| CONST_SCALAR_INT_P (trueop1) 
-		|| CONST_DOUBLE_AS_FLOAT_P (trueop1)))
+		|| CONST_DOUBLE_AS_FLOAT_P (trueop1))
+	    && GET_MODE_NUNITS (mode).is_constant (&n_elts)
+	    && GET_MODE_NUNITS (op0_mode).is_constant (&in_n_elts))
 	  {
-	    unsigned n_elts = GET_MODE_NUNITS (mode);
-	    unsigned in_n_elts = GET_MODE_NUNITS (op0_mode);
 	    rtvec v = rtvec_alloc (n_elts);
 	    unsigned int i;
 	    for (i = 0; i < n_elts; i++)
@@ -4019,7 +4025,7 @@ simplify_const_binary_operation (enum rt
     {
       unsigned int n_elts = CONST_VECTOR_NUNITS (op0);
       gcc_assert (n_elts == (unsigned int) CONST_VECTOR_NUNITS (op1));
-      gcc_assert (n_elts == GET_MODE_NUNITS (mode));
+      gcc_assert (must_eq (n_elts, GET_MODE_NUNITS (mode)));
       rtvec v = rtvec_alloc (n_elts);
       unsigned int i;
 
@@ -4045,7 +4051,9 @@ simplify_const_binary_operation (enum rt
 	  || CONST_DOUBLE_AS_FLOAT_P (op1)
 	  || GET_CODE (op1) == CONST_FIXED))
     {
-      unsigned n_elts = GET_MODE_NUNITS (mode);
+      /* Both inputs have a constant number of elements, so the result
+	 must too.  */
+      unsigned n_elts = GET_MODE_NUNITS (mode).to_constant ();
       rtvec v = rtvec_alloc (n_elts);
 
       gcc_assert (n_elts >= 2);
@@ -4059,8 +4067,8 @@ simplify_const_binary_operation (enum rt
 	}
       else
 	{
-	  unsigned op0_n_elts = GET_MODE_NUNITS (GET_MODE (op0));
-	  unsigned op1_n_elts = GET_MODE_NUNITS (GET_MODE (op1));
+	  unsigned op0_n_elts = GET_MODE_NUNITS (GET_MODE (op0)).to_constant ();
+	  unsigned op1_n_elts = GET_MODE_NUNITS (GET_MODE (op1)).to_constant ();
 	  unsigned i;
 
 	  gcc_assert (GET_CODE (op0) == CONST_VECTOR);
@@ -5563,6 +5571,7 @@ simplify_ternary_operation (enum rtx_cod
   bool any_change = false;
   rtx tem, trueop2;
   scalar_int_mode int_mode, int_op0_mode;
+  unsigned int n_elts;
 
   switch (code)
     {
@@ -5748,9 +5757,9 @@ simplify_ternary_operation (enum rtx_cod
       gcc_assert (GET_MODE (op1) == mode);
       gcc_assert (VECTOR_MODE_P (mode));
       trueop2 = avoid_constant_pool_reference (op2);
-      if (CONST_INT_P (trueop2))
+      if (CONST_INT_P (trueop2)
+	  && GET_MODE_NUNITS (mode).is_constant (&n_elts))
 	{
-	  unsigned n_elts = GET_MODE_NUNITS (mode);
 	  unsigned HOST_WIDE_INT sel = UINTVAL (trueop2);
 	  unsigned HOST_WIDE_INT mask;
 	  if (n_elts == HOST_BITS_PER_WIDE_INT)
@@ -5814,7 +5823,7 @@ simplify_ternary_operation (enum rtx_cod
 	  if (GET_CODE (op0) == VEC_DUPLICATE
 	      && GET_CODE (XEXP (op0, 0)) == VEC_SELECT
 	      && GET_CODE (XEXP (XEXP (op0, 0), 1)) == PARALLEL
-	      && mode_nunits[GET_MODE (XEXP (op0, 0))] == 1)
+	      && must_eq (GET_MODE_NUNITS (GET_MODE (XEXP (op0, 0))), 1))
 	    {
 	      tem = XVECEXP ((XEXP (XEXP (op0, 0), 1)), 0, 0);
 	      if (CONST_INT_P (tem) && CONST_INT_P (op2))
@@ -6574,7 +6583,7 @@ test_vector_ops_duplicate (machine_mode
 {
   scalar_mode inner_mode = GET_MODE_INNER (mode);
   rtx duplicate = gen_rtx_VEC_DUPLICATE (mode, scalar_reg);
-  unsigned int nunits = GET_MODE_NUNITS (mode);
+  poly_uint64 nunits = GET_MODE_NUNITS (mode);
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
     {
       /* Test some simple unary cases with VEC_DUPLICATE arguments.  */
@@ -6611,11 +6620,15 @@ test_vector_ops_duplicate (machine_mode
 						duplicate, zero_par));
 
   /* And again with the final element.  */
-  rtx last_index = gen_int_mode (GET_MODE_NUNITS (mode) - 1, word_mode);
-  rtx last_par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, last_index));
-  ASSERT_RTX_PTR_EQ (scalar_reg,
-		     simplify_binary_operation (VEC_SELECT, inner_mode,
-						duplicate, last_par));
+  unsigned HOST_WIDE_INT const_nunits;
+  if (nunits.is_constant (&const_nunits))
+    {
+      rtx last_index = gen_int_mode (const_nunits - 1, word_mode);
+      rtx last_par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, last_index));
+      ASSERT_RTX_PTR_EQ (scalar_reg,
+			 simplify_binary_operation (VEC_SELECT, inner_mode,
+						    duplicate, last_par));
+    }
 
   /* Test a scalar subreg of a VEC_DUPLICATE.  */
   poly_uint64 offset = subreg_lowpart_offset (inner_mode, mode);
@@ -6624,7 +6637,7 @@ test_vector_ops_duplicate (machine_mode
 				      mode, offset));
 
   machine_mode narrower_mode;
-  if (nunits > 2
+  if (may_gt (nunits, 2U)
       && mode_for_vector (inner_mode, 2).exists (&narrower_mode)
       && VECTOR_MODE_P (narrower_mode))
     {
@@ -6712,7 +6725,7 @@ test_vector_ops ()
 	  rtx scalar_reg = make_test_reg (GET_MODE_INNER (mode));
 	  test_vector_ops_duplicate (mode, scalar_reg);
 	  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
-	      && GET_MODE_NUNITS (mode) > 2)
+	      && may_gt (GET_MODE_NUNITS (mode), 2))
 	    test_vector_ops_series (mode, scalar_reg);
 	}
     }
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:22:44.864968082 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:25:48.623491897 +0100
@@ -4568,9 +4568,10 @@ vect_grouped_store_supported (tree vecty
     }
 
   /* Check that the permutation is supported.  */
-  if (VECTOR_MODE_P (mode))
+  unsigned int nelt;
+  if (VECTOR_MODE_P (mode) && GET_MODE_NUNITS (mode).is_constant (&nelt))
     {
-      unsigned int i, nelt = GET_MODE_NUNITS (mode);
+      unsigned int i;
       auto_vec_perm_indices sel (nelt);
       sel.quick_grow (nelt);
 
@@ -5156,9 +5157,10 @@ vect_grouped_load_supported (tree vectyp
     }
 
   /* Check that the permutation is supported.  */
-  if (VECTOR_MODE_P (mode))
+  unsigned int nelt;
+  if (VECTOR_MODE_P (mode) && GET_MODE_NUNITS (mode).is_constant (&nelt))
     {
-      unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
+      unsigned int i, j;
       auto_vec_perm_indices sel (nelt);
       sel.quick_grow (nelt);
 
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-10-23 17:22:45.856865193 +0100
+++ gcc/tree-vect-generic.c	2017-10-23 17:25:48.623491897 +0100
@@ -1159,7 +1159,7 @@ type_for_widest_vector_mode (tree type,
 {
   machine_mode inner_mode = TYPE_MODE (type);
   machine_mode best_mode = VOIDmode, mode;
-  int best_nunits = 0;
+  poly_int64 best_nunits = 0;
 
   if (SCALAR_FLOAT_MODE_P (inner_mode))
     mode = MIN_MODE_VECTOR_FLOAT;
@@ -1176,7 +1176,7 @@ type_for_widest_vector_mode (tree type,
 
   FOR_EACH_MODE_FROM (mode, mode)
     if (GET_MODE_INNER (mode) == inner_mode
-        && GET_MODE_NUNITS (mode) > best_nunits
+	&& may_gt (GET_MODE_NUNITS (mode), best_nunits)
 	&& optab_handler (op, mode) != CODE_FOR_nothing)
       best_mode = mode, best_nunits = GET_MODE_NUNITS (mode);
 
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:22:37.879692661 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:25:48.624491861 +0100
@@ -3750,10 +3750,13 @@ have_whole_vector_shift (machine_mode mo
   if (direct_optab_handler (vec_perm_const_optab, mode) == CODE_FOR_nothing)
     return false;
 
-  unsigned int i, nelt = GET_MODE_NUNITS (mode);
-  auto_vec_perm_indices sel (nelt);
+  /* Variable-length vectors should be handled via the optab.  */
+  unsigned int nelt;
+  if (!GET_MODE_NUNITS (mode).is_constant (&nelt))
+    return false;
 
-  for (i = nelt/2; i >= 1; i/=2)
+  auto_vec_perm_indices sel (nelt);
+  for (unsigned int i = nelt / 2; i >= 1; i /= 2)
     {
       sel.truncate (0);
       calc_vec_perm_mask_for_shift (i, nelt, &sel);
Index: gcc/ada/gcc-interface/misc.c
===================================================================
--- gcc/ada/gcc-interface/misc.c	2017-10-23 11:41:24.995420946 +0100
+++ gcc/ada/gcc-interface/misc.c	2017-10-23 17:25:48.617492113 +0100
@@ -1298,9 +1298,10 @@ enumerate_modes (void (*f) (const char *
 	  }
 
       /* If no predefined C types were found, register the mode itself.  */
-      if (!skip_p)
+      int nunits;
+      if (!skip_p && GET_MODE_NUNITS (i).is_constant (&nunits))
 	f (GET_MODE_NAME (i), digs, complex_p,
-	   vector_p ? GET_MODE_NUNITS (i) : 0, float_rep,
+	   vector_p ? nunits : 0, float_rep,
 	   GET_MODE_PRECISION (i), GET_MODE_BITSIZE (i),
 	   GET_MODE_ALIGNMENT (i));
     }


* [102/nnn] poly_int: vect_permute_load/store_chain
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (101 preceding siblings ...)
  2017-10-23 17:42 ` [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS Richard Sandiford
@ 2017-10-23 17:42 ` Richard Sandiford
  2017-11-21  8:01   ` Jeff Law
  2017-10-23 17:43 ` [105/nnn] poly_int: expand_assignment Richard Sandiford
                   ` (4 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:42 UTC (permalink / raw)
  To: gcc-patches

The GET_MODE_NUNITS patch made vect_grouped_store_supported and
vect_grouped_load_supported check for a constant number of elements,
so vect_permute_store_chain and vect_permute_load_chain can assert
that the number of elements is constant too.  This patch adds
commentary to that effect; the actual asserts will be added by a later,
more mechanical, patch.

The patch also reorganises the functions so that the asserts
are linked specifically to the code that builds permute vectors
element by element.  This allows a later patch to add support
for some variable-length permutes.
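
For reference, after the reorganisation both the length == 3 and the
power-of-2 branches of vect_permute_store_chain (and likewise
vect_permute_load_chain) open with the same shape; in outline only,
not the exact hunk:

  /* vect_grouped_store_supported ensures that this is constant.  */
  unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
  auto_vec_perm_indices sel (nelt);
  sel.quick_grow (nelt);
  /* ...build this branch's permute vectors element by element...  */

so that a later patch can add the assert here and handle any
variable-length cases outside these element-by-element loops.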


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-vect-data-refs.c (vect_permute_store_chain): Reorganize
	so that both the length == 3 and length != 3 cases set up their
	own permute vectors.  Add comments explaining why we know the
	number of elements is constant.
	(vect_permute_load_chain): Likewise.

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:25:48.623491897 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:25:50.361429427 +0100
@@ -4734,11 +4734,7 @@ vect_permute_store_chain (vec<tree> dr_c
   tree perm_mask_low, perm_mask_high;
   tree data_ref;
   tree perm3_mask_low, perm3_mask_high;
-  unsigned int i, n, log_length = exact_log2 (length);
-  unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
-
-  auto_vec_perm_indices sel (nelt);
-  sel.quick_grow (nelt);
+  unsigned int i, j, n, log_length = exact_log2 (length);
 
   result_chain->quick_grow (length);
   memcpy (result_chain->address (), dr_chain.address (),
@@ -4746,8 +4742,12 @@ vect_permute_store_chain (vec<tree> dr_c
 
   if (length == 3)
     {
+      /* vect_grouped_store_supported ensures that this is constant.  */
+      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
       unsigned int j0 = 0, j1 = 0, j2 = 0;
 
+      auto_vec_perm_indices sel (nelt);
+      sel.quick_grow (nelt);
       for (j = 0; j < 3; j++)
         {
 	  int nelt0 = ((3 - j) * nelt) % 3;
@@ -4806,6 +4806,10 @@ vect_permute_store_chain (vec<tree> dr_c
       /* If length is not equal to 3 then only power of 2 is supported.  */
       gcc_assert (pow2p_hwi (length));
 
+      /* vect_grouped_store_supported ensures that this is constant.  */
+      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
+      auto_vec_perm_indices sel (nelt);
+      sel.quick_grow (nelt);
       for (i = 0, n = nelt / 2; i < n; i++)
 	{
 	  sel[i * 2] = i;
@@ -5321,10 +5325,6 @@ vect_permute_load_chain (vec<tree> dr_ch
   gimple *perm_stmt;
   tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
   unsigned int i, j, log_length = exact_log2 (length);
-  unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
-
-  auto_vec_perm_indices sel (nelt);
-  sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
   memcpy (result_chain->address (), dr_chain.address (),
@@ -5332,8 +5332,12 @@ vect_permute_load_chain (vec<tree> dr_ch
 
   if (length == 3)
     {
+      /* vect_grouped_load_supported ensures that this is constant.  */
+      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
       unsigned int k;
 
+      auto_vec_perm_indices sel (nelt);
+      sel.quick_grow (nelt);
       for (k = 0; k < 3; k++)
 	{
 	  for (i = 0; i < nelt; i++)
@@ -5379,6 +5383,10 @@ vect_permute_load_chain (vec<tree> dr_ch
       /* If length is not equal to 3 then only power of 2 is supported.  */
       gcc_assert (pow2p_hwi (length));
 
+      /* vect_grouped_load_supported ensures that this is constant.  */
+      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
+      auto_vec_perm_indices sel (nelt);
+      sel.quick_grow (nelt);
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2;
       perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);


* [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (100 preceding siblings ...)
  2017-10-23 17:41 ` [101/nnn] poly_int: GET_MODE_NUNITS Richard Sandiford
@ 2017-10-23 17:42 ` Richard Sandiford
  2017-10-24  9:06   ` Richard Biener
  2017-12-06  2:31   ` Jeff Law
  2017-10-23 17:42 ` [102/nnn] poly_int: vect_permute_load/store_chain Richard Sandiford
                   ` (5 subsequent siblings)
  107 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:42 UTC (permalink / raw)
  To: gcc-patches

This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
encoded in the 10-bit precision field and was previously always stored
as a simple log2 value.  The challenge was to use these 10 bits to
encode the number of elements in variable-length vectors, so that
we didn't need to increase the size of the tree.

In practice the number of vector elements should always have the form
N + N * X (where X is the runtime value), and, as for constant-length
vectors, N must be a power of 2 (even though X itself might not be).
The patch therefore uses the low bit to select between constant-length
and variable-length vectors and uses the upper 9 bits to encode log2(N).
Targets without variable-length vectors continue to use the old scheme.
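
To make the encoding concrete (this simply restates what the new
TYPE_VECTOR_SUBPARTS and SET_TYPE_VECTOR_SUBPARTS below do), on a
target with NUM_POLY_INT_COEFFS == 2:

  8 elements       ->  precision = 2 * log2 (8) + 0 = 6
  8 + 8X elements  ->  precision = 2 * log2 (8) + 1 = 7

and decoding reverses it:

  coeffs[0] = 1 << (precision / 2);
  coeffs[1] = (precision & 1) ? 1 << (precision / 2) : 0;

Targets with NUM_POLY_INT_COEFFS == 1 keep precision == log2 (nunits),
as before.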

A new valid_vector_subparts_p function tests whether a given number
of elements can be encoded.  This is false for the vector modes that
represent an LD3 or ST3 vector triple (which we want to treat as arrays
of vectors rather than single vectors).
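
As a worked example (assuming an SVE-style vector of 4 + 4X ints, and
using hypothetical calls that only make sense when
NUM_POLY_INT_COEFFS == 2):

  valid_vector_subparts_p (poly_uint64 (4, 4))    /* 4 + 4X   */ -> true
  valid_vector_subparts_p (poly_uint64 (12, 12))  /* 12 + 12X */ -> false
  valid_vector_subparts_p (poly_uint64 (4, 8))    /* 4 + 8X   */ -> false

The second case is the LD3/ST3 triple: 12 is not a power of 2.  The
third fails because the two coefficients differ.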

Most of the patch is mechanical; previous patches handled the changes
that weren't entirely straightforward.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle
	polynomial numbers of units.
	(SET_TYPE_VECTOR_SUBPARTS): Likewise.
	(valid_vector_subparts_p): New function.
	(build_vector_type): Remove temporary shim and take the number
	of units as a poly_uint64 rather than an int.
	(build_opaque_vector_type): Take the number of units as a
	poly_uint64 rather than an int.
	* tree.c (build_vector): Handle polynomial TYPE_VECTOR_SUBPARTS.
	(build_vector_from_ctor, type_hash_canon_hash): Likewise.
	(type_cache_hasher::equal, uniform_vector_p): Likewise.
	(vector_type_mode): Likewise.
	(build_vector_from_val): If the number of units isn't constant,
	use build_vec_duplicate_cst for constant operands and
	VEC_DUPLICATE_EXPR otherwise.
	(make_vector_type): Remove temporary is_constant ().
	(build_vector_type, build_opaque_vector_type): Take the number of
	units as a poly_uint64 rather than an int.
	* cfgexpand.c (expand_debug_expr): Handle polynomial
	TYPE_VECTOR_SUBPARTS.
	* expr.c (count_type_elements, store_constructor): Likewise.
	* fold-const.c (const_binop, const_unop, fold_convert_const)
	(operand_equal_p, fold_view_convert_expr, fold_vec_perm)
	(fold_ternary_loc, fold_relational_const): Likewise.
	(native_interpret_vector): Likewise.  Change the size from an
	int to an unsigned int.
	* gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial
	TYPE_VECTOR_SUBPARTS.
	(gimple_fold_indirect_ref, gimple_build_vector): Likewise.
	(gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when
	duplicating a non-constant operand into a variable-length vector.
	* match.pd: Handle polynomial TYPE_VECTOR_SUBPARTS.
	* omp-simd-clone.c (simd_clone_subparts): Likewise.
	* print-tree.c (print_node): Likewise.
	* stor-layout.c (layout_type): Likewise.
	* targhooks.c (default_builtin_vectorization_cost): Likewise.
	* tree-cfg.c (verify_gimple_comparison): Likewise.
	(verify_gimple_assign_binary): Likewise.
	(verify_gimple_assign_ternary): Likewise.
	(verify_gimple_assign_single): Likewise.
	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
	* tree-vect-data-refs.c (vect_permute_store_chain): Likewise.
	(vect_grouped_load_supported, vect_permute_load_chain): Likewise.
	(vect_shift_permute_load_chain): Likewise.
	* tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise.
	(expand_vector_condition, optimize_vector_constructor): Likewise.
	(lower_vec_perm, get_compute_type): Likewise.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
	(get_initial_defs_for_reduction, vect_transform_loop): Likewise.
	* tree-vect-patterns.c (vect_recog_bool_pattern): Likewise.
	(vect_recog_mask_conversion_pattern): Likewise.
	* tree-vect-slp.c (vect_supported_load_permutation_p): Likewise.
	(vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise.
	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
	(get_group_load_store_type, vectorizable_mask_load_store): Likewise.
	(vectorizable_bswap, simd_clone_subparts, vectorizable_assignment)
	(vectorizable_shift, vectorizable_operation, vectorizable_store)
	(vect_gen_perm_mask_any, vectorizable_load, vect_is_simple_cond)
	(vectorizable_comparison, supportable_widening_operation): Likewise.
	(supportable_narrowing_operation): Likewise.

gcc/ada/
	* gcc-interface/utils.c (gnat_types_compatible_p): Handle
	polynomial TYPE_VECTOR_SUBPARTS.

gcc/brig/
	* brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle
	polynomial TYPE_VECTOR_SUBPARTS.
	* brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise.

gcc/c-family/
	* c-common.c (vector_types_convertible_p, c_build_vec_perm_expr)
	(convert_vector_to_array_for_subscript): Handle polynomial
	TYPE_VECTOR_SUBPARTS.
	(c_common_type_for_mode): Check valid_vector_subparts_p.

gcc/c/
	* c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial
	TYPE_VECTOR_SUBPARTS.

gcc/cp/
	* call.c (build_conditional_expr_1): Handle polynomial
	TYPE_VECTOR_SUBPARTS.
	* constexpr.c (cxx_fold_indirect_ref): Likewise.
	* decl.c (cp_finish_decomp): Likewise.
	* mangle.c (write_type): Likewise.
	* typeck.c (structural_comptypes): Likewise.
	(cp_build_binary_op): Likewise.
	* typeck2.c (process_init_constructor_array): Likewise.

gcc/fortran/
	* trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p.

gcc/lto/
	* lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p.
	* lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS.

gcc/go/
	* go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 17:22:35.831905077 +0100
+++ gcc/tree.h	2017-10-23 17:25:51.773378674 +0100
@@ -2041,15 +2041,6 @@ #define TREE_VISITED(NODE) ((NODE)->base
    If set in a INTEGER_TYPE, indicates a character type.  */
 #define TYPE_STRING_FLAG(NODE) (TYPE_CHECK (NODE)->type_common.string_flag)
 
-/* For a VECTOR_TYPE, this is the number of sub-parts of the vector.  */
-#define TYPE_VECTOR_SUBPARTS(VECTOR_TYPE) \
-  (HOST_WIDE_INT_1U \
-   << VECTOR_TYPE_CHECK (VECTOR_TYPE)->type_common.precision)
-
-/* Set precision to n when we have 2^n sub-parts of the vector.  */
-#define SET_TYPE_VECTOR_SUBPARTS(VECTOR_TYPE, X) \
-  (VECTOR_TYPE_CHECK (VECTOR_TYPE)->type_common.precision = exact_log2 (X))
-
 /* Nonzero in a VECTOR_TYPE if the frontends should not emit warnings
    about missing conversions to other vector types of the same size.  */
 #define TYPE_VECTOR_OPAQUE(NODE) \
@@ -3671,6 +3662,64 @@ id_equal (const char *str, const_tree id
   return !strcmp (str, IDENTIFIER_POINTER (id));
 }
 
+/* Return the number of elements in the VECTOR_TYPE given by NODE.  */
+
+inline poly_uint64
+TYPE_VECTOR_SUBPARTS (const_tree node)
+{
+  STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 2);
+  unsigned int precision = VECTOR_TYPE_CHECK (node)->type_common.precision;
+  if (NUM_POLY_INT_COEFFS == 2)
+    {
+      poly_uint64 res = 0;
+      res.coeffs[0] = 1 << (precision / 2);
+      if (precision & 1)
+	res.coeffs[1] = 1 << (precision / 2);
+      return res;
+    }
+  else
+    return 1 << precision;
+}
+
+/* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
+   satisfy valid_vector_subparts_p.  */
+
+inline void
+SET_TYPE_VECTOR_SUBPARTS (tree node, poly_uint64 subparts)
+{
+  STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 2);
+  unsigned HOST_WIDE_INT coeff0 = subparts.coeffs[0];
+  int index = exact_log2 (coeff0);
+  gcc_assert (index >= 0);
+  if (NUM_POLY_INT_COEFFS == 2)
+    {
+      unsigned HOST_WIDE_INT coeff1 = subparts.coeffs[1];
+      gcc_assert (coeff1 == 0 || coeff1 == coeff0);
+      VECTOR_TYPE_CHECK (node)->type_common.precision
+	= index * 2 + (coeff1 != 0);
+    }
+  else
+    VECTOR_TYPE_CHECK (node)->type_common.precision = index;
+}
+
+/* Return true if we can construct vector types with the given number
+   of subparts.  */
+
+static inline bool
+valid_vector_subparts_p (poly_uint64 subparts)
+{
+  unsigned HOST_WIDE_INT coeff0 = subparts.coeffs[0];
+  if (!pow2p_hwi (coeff0))
+    return false;
+  if (NUM_POLY_INT_COEFFS == 2)
+    {
+      unsigned HOST_WIDE_INT coeff1 = subparts.coeffs[1];
+      if (coeff1 != 0 && coeff1 != coeff0)
+	return false;
+    }
+  return true;
+}
+
 #define error_mark_node			global_trees[TI_ERROR_MARK]
 
 #define intQI_type_node			global_trees[TI_INTQI_TYPE]
@@ -4108,16 +4157,10 @@ extern tree build_pointer_type (tree);
 extern tree build_reference_type_for_mode (tree, machine_mode, bool);
 extern tree build_reference_type (tree);
 extern tree build_vector_type_for_mode (tree, machine_mode);
-extern tree build_vector_type (tree innertype, int nunits);
-/* Temporary.  */
-inline tree
-build_vector_type (tree innertype, poly_uint64 nunits)
-{
-  return build_vector_type (innertype, (int) nunits.to_constant ());
-}
+extern tree build_vector_type (tree, poly_int64);
 extern tree build_truth_vector_type (poly_uint64, poly_uint64);
 extern tree build_same_sized_truth_vector_type (tree vectype);
-extern tree build_opaque_vector_type (tree innertype, int nunits);
+extern tree build_opaque_vector_type (tree, poly_int64);
 extern tree build_index_type (tree);
 extern tree build_array_type (tree, tree, bool = false);
 extern tree build_nonshared_array_type (tree, tree);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 17:25:48.625491825 +0100
+++ gcc/tree.c	2017-10-23 17:25:51.771378746 +0100
@@ -1877,7 +1877,7 @@ make_vector (unsigned len MEM_STAT_DECL)
 build_vector (tree type, vec<tree> vals MEM_STAT_DECL)
 {
   unsigned int nelts = vals.length ();
-  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
+  gcc_assert (must_eq (nelts, TYPE_VECTOR_SUBPARTS (type)));
   int over = 0;
   unsigned cnt = 0;
   tree v = make_vector (nelts);
@@ -1907,10 +1907,11 @@ build_vector (tree type, vec<tree> vals
 tree
 build_vector_from_ctor (tree type, vec<constructor_elt, va_gc> *v)
 {
-  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type);
-  unsigned HOST_WIDE_INT idx;
+  unsigned HOST_WIDE_INT idx, nelts;
   tree value;
 
+  /* We can't construct a VECTOR_CST for a variable number of elements.  */
+  nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
   auto_vec<tree, 32> vec (nelts);
   FOR_EACH_CONSTRUCTOR_VALUE (v, idx, value)
     {
@@ -1928,9 +1929,9 @@ build_vector_from_ctor (tree type, vec<c
 
 /* Build a vector of type VECTYPE where all the elements are SCs.  */
 tree
-build_vector_from_val (tree vectype, tree sc) 
+build_vector_from_val (tree vectype, tree sc)
 {
-  int i, nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  unsigned HOST_WIDE_INT i, nunits;
 
   if (sc == error_mark_node)
     return sc;
@@ -1944,6 +1945,13 @@ build_vector_from_val (tree vectype, tre
   gcc_checking_assert (types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (sc)),
 					   TREE_TYPE (vectype)));
 
+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
+    {
+      if (CONSTANT_CLASS_P (sc))
+	return build_vec_duplicate_cst (vectype, sc);
+      return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
+    }
+
   if (CONSTANT_CLASS_P (sc))
     {
       auto_vec<tree, 32> v (nunits);
@@ -6575,11 +6583,8 @@ type_hash_canon_hash (tree type)
       }
 
     case VECTOR_TYPE:
-      {
-	unsigned nunits = TYPE_VECTOR_SUBPARTS (type);
-	hstate.add_object (nunits);
-	break;
-      }
+      hstate.add_poly_int (TYPE_VECTOR_SUBPARTS (type));
+      break;
 
     default:
       break;
@@ -6623,7 +6628,8 @@ type_cache_hasher::equal (type_hash *a,
       return 1;
 
     case VECTOR_TYPE:
-      return TYPE_VECTOR_SUBPARTS (a->type) == TYPE_VECTOR_SUBPARTS (b->type);
+      return must_eq (TYPE_VECTOR_SUBPARTS (a->type),
+		      TYPE_VECTOR_SUBPARTS (b->type));
 
     case ENUMERAL_TYPE:
       if (TYPE_VALUES (a->type) != TYPE_VALUES (b->type)
@@ -9666,7 +9672,7 @@ make_vector_type (tree innertype, poly_i
 
   t = make_node (VECTOR_TYPE);
   TREE_TYPE (t) = mv_innertype;
-  SET_TYPE_VECTOR_SUBPARTS (t, nunits.to_constant ()); /* Temporary */
+  SET_TYPE_VECTOR_SUBPARTS (t, nunits);
   SET_TYPE_MODE (t, mode);
 
   if (TYPE_STRUCTURAL_EQUALITY_P (mv_innertype) || in_lto_p)
@@ -10582,7 +10588,7 @@ build_vector_type_for_mode (tree innerty
    a power of two.  */
 
 tree
-build_vector_type (tree innertype, int nunits)
+build_vector_type (tree innertype, poly_int64 nunits)
 {
   return make_vector_type (innertype, nunits, VOIDmode);
 }
@@ -10627,7 +10633,7 @@ build_same_sized_truth_vector_type (tree
 /* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set.  */
 
 tree
-build_opaque_vector_type (tree innertype, int nunits)
+build_opaque_vector_type (tree innertype, poly_int64 nunits)
 {
   tree t = make_vector_type (innertype, nunits, VOIDmode);
   tree cand;
@@ -10730,7 +10736,7 @@ initializer_zerop (const_tree init)
 uniform_vector_p (const_tree vec)
 {
   tree first, t;
-  unsigned i;
+  unsigned HOST_WIDE_INT i, nelts;
 
   if (vec == NULL_TREE)
     return NULL_TREE;
@@ -10753,7 +10759,8 @@ uniform_vector_p (const_tree vec)
       return first;
     }
 
-  else if (TREE_CODE (vec) == CONSTRUCTOR)
+  else if (TREE_CODE (vec) == CONSTRUCTOR
+	   && TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec)).is_constant (&nelts))
     {
       first = error_mark_node;
 
@@ -10767,7 +10774,7 @@ uniform_vector_p (const_tree vec)
 	  if (!operand_equal_p (first, t, 0))
 	    return NULL_TREE;
         }
-      if (i != TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec)))
+      if (i != nelts)
 	return NULL_TREE;
 
       return first;
@@ -13011,8 +13018,8 @@ vector_type_mode (const_tree t)
       /* For integers, try mapping it to a same-sized scalar mode.  */
       if (is_int_mode (TREE_TYPE (t)->type_common.mode, &innermode))
 	{
-	  unsigned int size = (TYPE_VECTOR_SUBPARTS (t)
-			       * GET_MODE_BITSIZE (innermode));
+	  poly_int64 size = (TYPE_VECTOR_SUBPARTS (t)
+			     * GET_MODE_BITSIZE (innermode));
 	  scalar_int_mode mode;
 	  if (int_mode_for_size (size, 0).exists (&mode)
 	      && have_regs_of_mode[mode])
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-10-23 17:19:04.559212322 +0100
+++ gcc/cfgexpand.c	2017-10-23 17:25:51.727380328 +0100
@@ -4961,10 +4961,13 @@ expand_debug_expr (tree exp)
       else if (TREE_CODE (TREE_TYPE (exp)) == VECTOR_TYPE)
 	{
 	  unsigned i;
+	  unsigned HOST_WIDE_INT nelts;
 	  tree val;
 
-	  op0 = gen_rtx_CONCATN
-	    (mode, rtvec_alloc (TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp))));
+	  if (!TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp)).is_constant (&nelts))
+	    goto flag_unsupported;
+
+	  op0 = gen_rtx_CONCATN (mode, rtvec_alloc (nelts));
 
 	  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), i, val)
 	    {
@@ -4974,7 +4977,7 @@ expand_debug_expr (tree exp)
 	      XVECEXP (op0, 0, i) = op1;
 	    }
 
-	  if (i < TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp)))
+	  if (i < nelts)
 	    {
 	      op1 = expand_debug_expr
 		(build_zero_cst (TREE_TYPE (TREE_TYPE (exp))));
@@ -4982,7 +4985,7 @@ expand_debug_expr (tree exp)
 	      if (!op1)
 		return NULL;
 
-	      for (; i < TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp)); i++)
+	      for (; i < nelts; i++)
 		XVECEXP (op0, 0, i) = op1;
 	    }
 
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:38.241865064 +0100
+++ gcc/expr.c	2017-10-23 17:25:51.740379860 +0100
@@ -5847,7 +5847,13 @@ count_type_elements (const_tree type, bo
       return 2;
 
     case VECTOR_TYPE:
-      return TYPE_VECTOR_SUBPARTS (type);
+      {
+	unsigned HOST_WIDE_INT nelts;
+	if (TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts))
+	  return nelts;
+	else
+	  return -1;
+      }
 
     case INTEGER_TYPE:
     case REAL_TYPE:
@@ -6594,7 +6600,8 @@ store_constructor (tree exp, rtx target,
 	HOST_WIDE_INT bitsize;
 	HOST_WIDE_INT bitpos;
 	rtvec vector = NULL;
-	unsigned n_elts;
+	poly_uint64 n_elts;
+	unsigned HOST_WIDE_INT const_n_elts;
 	alias_set_type alias;
 	bool vec_vec_init_p = false;
 	machine_mode mode = GET_MODE (target);
@@ -6619,7 +6626,9 @@ store_constructor (tree exp, rtx target,
 	  }
 
 	n_elts = TYPE_VECTOR_SUBPARTS (type);
-	if (REG_P (target) && VECTOR_MODE_P (mode))
+	if (REG_P (target)
+	    && VECTOR_MODE_P (mode)
+	    && n_elts.is_constant (&const_n_elts))
 	  {
 	    machine_mode emode = eltmode;
 
@@ -6628,14 +6637,15 @@ store_constructor (tree exp, rtx target,
 		    == VECTOR_TYPE))
 	      {
 		tree etype = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
-		gcc_assert (CONSTRUCTOR_NELTS (exp) * TYPE_VECTOR_SUBPARTS (etype)
-			    == n_elts);
+		gcc_assert (must_eq (CONSTRUCTOR_NELTS (exp)
+				     * TYPE_VECTOR_SUBPARTS (etype),
+				     n_elts));
 		emode = TYPE_MODE (etype);
 	      }
 	    icode = convert_optab_handler (vec_init_optab, mode, emode);
 	    if (icode != CODE_FOR_nothing)
 	      {
-		unsigned int i, n = n_elts;
+		unsigned int i, n = const_n_elts;
 
 		if (emode != eltmode)
 		  {
@@ -6674,7 +6684,8 @@ store_constructor (tree exp, rtx target,
 
 	    /* Clear the entire vector first if there are any missing elements,
 	       or if the incidence of zero elements is >= 75%.  */
-	    need_to_clear = (count < n_elts || 4 * zero_count >= 3 * count);
+	    need_to_clear = (may_lt (count, n_elts)
+			     || 4 * zero_count >= 3 * count);
 	  }
 
 	if (need_to_clear && may_gt (size, 0) && !vector)
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:22:48.984540760 +0100
+++ gcc/fold-const.c	2017-10-23 17:25:51.744379717 +0100
@@ -1645,7 +1645,7 @@ const_binop (enum tree_code code, tree t
 	in_nelts = VECTOR_CST_NELTS (arg1);
 	out_nelts = in_nelts * 2;
 	gcc_assert (in_nelts == VECTOR_CST_NELTS (arg2)
-		    && out_nelts == TYPE_VECTOR_SUBPARTS (type));
+		    && must_eq (out_nelts, TYPE_VECTOR_SUBPARTS (type)));
 
 	auto_vec<tree, 32> elts (out_nelts);
 	for (i = 0; i < out_nelts; i++)
@@ -1677,7 +1677,7 @@ const_binop (enum tree_code code, tree t
 	in_nelts = VECTOR_CST_NELTS (arg1);
 	out_nelts = in_nelts / 2;
 	gcc_assert (in_nelts == VECTOR_CST_NELTS (arg2)
-		    && out_nelts == TYPE_VECTOR_SUBPARTS (type));
+		    && must_eq (out_nelts, TYPE_VECTOR_SUBPARTS (type)));
 
 	if (code == VEC_WIDEN_MULT_LO_EXPR)
 	  scale = 0, ofs = BYTES_BIG_ENDIAN ? out_nelts : 0;
@@ -1841,7 +1841,7 @@ const_unop (enum tree_code code, tree ty
 
 	in_nelts = VECTOR_CST_NELTS (arg0);
 	out_nelts = in_nelts / 2;
-	gcc_assert (out_nelts == TYPE_VECTOR_SUBPARTS (type));
+	gcc_assert (must_eq (out_nelts, TYPE_VECTOR_SUBPARTS (type)));
 
 	unsigned int offset = 0;
 	if ((!BYTES_BIG_ENDIAN) ^ (code == VEC_UNPACK_LO_EXPR
@@ -2329,7 +2329,7 @@ fold_convert_const (enum tree_code code,
   else if (TREE_CODE (type) == VECTOR_TYPE)
     {
       if (TREE_CODE (arg1) == VECTOR_CST
-	  && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1))
+	  && must_eq (TYPE_VECTOR_SUBPARTS (type), VECTOR_CST_NELTS (arg1)))
 	{
 	  int len = VECTOR_CST_NELTS (arg1);
 	  tree elttype = TREE_TYPE (type);
@@ -2345,8 +2345,8 @@ fold_convert_const (enum tree_code code,
 	  return build_vector (type, v);
 	}
       if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
-	  && (TYPE_VECTOR_SUBPARTS (type)
-	      == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
+	  && must_eq (TYPE_VECTOR_SUBPARTS (type),
+		      TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
 	{
 	  tree sub = fold_convert_const (code, TREE_TYPE (type),
 					 VEC_DUPLICATE_CST_ELT (arg1));
@@ -3491,8 +3491,8 @@ #define OP_SAME_WITH_NULL(N)				\
 	     We only tested element precision and modes to match.
 	     Vectors may be BLKmode and thus also check that the number of
 	     parts match.  */
-	  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))
-	      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)))
+	  if (may_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)),
+		      TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
 	    return 0;
 
 	  vec<constructor_elt, va_gc> *v0 = CONSTRUCTOR_ELTS (arg0);
@@ -7613,15 +7613,16 @@ native_interpret_complex (tree type, con
    If the buffer cannot be interpreted, return NULL_TREE.  */
 
 static tree
-native_interpret_vector (tree type, const unsigned char *ptr, int len)
+native_interpret_vector (tree type, const unsigned char *ptr, unsigned int len)
 {
   tree etype, elem;
-  int i, size, count;
+  unsigned int i, size;
+  unsigned HOST_WIDE_INT count;
 
   etype = TREE_TYPE (type);
   size = GET_MODE_SIZE (SCALAR_TYPE_MODE (etype));
-  count = TYPE_VECTOR_SUBPARTS (type);
-  if (size * count > len)
+  if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&count)
+      || size * count > len)
     return NULL_TREE;
 
   auto_vec<tree, 32> elements (count);
@@ -7707,7 +7708,8 @@ fold_view_convert_expr (tree type, tree
   tree expr_type = TREE_TYPE (expr);
   if (TREE_CODE (expr) == VEC_DUPLICATE_CST
       && VECTOR_TYPE_P (type)
-      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
+      && must_eq (TYPE_VECTOR_SUBPARTS (type),
+		  TYPE_VECTOR_SUBPARTS (expr_type))
       && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
     {
       tree sub = fold_view_convert_expr (TREE_TYPE (type),
@@ -9025,9 +9027,9 @@ fold_vec_perm (tree type, tree arg0, tre
   bool need_ctor = false;
 
   unsigned int nelts = sel.length ();
-  gcc_assert (TYPE_VECTOR_SUBPARTS (type) == nelts
-	      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)) == nelts
-	      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)) == nelts);
+  gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
+	      && must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
+	      && must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
   if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
       || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
     return NULL_TREE;
@@ -11440,7 +11442,7 @@ fold_ternary_loc (location_t loc, enum t
 		  || TREE_CODE (arg2) == CONSTRUCTOR))
 	    {
 	      unsigned int nelts = VECTOR_CST_NELTS (arg0), i;
-	      gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
+	      gcc_assert (must_eq (nelts, TYPE_VECTOR_SUBPARTS (type)));
 	      auto_vec_perm_indices sel (nelts);
 	      for (i = 0; i < nelts; i++)
 		{
@@ -11706,7 +11708,8 @@ fold_ternary_loc (location_t loc, enum t
 	  if (n != 0
 	      && (idx % width) == 0
 	      && (n % width) == 0
-	      && ((idx + n) / width) <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))
+	      && must_le ((idx + n) / width,
+			  TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
 	    {
 	      idx = idx / width;
 	      n = n / width;
@@ -11783,7 +11786,7 @@ fold_ternary_loc (location_t loc, enum t
 
 	  mask2 = 2 * nelts - 1;
 	  mask = single_arg ? (nelts - 1) : mask2;
-	  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
+	  gcc_assert (must_eq (nelts, TYPE_VECTOR_SUBPARTS (type)));
 	  auto_vec_perm_indices sel (nelts);
 	  auto_vec_perm_indices sel2 (nelts);
 	  for (i = 0; i < nelts; i++)
@@ -14034,7 +14037,7 @@ fold_relational_const (enum tree_code co
 	}
       unsigned count = VECTOR_CST_NELTS (op0);
       gcc_assert (VECTOR_CST_NELTS (op1) == count
-		  && TYPE_VECTOR_SUBPARTS (type) == count);
+		  && must_eq (TYPE_VECTOR_SUBPARTS (type), count));
 
       auto_vec<tree, 32> elts (count);
       for (unsigned i = 0; i < count; i++)
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	2017-10-23 17:22:18.228825053 +0100
+++ gcc/gimple-fold.c	2017-10-23 17:25:51.747379609 +0100
@@ -5909,13 +5909,13 @@ gimple_fold_stmt_to_constant_1 (gimple *
 		}
 	      else if (TREE_CODE (rhs) == CONSTRUCTOR
 		       && TREE_CODE (TREE_TYPE (rhs)) == VECTOR_TYPE
-		       && (CONSTRUCTOR_NELTS (rhs)
-			   == TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs))))
+		       && must_eq (CONSTRUCTOR_NELTS (rhs),
+				   TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs))))
 		{
 		  unsigned i, nelts;
 		  tree val;
 
-		  nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs));
+		  nelts = CONSTRUCTOR_NELTS (rhs);
 		  auto_vec<tree, 32> vec (nelts);
 		  FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
 		    {
@@ -6761,8 +6761,8 @@ gimple_fold_indirect_ref (tree t)
             = tree_to_shwi (part_width) / BITS_PER_UNIT;
           unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
           tree index = bitsize_int (indexi);
-          if (offset / part_widthi
-	      < TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))
+	  if (must_lt (offset / part_widthi,
+		       TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype))))
             return fold_build3 (BIT_FIELD_REF, type, TREE_OPERAND (addr, 0),
                                 part_width, index);
 	}
@@ -7064,6 +7064,10 @@ gimple_convert_to_ptrofftype (gimple_seq
 gimple_build_vector_from_val (gimple_seq *seq, location_t loc, tree type,
 			      tree op)
 {
+  if (!TYPE_VECTOR_SUBPARTS (type).is_constant ()
+      && !CONSTANT_CLASS_P (op))
+    return gimple_build (seq, loc, VEC_DUPLICATE_EXPR, type, op);
+
   tree res, vec = build_vector_from_val (type, op);
   if (is_gimple_val (vec))
     return vec;
@@ -7086,7 +7090,7 @@ gimple_build_vector (gimple_seq *seq, lo
 		     vec<tree> elts)
 {
   unsigned int nelts = elts.length ();
-  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
+  gcc_assert (must_eq (nelts, TYPE_VECTOR_SUBPARTS (type)));
   for (unsigned int i = 0; i < nelts; ++i)
     if (!TREE_CONSTANT (elts[i]))
       {
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2017-10-23 17:22:50.031432167 +0100
+++ gcc/match.pd	2017-10-23 17:25:51.750379501 +0100
@@ -83,7 +83,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (nop_convert @0)
  (view_convert @0)
  (if (VECTOR_TYPE_P (type) && VECTOR_TYPE_P (TREE_TYPE (@0))
-      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
+      && must_eq (TYPE_VECTOR_SUBPARTS (type),
+		  TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)))
       && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0))))))
 /* This one has to be last, or it shadows the others.  */
 (match (nop_convert @0)
@@ -2628,7 +2629,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2)))
  (if (VECTOR_TYPE_P (type)
-      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))
+      && must_eq (TYPE_VECTOR_SUBPARTS (type),
+		  TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
       && (TYPE_MODE (TREE_TYPE (type))
           == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)))))
   (minus @3 (view_convert (vec_cond @0 (negate @1) @2)))))
@@ -2637,7 +2639,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (minus @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2)))
  (if (VECTOR_TYPE_P (type)
-      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))
+      && must_eq (TYPE_VECTOR_SUBPARTS (type),
+		  TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
       && (TYPE_MODE (TREE_TYPE (type))
           == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)))))
   (plus @3 (view_convert (vec_cond @0 (negate @1) @2)))))
@@ -4301,7 +4304,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
    (if (n != 0
 	&& (idx % width) == 0
 	&& (n % width) == 0
-	&& ((idx + n) / width) <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (ctor)))
+	&& must_le ((idx + n) / width,
+		    TYPE_VECTOR_SUBPARTS (TREE_TYPE (ctor))))
     (with
      {
        idx = idx / width;
Index: gcc/omp-simd-clone.c
===================================================================
--- gcc/omp-simd-clone.c	2017-10-23 17:22:47.947648317 +0100
+++ gcc/omp-simd-clone.c	2017-10-23 17:25:51.751379465 +0100
@@ -57,7 +57,7 @@ Software Foundation; either version 3, o
 static unsigned HOST_WIDE_INT
 simd_clone_subparts (tree vectype)
 {
-  return TYPE_VECTOR_SUBPARTS (vectype);
+  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
 }
 
 /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c	2017-10-23 17:11:40.246949037 +0100
+++ gcc/print-tree.c	2017-10-23 17:25:51.751379465 +0100
@@ -630,7 +630,10 @@ print_node (FILE *file, const char *pref
       else if (code == ARRAY_TYPE)
 	print_node (file, "domain", TYPE_DOMAIN (node), indent + 4);
       else if (code == VECTOR_TYPE)
-	fprintf (file, " nunits:%d", (int) TYPE_VECTOR_SUBPARTS (node));
+	{
+	  fprintf (file, " nunits:");
+	  print_dec (TYPE_VECTOR_SUBPARTS (node), file);
+	}
       else if (code == RECORD_TYPE
 	       || code == UNION_TYPE
 	       || code == QUAL_UNION_TYPE)
Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c	2017-10-23 17:11:54.535862371 +0100
+++ gcc/stor-layout.c	2017-10-23 17:25:51.753379393 +0100
@@ -2267,11 +2267,9 @@ layout_type (tree type)
 
     case VECTOR_TYPE:
       {
-	int nunits = TYPE_VECTOR_SUBPARTS (type);
+	poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (type);
 	tree innertype = TREE_TYPE (type);
 
-	gcc_assert (!(nunits & (nunits - 1)));
-
 	/* Find an appropriate mode for the vector type.  */
 	if (TYPE_MODE (type) == VOIDmode)
 	  SET_TYPE_MODE (type,
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 17:22:32.725227332 +0100
+++ gcc/targhooks.c	2017-10-23 17:25:51.753379393 +0100
@@ -683,7 +683,7 @@ default_builtin_vectorization_cost (enum
         return 3;
 
       case vec_construct:
-	return TYPE_VECTOR_SUBPARTS (vectype) - 1;
+	return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype)) - 1;
 
       default:
         gcc_unreachable ();
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-10-23 17:20:50.883679845 +0100
+++ gcc/tree-cfg.c	2017-10-23 17:25:51.756379285 +0100
@@ -3640,7 +3640,8 @@ verify_gimple_comparison (tree type, tre
           return true;
         }
 
-      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
+      if (may_ne (TYPE_VECTOR_SUBPARTS (type),
+		  TYPE_VECTOR_SUBPARTS (op0_type)))
         {
           error ("invalid vector comparison resulting type");
           debug_generic_expr (type);
@@ -4070,8 +4071,8 @@ verify_gimple_assign_binary (gassign *st
       if (VECTOR_BOOLEAN_TYPE_P (lhs_type)
 	  && VECTOR_BOOLEAN_TYPE_P (rhs1_type)
 	  && types_compatible_p (rhs1_type, rhs2_type)
-	  && (TYPE_VECTOR_SUBPARTS (lhs_type)
-	      == 2 * TYPE_VECTOR_SUBPARTS (rhs1_type)))
+	  && must_eq (TYPE_VECTOR_SUBPARTS (lhs_type),
+		      2 * TYPE_VECTOR_SUBPARTS (rhs1_type)))
 	return false;
 
       /* Fallthru.  */
@@ -4221,8 +4222,8 @@ verify_gimple_assign_ternary (gassign *s
 
     case VEC_COND_EXPR:
       if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type)
-	  || TYPE_VECTOR_SUBPARTS (rhs1_type)
-	     != TYPE_VECTOR_SUBPARTS (lhs_type))
+	  || may_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
+		     TYPE_VECTOR_SUBPARTS (lhs_type)))
 	{
 	  error ("the first argument of a VEC_COND_EXPR must be of a "
 		 "boolean vector type of the same number of elements "
@@ -4268,11 +4269,12 @@ verify_gimple_assign_ternary (gassign *s
 	  return true;
 	}
 
-      if (TYPE_VECTOR_SUBPARTS (rhs1_type) != TYPE_VECTOR_SUBPARTS (rhs2_type)
-	  || TYPE_VECTOR_SUBPARTS (rhs2_type)
-	     != TYPE_VECTOR_SUBPARTS (rhs3_type)
-	  || TYPE_VECTOR_SUBPARTS (rhs3_type)
-	     != TYPE_VECTOR_SUBPARTS (lhs_type))
+      if (may_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
+		  TYPE_VECTOR_SUBPARTS (rhs2_type))
+	  || may_ne (TYPE_VECTOR_SUBPARTS (rhs2_type),
+		     TYPE_VECTOR_SUBPARTS (rhs3_type))
+	  || may_ne (TYPE_VECTOR_SUBPARTS (rhs3_type),
+		     TYPE_VECTOR_SUBPARTS (lhs_type)))
 	{
 	  error ("vectors with different element number found "
 		 "in vector permute expression");
@@ -4554,9 +4556,9 @@ verify_gimple_assign_single (gassign *st
 			  debug_generic_stmt (rhs1);
 			  return true;
 			}
-		      else if (CONSTRUCTOR_NELTS (rhs1)
-			       * TYPE_VECTOR_SUBPARTS (elt_t)
-			       != TYPE_VECTOR_SUBPARTS (rhs1_type))
+		      else if (may_ne (CONSTRUCTOR_NELTS (rhs1)
+				       * TYPE_VECTOR_SUBPARTS (elt_t),
+				       TYPE_VECTOR_SUBPARTS (rhs1_type)))
 			{
 			  error ("incorrect number of vector CONSTRUCTOR"
 				 " elements");
@@ -4571,8 +4573,8 @@ verify_gimple_assign_single (gassign *st
 		      debug_generic_stmt (rhs1);
 		      return true;
 		    }
-		  else if (CONSTRUCTOR_NELTS (rhs1)
-			   > TYPE_VECTOR_SUBPARTS (rhs1_type))
+		  else if (may_gt (CONSTRUCTOR_NELTS (rhs1),
+				   TYPE_VECTOR_SUBPARTS (rhs1_type)))
 		    {
 		      error ("incorrect number of vector CONSTRUCTOR elements");
 		      debug_generic_stmt (rhs1);
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-10-23 17:20:50.883679845 +0100
+++ gcc/tree-ssa-forwprop.c	2017-10-23 17:25:51.756379285 +0100
@@ -1948,7 +1948,8 @@ simplify_vector_constructor (gimple_stmt
   gimple *stmt = gsi_stmt (*gsi);
   gimple *def_stmt;
   tree op, op2, orig, type, elem_type;
-  unsigned elem_size, nelts, i;
+  unsigned elem_size, i;
+  unsigned HOST_WIDE_INT nelts;
   enum tree_code code, conv_code;
   constructor_elt *elt;
   bool maybe_ident;
@@ -1959,7 +1960,8 @@ simplify_vector_constructor (gimple_stmt
   type = TREE_TYPE (op);
   gcc_checking_assert (TREE_CODE (type) == VECTOR_TYPE);
 
-  nelts = TYPE_VECTOR_SUBPARTS (type);
+  if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts))
+    return false;
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
@@ -2031,8 +2033,8 @@ simplify_vector_constructor (gimple_stmt
     return false;
 
   if (! VECTOR_TYPE_P (TREE_TYPE (orig))
-      || (TYPE_VECTOR_SUBPARTS (type)
-	  != TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig))))
+      || may_ne (TYPE_VECTOR_SUBPARTS (type),
+		 TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig))))
     return false;
 
   tree tem;
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:25:50.361429427 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:25:51.758379213 +0100
@@ -4743,7 +4743,7 @@ vect_permute_store_chain (vec<tree> dr_c
   if (length == 3)
     {
       /* vect_grouped_store_supported ensures that this is constant.  */
-      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
+      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
       unsigned int j0 = 0, j1 = 0, j2 = 0;
 
       auto_vec_perm_indices sel (nelt);
@@ -4807,7 +4807,7 @@ vect_permute_store_chain (vec<tree> dr_c
       gcc_assert (pow2p_hwi (length));
 
       /* vect_grouped_store_supported ensures that this is constant.  */
-      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
+      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
       auto_vec_perm_indices sel (nelt);
       sel.quick_grow (nelt);
       for (i = 0, n = nelt / 2; i < n; i++)
@@ -5140,7 +5140,7 @@ vect_grouped_load_supported (tree vectyp
      that leaves unused vector loads around punt - we at least create
      very sub-optimal code in that case (and blow up memory,
      see PR65518).  */
-  if (single_element_p && count > TYPE_VECTOR_SUBPARTS (vectype))
+  if (single_element_p && may_gt (count, TYPE_VECTOR_SUBPARTS (vectype)))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5333,7 +5333,7 @@ vect_permute_load_chain (vec<tree> dr_ch
   if (length == 3)
     {
       /* vect_grouped_load_supported ensures that this is constant.  */
-      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
+      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
       unsigned int k;
 
       auto_vec_perm_indices sel (nelt);
@@ -5384,7 +5384,7 @@ vect_permute_load_chain (vec<tree> dr_ch
       gcc_assert (pow2p_hwi (length));
 
       /* vect_grouped_load_supported ensures that this is constant.  */
-      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
+      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
       auto_vec_perm_indices sel (nelt);
       sel.quick_grow (nelt);
       for (i = 0; i < nelt; ++i)
@@ -5525,12 +5525,12 @@ vect_shift_permute_load_chain (vec<tree>
 
   tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
   unsigned int i;
-  unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 
-  unsigned HOST_WIDE_INT vf;
-  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
+  unsigned HOST_WIDE_INT nelt, vf;
+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nelt)
+      || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
     /* Not supported for variable-length vectors.  */
     return false;
 
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-10-23 17:25:48.623491897 +0100
+++ gcc/tree-vect-generic.c	2017-10-23 17:25:51.759379177 +0100
@@ -48,7 +48,7 @@ static void expand_vector_operations_1 (
 static unsigned int
 nunits_for_known_piecewise_op (const_tree type)
 {
-  return TYPE_VECTOR_SUBPARTS (type);
+  return TYPE_VECTOR_SUBPARTS (type).to_constant ();
 }
 
 /* Return true if TYPE1 has more elements than TYPE2, where either
@@ -916,9 +916,9 @@ expand_vector_condition (gimple_stmt_ite
      Similarly for vbfld_10 instead of x_2 < y_3.  */
   if (VECTOR_BOOLEAN_TYPE_P (type)
       && SCALAR_INT_MODE_P (TYPE_MODE (type))
-      && (GET_MODE_BITSIZE (TYPE_MODE (type))
-	  < (TYPE_VECTOR_SUBPARTS (type)
-	     * GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type)))))
+      && must_lt (GET_MODE_BITSIZE (TYPE_MODE (type)),
+		  TYPE_VECTOR_SUBPARTS (type)
+		  * GET_MODE_BITSIZE (SCALAR_TYPE_MODE (TREE_TYPE (type))))
       && (a_is_comparison
 	  ? useless_type_conversion_p (type, TREE_TYPE (a))
 	  : expand_vec_cmp_expr_p (TREE_TYPE (a1), type, TREE_CODE (a))))
@@ -1083,14 +1083,17 @@ optimize_vector_constructor (gimple_stmt
   tree lhs = gimple_assign_lhs (stmt);
   tree rhs = gimple_assign_rhs1 (stmt);
   tree type = TREE_TYPE (rhs);
-  unsigned int i, j, nelts = TYPE_VECTOR_SUBPARTS (type);
+  unsigned int i, j;
+  unsigned HOST_WIDE_INT nelts;
   bool all_same = true;
   constructor_elt *elt;
   gimple *g;
   tree base = NULL_TREE;
   optab op;
 
-  if (nelts <= 2 || CONSTRUCTOR_NELTS (rhs) != nelts)
+  if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts)
+      || nelts <= 2
+      || CONSTRUCTOR_NELTS (rhs) != nelts)
     return;
   op = optab_for_tree_code (PLUS_EXPR, type, optab_default);
   if (op == unknown_optab
@@ -1302,7 +1305,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
   tree mask_type = TREE_TYPE (mask);
   tree vect_elt_type = TREE_TYPE (vect_type);
   tree mask_elt_type = TREE_TYPE (mask_type);
-  unsigned int elements = TYPE_VECTOR_SUBPARTS (vect_type);
+  unsigned HOST_WIDE_INT elements;
   vec<constructor_elt, va_gc> *v;
   tree constr, t, si, i_val;
   tree vec0tmp = NULL_TREE, vec1tmp = NULL_TREE, masktmp = NULL_TREE;
@@ -1310,6 +1313,9 @@ lower_vec_perm (gimple_stmt_iterator *gs
   location_t loc = gimple_location (gsi_stmt (*gsi));
   unsigned i;
 
+  if (!TYPE_VECTOR_SUBPARTS (vect_type).is_constant (&elements))
+    return;
+
   if (TREE_CODE (mask) == SSA_NAME)
     {
       gimple *def_stmt = SSA_NAME_DEF_STMT (mask);
@@ -1467,7 +1473,7 @@ get_compute_type (enum tree_code code, o
 	= type_for_widest_vector_mode (TREE_TYPE (type), op);
       if (vector_compute_type != NULL_TREE
 	  && subparts_gt (compute_type, vector_compute_type)
-	  && TYPE_VECTOR_SUBPARTS (vector_compute_type) > 1
+	  && may_ne (TYPE_VECTOR_SUBPARTS (vector_compute_type), 1U)
 	  && (optab_handler (op, TYPE_MODE (vector_compute_type))
 	      != CODE_FOR_nothing))
 	compute_type = vector_compute_type;
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:25:48.624491861 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:25:51.761379105 +0100
@@ -255,9 +255,11 @@ vect_determine_vectorization_factor (loo
 		}
 
 	      if (dump_enabled_p ())
-		dump_printf_loc (MSG_NOTE, vect_location,
-				 "nunits = " HOST_WIDE_INT_PRINT_DEC "\n",
-                                 TYPE_VECTOR_SUBPARTS (vectype));
+		{
+		  dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
+		  dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (vectype));
+		  dump_printf (MSG_NOTE, "\n");
+		}
 
 	      vect_update_max_nunits (&vectorization_factor, vectype);
 	    }
@@ -548,9 +550,11 @@ vect_determine_vectorization_factor (loo
 	    }
 
 	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, vect_location,
-			     "nunits = " HOST_WIDE_INT_PRINT_DEC "\n",
-			     TYPE_VECTOR_SUBPARTS (vf_vectype));
+	    {
+	      dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
+	      dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (vf_vectype));
+	      dump_printf (MSG_NOTE, "\n");
+	    }
 
 	  vect_update_max_nunits (&vectorization_factor, vf_vectype);
 
@@ -632,8 +636,8 @@ vect_determine_vectorization_factor (loo
 
 	      if (!mask_type)
 		mask_type = vectype;
-	      else if (TYPE_VECTOR_SUBPARTS (mask_type)
-		       != TYPE_VECTOR_SUBPARTS (vectype))
+	      else if (may_ne (TYPE_VECTOR_SUBPARTS (mask_type),
+			       TYPE_VECTOR_SUBPARTS (vectype)))
 		{
 		  if (dump_enabled_p ())
 		    {
@@ -4152,7 +4156,7 @@ get_initial_defs_for_reduction (slp_tree
   scalar_type = TREE_TYPE (vector_type);
   /* vectorizable_reduction has already rejected SLP reductions on
      variable-length vectors.  */
-  nunits = TYPE_VECTOR_SUBPARTS (vector_type);
+  nunits = TYPE_VECTOR_SUBPARTS (vector_type).to_constant ();
 
   gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def);
 
@@ -7672,9 +7676,8 @@ vect_transform_loop (loop_vec_info loop_
 
 	  if (STMT_VINFO_VECTYPE (stmt_info))
 	    {
-	      unsigned int nunits
-		= (unsigned int)
-		  TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
+	      poly_uint64 nunits
+		= TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
 	      if (!STMT_SLP_TYPE (stmt_info)
 		  && may_ne (nunits, vf)
 		  && dump_enabled_p ())
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c	2017-10-10 17:55:22.109175458 +0100
+++ gcc/tree-vect-patterns.c	2017-10-23 17:25:51.763379034 +0100
@@ -3714,8 +3714,9 @@ vect_recog_bool_pattern (vec<gimple *> *
          vectorized matches the vector type of the result in
 	 size and number of elements.  */
       unsigned prec
-	= wi::udiv_trunc (wi::to_wide (TYPE_SIZE (vectype)),
-			  TYPE_VECTOR_SUBPARTS (vectype)).to_uhwi ();
+	= vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vectype)),
+			       TYPE_VECTOR_SUBPARTS (vectype));
+
       tree type
 	= build_nonstandard_integer_type (prec,
 					  TYPE_UNSIGNED (TREE_TYPE (var)));
@@ -3898,7 +3899,8 @@ vect_recog_mask_conversion_pattern (vec<
       vectype2 = get_mask_type_for_scalar_type (rhs1_type);
 
       if (!vectype1 || !vectype2
-	  || TYPE_VECTOR_SUBPARTS (vectype1) == TYPE_VECTOR_SUBPARTS (vectype2))
+	  || must_eq (TYPE_VECTOR_SUBPARTS (vectype1),
+		      TYPE_VECTOR_SUBPARTS (vectype2)))
 	return NULL;
 
       tmp = build_mask_conversion (rhs1, vectype1, stmt_vinfo, vinfo);
@@ -3973,7 +3975,8 @@ vect_recog_mask_conversion_pattern (vec<
       vectype2 = get_mask_type_for_scalar_type (rhs1_type);
 
       if (!vectype1 || !vectype2
-	  || TYPE_VECTOR_SUBPARTS (vectype1) == TYPE_VECTOR_SUBPARTS (vectype2))
+	  || must_eq (TYPE_VECTOR_SUBPARTS (vectype1),
+		      TYPE_VECTOR_SUBPARTS (vectype2)))
 	return NULL;
 
       /* If rhs1 is a comparison we need to move it into a
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-10-23 17:22:43.865071801 +0100
+++ gcc/tree-vect-slp.c	2017-10-23 17:25:51.764378998 +0100
@@ -1621,15 +1621,16 @@ vect_supported_load_permutation_p (slp_i
 	      stmt_vec_info group_info
 		= vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (node)[0]);
 	      group_info = vinfo_for_stmt (GROUP_FIRST_ELEMENT (group_info));
-	      unsigned nunits
-		= TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (group_info));
+	      unsigned HOST_WIDE_INT nunits;
 	      unsigned k, maxk = 0;
 	      FOR_EACH_VEC_ELT (SLP_TREE_LOAD_PERMUTATION (node), j, k)
 		if (k > maxk)
 		  maxk = k;
 	      /* In BB vectorization we may not actually use a loaded vector
 		 accessing elements in excess of GROUP_SIZE.  */
-	      if (maxk >= (GROUP_SIZE (group_info) & ~(nunits - 1)))
+	      tree vectype = STMT_VINFO_VECTYPE (group_info);
+	      if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits)
+		  || maxk >= (GROUP_SIZE (group_info) & ~(nunits - 1)))
 		{
 		  dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 				   "BB vectorization with gaps at the end of "
@@ -3243,7 +3244,7 @@ vect_get_constant_vectors (tree op, slp_
   else
     vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
   /* Enforced by vect_get_and_check_slp_defs.  */
-  nunits = TYPE_VECTOR_SUBPARTS (vector_type);
+  nunits = TYPE_VECTOR_SUBPARTS (vector_type).to_constant ();
 
   if (STMT_VINFO_DATA_REF (stmt_vinfo))
     {
@@ -3600,12 +3601,12 @@ vect_transform_slp_perm_load (slp_tree n
   gimple *stmt = SLP_TREE_SCALAR_STMTS (node)[0];
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   tree mask_element_type = NULL_TREE, mask_type;
-  int nunits, vec_index = 0;
+  int vec_index = 0;
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
   int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance);
-  int mask_element;
+  unsigned int mask_element;
   machine_mode mode;
-  unsigned HOST_WIDE_INT const_vf;
+  unsigned HOST_WIDE_INT nunits, const_vf;
 
   if (!STMT_VINFO_GROUPED_ACCESS (stmt_info))
     return false;
@@ -3615,8 +3616,10 @@ vect_transform_slp_perm_load (slp_tree n
   mode = TYPE_MODE (vectype);
 
   /* At the moment, all permutations are represented using per-element
-     indices, so we can't cope with variable vectorization factors.  */
-  if (!vf.is_constant (&const_vf))
+     indices, so we can't cope with variable vector lengths or
+     vectorization factors.  */
+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits)
+      || !vf.is_constant (&const_vf))
     return false;
 
   /* The generic VEC_PERM_EXPR code always uses an integral type of the
@@ -3624,7 +3627,6 @@ vect_transform_slp_perm_load (slp_tree n
   mask_element_type = lang_hooks.types.type_for_mode
     (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
   mask_type = get_vectype_for_scalar_type (mask_element_type);
-  nunits = TYPE_VECTOR_SUBPARTS (vectype);
   auto_vec_perm_indices mask (nunits);
   mask.quick_grow (nunits);
 
@@ -3654,7 +3656,7 @@ vect_transform_slp_perm_load (slp_tree n
      {c2,a3,b3,c3}.  */
 
   int vect_stmts_counter = 0;
-  int index = 0;
+  unsigned int index = 0;
   int first_vec_index = -1;
   int second_vec_index = -1;
   bool noop_p = true;
@@ -3664,8 +3666,8 @@ vect_transform_slp_perm_load (slp_tree n
     {
       for (int k = 0; k < group_size; k++)
 	{
-	  int i = (SLP_TREE_LOAD_PERMUTATION (node)[k]
-		   + j * STMT_VINFO_GROUP_SIZE (stmt_info));
+	  unsigned int i = (SLP_TREE_LOAD_PERMUTATION (node)[k]
+			    + j * STMT_VINFO_GROUP_SIZE (stmt_info));
 	  vec_index = i / nunits;
 	  mask_element = i % nunits;
 	  if (vec_index == first_vec_index
@@ -3693,8 +3695,7 @@ vect_transform_slp_perm_load (slp_tree n
 	      return false;
 	    }
 
-	  gcc_assert (mask_element >= 0
-		      && mask_element < 2 * nunits);
+	  gcc_assert (mask_element < 2 * nunits);
 	  if (mask_element != index)
 	    noop_p = false;
 	  mask[index++] = mask_element;
@@ -3727,7 +3728,7 @@ vect_transform_slp_perm_load (slp_tree n
 		  if (! noop_p)
 		    {
 		      auto_vec<tree, 32> mask_elts (nunits);
-		      for (int l = 0; l < nunits; ++l)
+		      for (unsigned int l = 0; l < nunits; ++l)
 			mask_elts.quick_push (build_int_cst (mask_element_type,
 							     mask[l]));
 		      mask_vec = build_vector (mask_type, mask_elts);
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:22:41.879277786 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:25:51.767378890 +0100
@@ -1713,9 +1713,10 @@ compare_step_with_zero (gimple *stmt)
 static tree
 perm_mask_for_reverse (tree vectype)
 {
-  int i, nunits;
+  unsigned HOST_WIDE_INT i, nunits;
 
-  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
+    return NULL_TREE;
 
   auto_vec_perm_indices sel (nunits);
   for (i = 0; i < nunits; ++i)
@@ -1750,7 +1751,7 @@ get_group_load_store_type (gimple *stmt,
   bool single_element_p = (stmt == first_stmt
 			   && !GROUP_NEXT_ELEMENT (stmt_info));
   unsigned HOST_WIDE_INT gap = GROUP_GAP (vinfo_for_stmt (first_stmt));
-  unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
   /* True if the vectorized statements would access beyond the last
      statement in the group.  */
@@ -1774,7 +1775,7 @@ get_group_load_store_type (gimple *stmt,
 	  /* Try to use consecutive accesses of GROUP_SIZE elements,
 	     separated by the stride, until we have a complete vector.
 	     Fall back to scalar accesses if that isn't possible.  */
-	  if (nunits % group_size == 0)
+	  if (multiple_p (nunits, group_size))
 	    *memory_access_type = VMAT_STRIDED_SLP;
 	  else
 	    *memory_access_type = VMAT_ELEMENTWISE;
@@ -2102,7 +2103,8 @@ vectorizable_mask_load_store (gimple *st
     mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
 
   if (!mask_vectype || !VECTOR_BOOLEAN_TYPE_P (mask_vectype)
-      || TYPE_VECTOR_SUBPARTS (mask_vectype) != TYPE_VECTOR_SUBPARTS (vectype))
+      || may_ne (TYPE_VECTOR_SUBPARTS (mask_vectype),
+		 TYPE_VECTOR_SUBPARTS (vectype)))
     return false;
 
   if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
@@ -2255,8 +2257,8 @@ vectorizable_mask_load_store (gimple *st
 
 	  if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
 	    {
-	      gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
-			  == TYPE_VECTOR_SUBPARTS (idxtype));
+	      gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
+				   TYPE_VECTOR_SUBPARTS (idxtype)));
 	      var = vect_get_new_ssa_name (idxtype, vect_simple_var);
 	      op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
 	      new_stmt
@@ -2281,8 +2283,9 @@ vectorizable_mask_load_store (gimple *st
 	      mask_op = vec_mask;
 	      if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask)))
 		{
-		  gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op))
-			      == TYPE_VECTOR_SUBPARTS (masktype));
+		  gcc_assert
+		    (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op)),
+			      TYPE_VECTOR_SUBPARTS (masktype)));
 		  var = vect_get_new_ssa_name (masktype, vect_simple_var);
 		  mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op);
 		  new_stmt
@@ -2298,8 +2301,8 @@ vectorizable_mask_load_store (gimple *st
 
 	  if (!useless_type_conversion_p (vectype, rettype))
 	    {
-	      gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
-			  == TYPE_VECTOR_SUBPARTS (rettype));
+	      gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (vectype),
+				   TYPE_VECTOR_SUBPARTS (rettype)));
 	      op = vect_get_new_ssa_name (rettype, vect_simple_var);
 	      gimple_call_set_lhs (new_stmt, op);
 	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
@@ -2493,11 +2496,14 @@ vectorizable_bswap (gimple *stmt, gimple
   tree op, vectype;
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
-  unsigned ncopies, nunits;
+  unsigned ncopies;
+  unsigned HOST_WIDE_INT nunits, num_bytes;
 
   op = gimple_call_arg (stmt, 0);
   vectype = STMT_VINFO_VECTYPE (stmt_info);
-  nunits = TYPE_VECTOR_SUBPARTS (vectype);
+
+  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
+    return false;
 
   /* Multiple types in SLP are handled by creating the appropriate number of
      vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
@@ -2513,7 +2519,9 @@ vectorizable_bswap (gimple *stmt, gimple
   if (! char_vectype)
     return false;
 
-  unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
+  if (!TYPE_VECTOR_SUBPARTS (char_vectype).is_constant (&num_bytes))
+    return false;
+
   unsigned word_bytes = num_bytes / nunits;
 
   auto_vec_perm_indices elts (num_bytes);
@@ -3213,7 +3221,7 @@ vect_simd_lane_linear (tree op, struct l
 static unsigned HOST_WIDE_INT
 simd_clone_subparts (tree vectype)
 {
-  return TYPE_VECTOR_SUBPARTS (vectype);
+  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
 }
 
 /* Function vectorizable_simd_clone_call.
@@ -4732,7 +4740,7 @@ vectorizable_assignment (gimple *stmt, g
     op = TREE_OPERAND (op, 0);
 
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
-  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
   /* Multiple types in SLP are handled by creating the appropriate number of
      vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
@@ -4757,7 +4765,7 @@ vectorizable_assignment (gimple *stmt, g
   if ((CONVERT_EXPR_CODE_P (code)
        || code == VIEW_CONVERT_EXPR)
       && (!vectype_in
-	  || TYPE_VECTOR_SUBPARTS (vectype_in) != nunits
+	  || may_ne (TYPE_VECTOR_SUBPARTS (vectype_in), nunits)
 	  || (GET_MODE_SIZE (TYPE_MODE (vectype))
 	      != GET_MODE_SIZE (TYPE_MODE (vectype_in)))))
     return false;
@@ -4906,8 +4914,8 @@ vectorizable_shift (gimple *stmt, gimple
   int ndts = 2;
   gimple *new_stmt = NULL;
   stmt_vec_info prev_stmt_info;
-  int nunits_in;
-  int nunits_out;
+  poly_uint64 nunits_in;
+  poly_uint64 nunits_out;
   tree vectype_out;
   tree op1_vectype;
   int ncopies;
@@ -4974,7 +4982,7 @@ vectorizable_shift (gimple *stmt, gimple
 
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype);
-  if (nunits_out != nunits_in)
+  if (may_ne (nunits_out, nunits_in))
     return false;
 
   op1 = gimple_assign_rhs2 (stmt);
@@ -5274,8 +5282,8 @@ vectorizable_operation (gimple *stmt, gi
   int ndts = 3;
   gimple *new_stmt = NULL;
   stmt_vec_info prev_stmt_info;
-  int nunits_in;
-  int nunits_out;
+  poly_uint64 nunits_in;
+  poly_uint64 nunits_out;
   tree vectype_out;
   int ncopies;
   int j, i;
@@ -5385,7 +5393,7 @@ vectorizable_operation (gimple *stmt, gi
 
   nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
   nunits_in = TYPE_VECTOR_SUBPARTS (vectype);
-  if (nunits_out != nunits_in)
+  if (may_ne (nunits_out, nunits_in))
     return false;
 
   if (op_type == binary_op || op_type == ternary_op)
@@ -5937,8 +5945,8 @@ vectorizable_store (gimple *stmt, gimple
 
 	  if (!useless_type_conversion_p (srctype, TREE_TYPE (src)))
 	    {
-	      gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src))
-			  == TYPE_VECTOR_SUBPARTS (srctype));
+	      gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src)),
+				   TYPE_VECTOR_SUBPARTS (srctype)));
 	      var = vect_get_new_ssa_name (srctype, vect_simple_var);
 	      src = build1 (VIEW_CONVERT_EXPR, srctype, src);
 	      new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src);
@@ -5948,8 +5956,8 @@ vectorizable_store (gimple *stmt, gimple
 
 	  if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
 	    {
-	      gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
-			  == TYPE_VECTOR_SUBPARTS (idxtype));
+	      gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
+				   TYPE_VECTOR_SUBPARTS (idxtype)));
 	      var = vect_get_new_ssa_name (idxtype, vect_simple_var);
 	      op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
 	      new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op);
@@ -6554,7 +6562,7 @@ vect_gen_perm_mask_any (tree vectype, ve
   tree mask_elt_type, mask_type, mask_vec;
 
   unsigned int nunits = sel.length ();
-  gcc_checking_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype));
+  gcc_checking_assert (must_eq (nunits, TYPE_VECTOR_SUBPARTS (vectype)));
 
   mask_elt_type = lang_hooks.types.type_for_mode
     (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
@@ -6993,8 +7001,8 @@ vectorizable_load (gimple *stmt, gimple_
 
 	  if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
 	    {
-	      gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
-			  == TYPE_VECTOR_SUBPARTS (idxtype));
+	      gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
+				   TYPE_VECTOR_SUBPARTS (idxtype)));
 	      var = vect_get_new_ssa_name (idxtype, vect_simple_var);
 	      op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
 	      new_stmt
@@ -7008,8 +7016,8 @@ vectorizable_load (gimple *stmt, gimple_
 
 	  if (!useless_type_conversion_p (vectype, rettype))
 	    {
-	      gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
-			  == TYPE_VECTOR_SUBPARTS (rettype));
+	      gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (vectype),
+				   TYPE_VECTOR_SUBPARTS (rettype)));
 	      op = vect_get_new_ssa_name (rettype, vect_simple_var);
 	      gimple_call_set_lhs (new_stmt, op);
 	      vect_finish_stmt_generation (stmt, new_stmt, gsi);
@@ -7905,7 +7913,8 @@ vect_is_simple_cond (tree cond, vec_info
     return false;
 
   if (vectype1 && vectype2
-      && TYPE_VECTOR_SUBPARTS (vectype1) != TYPE_VECTOR_SUBPARTS (vectype2))
+      && may_ne (TYPE_VECTOR_SUBPARTS (vectype1),
+		 TYPE_VECTOR_SUBPARTS (vectype2)))
     return false;
 
   *comp_vectype = vectype1 ? vectype1 : vectype2;
@@ -8308,7 +8317,7 @@ vectorizable_comparison (gimple *stmt, g
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
   enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type};
   int ndts = 2;
-  unsigned nunits;
+  poly_uint64 nunits;
   int ncopies;
   enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
   stmt_vec_info prev_stmt_info = NULL;
@@ -8368,7 +8377,8 @@ vectorizable_comparison (gimple *stmt, g
     return false;
 
   if (vectype1 && vectype2
-      && TYPE_VECTOR_SUBPARTS (vectype1) != TYPE_VECTOR_SUBPARTS (vectype2))
+      && may_ne (TYPE_VECTOR_SUBPARTS (vectype1),
+		 TYPE_VECTOR_SUBPARTS (vectype2)))
     return false;
 
   vectype = vectype1 ? vectype1 : vectype2;
@@ -8377,10 +8387,10 @@ vectorizable_comparison (gimple *stmt, g
   if (!vectype)
     {
       vectype = get_vectype_for_scalar_type (TREE_TYPE (rhs1));
-      if (TYPE_VECTOR_SUBPARTS (vectype) != nunits)
+      if (may_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits))
 	return false;
     }
-  else if (nunits != TYPE_VECTOR_SUBPARTS (vectype))
+  else if (may_ne (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
     return false;
 
   /* Can't compare mask and non-mask types.  */
@@ -9611,8 +9621,8 @@ supportable_widening_operation (enum tre
 	 vector types having the same QImode.  Thus we
 	 add additional check for elements number.  */
     return (!VECTOR_BOOLEAN_TYPE_P (vectype)
-	    || (TYPE_VECTOR_SUBPARTS (vectype) / 2
-		== TYPE_VECTOR_SUBPARTS (wide_vectype)));
+	    || must_eq (TYPE_VECTOR_SUBPARTS (vectype),
+			TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
 
   /* Check if it's a multi-step conversion that can be done using intermediate
      types.  */
@@ -9633,8 +9643,10 @@ supportable_widening_operation (enum tre
       intermediate_mode = insn_data[icode1].operand[0].mode;
       if (VECTOR_BOOLEAN_TYPE_P (prev_type))
 	{
+	  poly_uint64 intermediate_nelts
+	    = exact_div (TYPE_VECTOR_SUBPARTS (prev_type), 2);
 	  intermediate_type
-	    = build_truth_vector_type (TYPE_VECTOR_SUBPARTS (prev_type) / 2,
+	    = build_truth_vector_type (intermediate_nelts,
 				       current_vector_size);
 	  if (intermediate_mode != TYPE_MODE (intermediate_type))
 	    return false;
@@ -9664,8 +9676,8 @@ supportable_widening_operation (enum tre
       if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
 	  && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
 	return (!VECTOR_BOOLEAN_TYPE_P (vectype)
-		|| (TYPE_VECTOR_SUBPARTS (intermediate_type) / 2
-		    == TYPE_VECTOR_SUBPARTS (wide_vectype)));
+		|| must_eq (TYPE_VECTOR_SUBPARTS (intermediate_type),
+			    TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
 
       prev_type = intermediate_type;
       prev_mode = intermediate_mode;
@@ -9753,8 +9765,8 @@ supportable_narrowing_operation (enum tr
        vector types having the same QImode.  Thus we
        add additional check for elements number.  */
     return (!VECTOR_BOOLEAN_TYPE_P (vectype)
-	    || (TYPE_VECTOR_SUBPARTS (vectype) * 2
-		== TYPE_VECTOR_SUBPARTS (narrow_vectype)));
+	    || must_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
+			TYPE_VECTOR_SUBPARTS (narrow_vectype)));
 
   /* Check if it's a multi-step conversion that can be done using intermediate
      types.  */
@@ -9820,8 +9832,8 @@ supportable_narrowing_operation (enum tr
 
       if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
 	return (!VECTOR_BOOLEAN_TYPE_P (vectype)
-		|| (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2
-		    == TYPE_VECTOR_SUBPARTS (narrow_vectype)));
+		|| must_eq (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2,
+			    TYPE_VECTOR_SUBPARTS (narrow_vectype)));
 
       prev_mode = intermediate_mode;
       prev_type = intermediate_type;
Index: gcc/ada/gcc-interface/utils.c
===================================================================
--- gcc/ada/gcc-interface/utils.c	2017-10-23 11:41:24.988650286 +0100
+++ gcc/ada/gcc-interface/utils.c	2017-10-23 17:25:51.723380471 +0100
@@ -3528,7 +3528,7 @@ gnat_types_compatible_p (tree t1, tree t
   /* Vector types are also compatible if they have the same number of subparts
      and the same form of (scalar) element type.  */
   if (code == VECTOR_TYPE
-      && TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2)
+      && must_eq (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2))
       && TREE_CODE (TREE_TYPE (t1)) == TREE_CODE (TREE_TYPE (t2))
       && TYPE_PRECISION (TREE_TYPE (t1)) == TYPE_PRECISION (TREE_TYPE (t2)))
     return 1;
Index: gcc/brig/brigfrontend/brig-to-generic.cc
===================================================================
--- gcc/brig/brigfrontend/brig-to-generic.cc	2017-10-10 16:57:41.296192291 +0100
+++ gcc/brig/brigfrontend/brig-to-generic.cc	2017-10-23 17:25:51.724380435 +0100
@@ -869,7 +869,7 @@ get_unsigned_int_type (tree original_typ
     {
       size_t esize
 	= int_size_in_bytes (TREE_TYPE (original_type)) * BITS_PER_UNIT;
-      size_t ecount = TYPE_VECTOR_SUBPARTS (original_type);
+      poly_uint64 ecount = TYPE_VECTOR_SUBPARTS (original_type);
       return build_vector_type (build_nonstandard_integer_type (esize, true),
 				ecount);
     }
Index: gcc/brig/brigfrontend/brig-util.h
===================================================================
--- gcc/brig/brigfrontend/brig-util.h	2017-10-23 17:22:46.882758777 +0100
+++ gcc/brig/brigfrontend/brig-util.h	2017-10-23 17:25:51.724380435 +0100
@@ -81,7 +81,7 @@ bool hsa_type_packed_p (BrigType16_t typ
 inline unsigned HOST_WIDE_INT
 gccbrig_type_vector_subparts (const_tree type)
 {
-  return TYPE_VECTOR_SUBPARTS (type);
+  return TYPE_VECTOR_SUBPARTS (type).to_constant ();
 }
 
 #endif
Index: gcc/c-family/c-common.c
===================================================================
--- gcc/c-family/c-common.c	2017-10-23 11:41:23.219573771 +0100
+++ gcc/c-family/c-common.c	2017-10-23 17:25:51.725380399 +0100
@@ -942,15 +942,16 @@ vector_types_convertible_p (const_tree t
 
   convertible_lax =
     (tree_int_cst_equal (TYPE_SIZE (t1), TYPE_SIZE (t2))
-     && (TREE_CODE (TREE_TYPE (t1)) != REAL_TYPE ||
-	 TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2))
+     && (TREE_CODE (TREE_TYPE (t1)) != REAL_TYPE
+	 || must_eq (TYPE_VECTOR_SUBPARTS (t1),
+		     TYPE_VECTOR_SUBPARTS (t2)))
      && (INTEGRAL_TYPE_P (TREE_TYPE (t1))
 	 == INTEGRAL_TYPE_P (TREE_TYPE (t2))));
 
   if (!convertible_lax || flag_lax_vector_conversions)
     return convertible_lax;
 
-  if (TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2)
+  if (must_eq (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2))
       && lang_hooks.types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2)))
     return true;
 
@@ -1018,10 +1019,10 @@ c_build_vec_perm_expr (location_t loc, t
       return error_mark_node;
     }
 
-  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
-      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
-      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
-	 != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+  if (may_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0)),
+	      TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
+      && may_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1)),
+		 TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))))
     {
       if (complain)
 	error_at (loc, "__builtin_shuffle number of elements of the "
@@ -2280,7 +2281,8 @@ c_common_type_for_mode (machine_mode mod
       if (inner_type != NULL_TREE)
 	return build_complex_type (inner_type);
     }
-  else if (VECTOR_MODE_P (mode))
+  else if (VECTOR_MODE_P (mode)
+	   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
     {
       machine_mode inner_mode = GET_MODE_INNER (mode);
       tree inner_type = c_common_type_for_mode (inner_mode, unsignedp);
@@ -7591,7 +7593,7 @@ convert_vector_to_array_for_subscript (l
 
       if (TREE_CODE (index) == INTEGER_CST)
         if (!tree_fits_uhwi_p (index)
-            || tree_to_uhwi (index) >= TYPE_VECTOR_SUBPARTS (type))
+	    || may_ge (tree_to_uhwi (index), TYPE_VECTOR_SUBPARTS (type)))
           warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
 
       /* We are building an ARRAY_REF so mark the vector as addressable
Index: gcc/c/c-typeck.c
===================================================================
--- gcc/c/c-typeck.c	2017-10-10 17:55:22.067175462 +0100
+++ gcc/c/c-typeck.c	2017-10-23 17:25:51.726380364 +0100
@@ -1238,7 +1238,7 @@ comptypes_internal (const_tree type1, co
       break;
 
     case VECTOR_TYPE:
-      val = (TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2)
+      val = (must_eq (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2))
 	     && comptypes_internal (TREE_TYPE (t1), TREE_TYPE (t2),
 				    enum_and_int_p, different_types_p));
       break;
@@ -11343,7 +11343,8 @@ build_binary_op (location_t location, en
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
 	  && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
 	  && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
-	  && TYPE_VECTOR_SUBPARTS (type0) == TYPE_VECTOR_SUBPARTS (type1))
+	  && must_eq (TYPE_VECTOR_SUBPARTS (type0),
+		      TYPE_VECTOR_SUBPARTS (type1)))
 	{
 	  result_type = type0;
 	  converted = 1;
@@ -11400,7 +11401,8 @@ build_binary_op (location_t location, en
       if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
 	  && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
 	  && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
-	  && TYPE_VECTOR_SUBPARTS (type0) == TYPE_VECTOR_SUBPARTS (type1))
+	  && must_eq (TYPE_VECTOR_SUBPARTS (type0),
+		      TYPE_VECTOR_SUBPARTS (type1)))
 	{
 	  result_type = type0;
 	  converted = 1;
@@ -11474,7 +11476,8 @@ build_binary_op (location_t location, en
               return error_mark_node;
             }
 
-          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+	  if (may_ne (TYPE_VECTOR_SUBPARTS (type0),
+		      TYPE_VECTOR_SUBPARTS (type1)))
             {
               error_at (location, "comparing vectors with different "
                                   "number of elements");
@@ -11634,7 +11637,8 @@ build_binary_op (location_t location, en
               return error_mark_node;
             }
 
-          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+	  if (may_ne (TYPE_VECTOR_SUBPARTS (type0),
+		      TYPE_VECTOR_SUBPARTS (type1)))
             {
               error_at (location, "comparing vectors with different "
                                   "number of elements");
Index: gcc/cp/call.c
===================================================================
--- gcc/cp/call.c	2017-10-23 11:41:24.251615675 +0100
+++ gcc/cp/call.c	2017-10-23 17:25:51.728380292 +0100
@@ -4928,8 +4928,8 @@ build_conditional_expr_1 (location_t loc
 	}
 
       if (!same_type_p (arg2_type, arg3_type)
-	  || TYPE_VECTOR_SUBPARTS (arg1_type)
-	     != TYPE_VECTOR_SUBPARTS (arg2_type)
+	  || may_ne (TYPE_VECTOR_SUBPARTS (arg1_type),
+		     TYPE_VECTOR_SUBPARTS (arg2_type))
 	  || TYPE_SIZE (arg1_type) != TYPE_SIZE (arg2_type))
 	{
 	  if (complain & tf_error)
Index: gcc/cp/constexpr.c
===================================================================
--- gcc/cp/constexpr.c	2017-10-23 17:18:47.657057799 +0100
+++ gcc/cp/constexpr.c	2017-10-23 17:25:51.728380292 +0100
@@ -3059,7 +3059,8 @@ cxx_fold_indirect_ref (location_t loc, t
 	      unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
 	      tree index = bitsize_int (indexi);
 
-	      if (offset / part_widthi < TYPE_VECTOR_SUBPARTS (op00type))
+	      if (must_lt (offset / part_widthi,
+			   TYPE_VECTOR_SUBPARTS (op00type)))
 		return fold_build3_loc (loc,
 					BIT_FIELD_REF, type, op00,
 					part_width, index);
Index: gcc/cp/decl.c
===================================================================
--- gcc/cp/decl.c	2017-10-23 11:41:24.223565801 +0100
+++ gcc/cp/decl.c	2017-10-23 17:25:51.732380148 +0100
@@ -7454,7 +7454,11 @@ cp_finish_decomp (tree decl, tree first,
     }
   else if (TREE_CODE (type) == VECTOR_TYPE)
     {
-      eltscnt = TYPE_VECTOR_SUBPARTS (type);
+      if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&eltscnt))
+	{
+	  error_at (loc, "cannot decompose variable length vector %qT", type);
+	  goto error_out;
+	}
       if (count != eltscnt)
 	goto cnt_mismatch;
       eltype = cp_build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
Index: gcc/cp/mangle.c
===================================================================
--- gcc/cp/mangle.c	2017-10-10 17:55:22.087175461 +0100
+++ gcc/cp/mangle.c	2017-10-23 17:25:51.733380112 +0100
@@ -2260,7 +2260,8 @@ write_type (tree type)
 		  write_string ("Dv");
 		  /* Non-constant vector size would be encoded with
 		     _ expression, but we don't support that yet.  */
-		  write_unsigned_number (TYPE_VECTOR_SUBPARTS (type));
+		  write_unsigned_number (TYPE_VECTOR_SUBPARTS (type)
+					 .to_constant ());
 		  write_char ('_');
 		}
 	      else
Index: gcc/cp/typeck.c
===================================================================
--- gcc/cp/typeck.c	2017-10-23 11:41:24.212926194 +0100
+++ gcc/cp/typeck.c	2017-10-23 17:25:51.735380040 +0100
@@ -1359,7 +1359,7 @@ structural_comptypes (tree t1, tree t2,
       break;
 
     case VECTOR_TYPE:
-      if (TYPE_VECTOR_SUBPARTS (t1) != TYPE_VECTOR_SUBPARTS (t2)
+      if (may_ne (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2))
 	  || !same_type_p (TREE_TYPE (t1), TREE_TYPE (t2)))
 	return false;
       break;
@@ -4513,9 +4513,10 @@ cp_build_binary_op (location_t location,
           converted = 1;
         }
       else if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
-	  && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
-	  && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
-	  && TYPE_VECTOR_SUBPARTS (type0) == TYPE_VECTOR_SUBPARTS (type1))
+	       && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
+	       && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
+	       && must_eq (TYPE_VECTOR_SUBPARTS (type0),
+			   TYPE_VECTOR_SUBPARTS (type1)))
 	{
 	  result_type = type0;
 	  converted = 1;
@@ -4558,9 +4559,10 @@ cp_build_binary_op (location_t location,
           converted = 1;
         }
       else if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
-	  && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
-	  && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
-	  && TYPE_VECTOR_SUBPARTS (type0) == TYPE_VECTOR_SUBPARTS (type1))
+	       && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
+	       && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
+	       && must_eq (TYPE_VECTOR_SUBPARTS (type0),
+			   TYPE_VECTOR_SUBPARTS (type1)))
 	{
 	  result_type = type0;
 	  converted = 1;
@@ -4925,7 +4927,8 @@ cp_build_binary_op (location_t location,
 	      return error_mark_node;
 	    }
 
-	  if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
+	  if (may_ne (TYPE_VECTOR_SUBPARTS (type0),
+		      TYPE_VECTOR_SUBPARTS (type1)))
 	    {
 	      if (complain & tf_error)
 		{
Index: gcc/cp/typeck2.c
===================================================================
--- gcc/cp/typeck2.c	2017-10-09 11:50:52.214211104 +0100
+++ gcc/cp/typeck2.c	2017-10-23 17:25:51.736380004 +0100
@@ -1276,7 +1276,7 @@ process_init_constructor_array (tree typ
     }
   else
     /* Vectors are like simple fixed-size arrays.  */
-    len = TYPE_VECTOR_SUBPARTS (type);
+    unbounded = !TYPE_VECTOR_SUBPARTS (type).is_constant (&len);
 
   /* There must not be more initializers than needed.  */
   if (!unbounded && vec_safe_length (v) > len)
Index: gcc/fortran/trans-types.c
===================================================================
--- gcc/fortran/trans-types.c	2017-09-25 13:57:12.591118003 +0100
+++ gcc/fortran/trans-types.c	2017-10-23 17:25:51.745379681 +0100
@@ -3159,7 +3159,8 @@ gfc_type_for_mode (machine_mode mode, in
       tree type = gfc_type_for_size (GET_MODE_PRECISION (int_mode), unsignedp);
       return type != NULL_TREE && mode == TYPE_MODE (type) ? type : NULL_TREE;
     }
-  else if (VECTOR_MODE_P (mode))
+  else if (VECTOR_MODE_P (mode)
+	   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
     {
       machine_mode inner_mode = GET_MODE_INNER (mode);
       tree inner_type = gfc_type_for_mode (inner_mode, unsignedp);
Index: gcc/lto/lto-lang.c
===================================================================
--- gcc/lto/lto-lang.c	2017-10-23 11:41:25.563189078 +0100
+++ gcc/lto/lto-lang.c	2017-10-23 17:25:51.748379573 +0100
@@ -971,7 +971,8 @@ lto_type_for_mode (machine_mode mode, in
       if (inner_type != NULL_TREE)
 	return build_complex_type (inner_type);
     }
-  else if (VECTOR_MODE_P (mode))
+  else if (VECTOR_MODE_P (mode)
+	   && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
     {
       machine_mode inner_mode = GET_MODE_INNER (mode);
       tree inner_type = lto_type_for_mode (inner_mode, unsigned_p);
Index: gcc/lto/lto.c
===================================================================
--- gcc/lto/lto.c	2017-10-13 10:23:39.776947828 +0100
+++ gcc/lto/lto.c	2017-10-23 17:25:51.749379537 +0100
@@ -316,7 +316,7 @@ hash_canonical_type (tree type)
 
   if (VECTOR_TYPE_P (type))
     {
-      hstate.add_int (TYPE_VECTOR_SUBPARTS (type));
+      hstate.add_poly_int (TYPE_VECTOR_SUBPARTS (type));
       hstate.add_int (TYPE_UNSIGNED (type));
     }
 
Index: gcc/go/go-lang.c
===================================================================
--- gcc/go/go-lang.c	2017-08-30 12:20:57.010045759 +0100
+++ gcc/go/go-lang.c	2017-10-23 17:25:51.747379609 +0100
@@ -372,7 +372,8 @@ go_langhook_type_for_mode (machine_mode
      make sense for the middle-end to ask the frontend for a type
      which the frontend does not support.  However, at least for now
      it is required.  See PR 46805.  */
-  if (VECTOR_MODE_P (mode))
+  if (VECTOR_MODE_P (mode)
+      && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
     {
       tree inner;
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [105/nnn] poly_int: expand_assignment
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (102 preceding siblings ...)
  2017-10-23 17:42 ` [102/nnn] poly_int: vect_permute_load/store_chain Richard Sandiford
@ 2017-10-23 17:43 ` Richard Sandiford
  2017-11-21  7:50   ` Jeff Law
  2017-10-23 17:43 ` [104/nnn] poly_int: GET_MODE_PRECISION Richard Sandiford
                   ` (3 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:43 UTC (permalink / raw)
  To: gcc-patches

This patch makes the CONCAT handling in expand_assignment cope with
polynomial mode sizes.  The mode of the CONCAT must be complex,
so we can base the tests on the sizes of the real and imaginary
components.
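
As a sketch of the change (mirroring the hunk below rather than adding
anything new), instead of dividing the mode size by 2:

  unsigned short mode_bitsize = GET_MODE_BITSIZE (GET_MODE (to_rtx));
  ...
  else if (must_eq (bitsize, mode_bitsize / 2)
	   && (known_zero (bitpos)
	       || must_eq (bitpos, mode_bitsize / 2)))

the tests now compare against the constant size of one component of the
(necessarily complex) mode:

  poly_int64 mode_bitsize = GET_MODE_BITSIZE (to_mode);
  unsigned short inner_bitsize = GET_MODE_UNIT_BITSIZE (to_mode);
  ...
  else if (must_eq (bitsize, inner_bitsize)
	   && (known_zero (bitpos)
	       || must_eq (bitpos, inner_bitsize)))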


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* expr.c (expand_assignment): Cope with polynomial mode sizes
	when assigning to a CONCAT.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:54.178292230 +0100
+++ gcc/expr.c	2017-10-23 17:25:56.086223649 +0100
@@ -5109,32 +5109,36 @@ expand_assignment (tree to, tree from, b
       /* Handle expand_expr of a complex value returning a CONCAT.  */
       else if (GET_CODE (to_rtx) == CONCAT)
 	{
-	  unsigned short mode_bitsize = GET_MODE_BITSIZE (GET_MODE (to_rtx));
+	  machine_mode to_mode = GET_MODE (to_rtx);
+	  gcc_checking_assert (COMPLEX_MODE_P (to_mode));
+	  poly_int64 mode_bitsize = GET_MODE_BITSIZE (to_mode);
+	  unsigned short inner_bitsize = GET_MODE_UNIT_BITSIZE (to_mode);
 	  if (COMPLEX_MODE_P (TYPE_MODE (TREE_TYPE (from)))
 	      && known_zero (bitpos)
 	      && must_eq (bitsize, mode_bitsize))
 	    result = store_expr (from, to_rtx, false, nontemporal, reversep);
-	  else if (must_eq (bitsize, mode_bitsize / 2)
+	  else if (must_eq (bitsize, inner_bitsize)
 		   && (known_zero (bitpos)
-		       || must_eq (bitpos, mode_bitsize / 2)))
+		       || must_eq (bitpos, inner_bitsize)))
 	    result = store_expr (from, XEXP (to_rtx, maybe_nonzero (bitpos)),
 				 false, nontemporal, reversep);
-	  else if (must_le (bitpos + bitsize, mode_bitsize / 2))
+	  else if (must_le (bitpos + bitsize, inner_bitsize))
 	    result = store_field (XEXP (to_rtx, 0), bitsize, bitpos,
 				  bitregion_start, bitregion_end,
 				  mode1, from, get_alias_set (to),
 				  nontemporal, reversep);
-	  else if (must_ge (bitpos, mode_bitsize / 2))
+	  else if (must_ge (bitpos, inner_bitsize))
 	    result = store_field (XEXP (to_rtx, 1), bitsize,
-				  bitpos - mode_bitsize / 2,
+				  bitpos - inner_bitsize,
 				  bitregion_start, bitregion_end,
 				  mode1, from, get_alias_set (to),
 				  nontemporal, reversep);
-	  else if (known_zero (bitpos) && must_eq (bitsize, mode_bitsize))
+	  else if (known_zero (bitpos)
+		   && must_eq (bitsize, mode_bitsize))
 	    {
 	      rtx from_rtx;
 	      result = expand_normal (from);
-	      from_rtx = simplify_gen_subreg (GET_MODE (to_rtx), result,
+	      from_rtx = simplify_gen_subreg (to_mode, result,
 					      TYPE_MODE (TREE_TYPE (from)), 0);
 	      emit_move_insn (XEXP (to_rtx, 0),
 			      read_complex_part (from_rtx, false));

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [104/nnn] poly_int: GET_MODE_PRECISION
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (103 preceding siblings ...)
  2017-10-23 17:43 ` [105/nnn] poly_int: expand_assignment Richard Sandiford
@ 2017-10-23 17:43 ` Richard Sandiford
  2017-11-28  8:07   ` Jeff Law
  2017-10-23 17:43 ` [106/nnn] poly_int: GET_MODE_BITSIZE Richard Sandiford
                   ` (2 subsequent siblings)
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:43 UTC (permalink / raw)
  To: gcc-patches

This patch changes GET_MODE_PRECISION from an unsigned short
to a poly_uint16.
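
Most call sites simply move from direct integer comparisons to the
may/must forms, and code that genuinely needs a compile-time constant
asks for one explicitly.  A representative sketch (the general pattern
rather than any one hunk):

  /* Before: precisions were plain integers.  */
  if (GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode))
    ...

  /* After: precisions are poly_uint16, so the comparison must say
     whether it wants the "may" or "must" sense.  */
  if (may_ne (GET_MODE_PRECISION (mode), GET_MODE_PRECISION (innermode)))
    ...

  /* Code that really needs a constant requests one explicitly.  */
  unsigned int width;
  if (GET_MODE_PRECISION (mode).is_constant (&width)
      && width <= HOST_BITS_PER_WIDE_INT)
    ...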


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* machmode.h (mode_precision): Change from unsigned short to
	poly_uint16_pod.
	(mode_to_precision): Return a poly_uint16 rather than an unsigned
	short.
	(GET_MODE_PRECISION): Return a constant if ONLY_FIXED_SIZE_MODES,
	or if measurement_type is not polynomial.
	(HWI_COMPUTABLE_MODE_P): Turn into a function.  Optimize the case
	in which the mode is already known to be a scalar_int_mode.
	* genmodes.c (emit_mode_precision): Change the type of mode_precision
	from unsigned short to poly_uint16_pod.  Use ZERO_COEFFS for the
	initializer.
	* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
	for GET_MODE_PRECISION.
	* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
	for GET_MODE_PRECISION.
	* combine.c (update_rsp_from_reg_equal): Treat GET_MODE_PRECISION
	as polynomial.
	(try_combine, find_split_point, combine_simplify_rtx): Likewise.
	(expand_field_assignment, make_extraction): Likewise.
	(make_compound_operation_int, record_dead_and_set_regs_1): Likewise.
	(get_last_value): Likewise.
	* convert.c (convert_to_integer_1): Likewise.
	* cse.c (cse_insn): Likewise.
	* expr.c (expand_expr_real_1): Likewise.
	* lra-constraints.c (simplify_operand_subreg): Likewise.
	* optabs-query.c (can_atomic_load_p): Likewise.
	* optabs.c (expand_atomic_load): Likewise.
	(expand_atomic_store): Likewise.
	* ree.c (combine_reaching_defs): Likewise.
	* rtl.h (partial_subreg_p, paradoxical_subreg_p): Likewise.
	* rtlanal.c (nonzero_bits1, lsb_bitfield_op_p): Likewise.
	* tree.h (type_has_mode_precision_p): Likewise.
	* ubsan.c (instrument_si_overflow): Likewise.

gcc/ada/
	* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_PRECISION
	as polynomial.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-10-23 17:25:48.620492005 +0100
+++ gcc/machmode.h	2017-10-23 17:25:54.180292158 +0100
@@ -23,7 +23,7 @@ #define HAVE_MACHINE_MODES
 typedef opt_mode<machine_mode> opt_machine_mode;
 
 extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
-extern const unsigned short mode_precision[NUM_MACHINE_MODES];
+extern const poly_uint16_pod mode_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_inner[NUM_MACHINE_MODES];
 extern const poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
 extern CONST_MODE_UNIT_SIZE unsigned char mode_unit_size[NUM_MACHINE_MODES];
@@ -535,7 +535,7 @@ mode_to_bits (machine_mode mode)
 
 /* Return the base GET_MODE_PRECISION value for MODE.  */
 
-ALWAYS_INLINE unsigned short
+ALWAYS_INLINE poly_uint16
 mode_to_precision (machine_mode mode)
 {
   return mode_precision[mode];
@@ -604,7 +604,30 @@ #define GET_MODE_BITSIZE(MODE) (mode_to_
 
 /* Get the number of value bits of an object of mode MODE.  */
 
-#define GET_MODE_PRECISION(MODE) (mode_to_precision (MODE))
+#if ONLY_FIXED_SIZE_MODES
+#define GET_MODE_PRECISION(MODE) \
+  ((unsigned short) mode_to_precision (MODE).coeffs[0])
+#else
+ALWAYS_INLINE poly_uint16
+GET_MODE_PRECISION (machine_mode mode)
+{
+  return mode_to_precision (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_poly<typename T::measurement_type>::t
+GET_MODE_PRECISION (const T &mode)
+{
+  return mode_to_precision (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_nonpoly<typename T::measurement_type>::t
+GET_MODE_PRECISION (const T &mode)
+{
+  return mode_to_precision (mode).coeffs[0];
+}
+#endif
 
 /* Get the number of integral bits of an object of mode MODE.  */
 extern CONST_MODE_IBIT unsigned char mode_ibit[NUM_MACHINE_MODES];
@@ -863,9 +886,22 @@ #define TRULY_NOOP_TRUNCATION_MODES_P(MO
   (targetm.truly_noop_truncation (GET_MODE_PRECISION (MODE1), \
 				  GET_MODE_PRECISION (MODE2)))
 
-#define HWI_COMPUTABLE_MODE_P(MODE) \
-  (SCALAR_INT_MODE_P (MODE) \
-   && GET_MODE_PRECISION (MODE) <= HOST_BITS_PER_WIDE_INT)
+/* Return true if MODE is a scalar integer mode that fits in a
+   HOST_WIDE_INT.  */
+
+inline bool
+HWI_COMPUTABLE_MODE_P (machine_mode mode)
+{
+  machine_mode mme = mode;
+  return (SCALAR_INT_MODE_P (mme)
+	  && mode_to_precision (mme).coeffs[0] <= HOST_BITS_PER_WIDE_INT);
+}
+
+inline bool
+HWI_COMPUTABLE_MODE_P (scalar_int_mode mode)
+{
+  return GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT;
+}
 
 struct int_n_data_t {
   /* These parts are initailized by genmodes output */
Index: gcc/genmodes.c
===================================================================
--- gcc/genmodes.c	2017-10-23 17:25:48.618492077 +0100
+++ gcc/genmodes.c	2017-10-23 17:25:54.178292230 +0100
@@ -1358,13 +1358,14 @@ emit_mode_precision (void)
   int c;
   struct mode_data *m;
 
-  print_decl ("unsigned short", "mode_precision", "NUM_MACHINE_MODES");
+  print_decl ("poly_uint16_pod", "mode_precision", "NUM_MACHINE_MODES");
 
   for_all_modes (c, m)
     if (m->precision != (unsigned int)-1)
-      tagged_printf ("%u", m->precision, m->name);
+      tagged_printf ("{ %u" ZERO_COEFFS " }", m->precision, m->name);
     else
-      tagged_printf ("%u*BITS_PER_UNIT", m->bytesize, m->name);
+      tagged_printf ("{ %u * BITS_PER_UNIT" ZERO_COEFFS " }",
+		     m->bytesize, m->name);
 
   print_closer ();
 }
Index: gcc/lto-streamer-in.c
===================================================================
--- gcc/lto-streamer-in.c	2017-10-23 17:25:48.619492041 +0100
+++ gcc/lto-streamer-in.c	2017-10-23 17:25:54.179292194 +0100
@@ -1605,7 +1605,7 @@ lto_input_mode_table (struct lto_file_de
       enum mode_class mclass
 	= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
       unsigned int size = bp_unpack_value (&bp, 8);
-      unsigned int prec = bp_unpack_value (&bp, 16);
+      poly_uint16 prec = bp_unpack_poly_value (&bp, 16);
       machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
       poly_uint16 nunits = bp_unpack_poly_value (&bp, 16);
       unsigned int ibit = 0, fbit = 0;
@@ -1639,7 +1639,7 @@ lto_input_mode_table (struct lto_file_de
 		  : mr = GET_MODE_WIDER_MODE (mr).else_void ())
 	  if (GET_MODE_CLASS (mr) != mclass
 	      || GET_MODE_SIZE (mr) != size
-	      || GET_MODE_PRECISION (mr) != prec
+	      || may_ne (GET_MODE_PRECISION (mr), prec)
 	      || (inner == m
 		  ? GET_MODE_INNER (mr) != mr
 		  : GET_MODE_INNER (mr) != table[(int) inner])
Index: gcc/lto-streamer-out.c
===================================================================
--- gcc/lto-streamer-out.c	2017-10-23 17:25:48.620492005 +0100
+++ gcc/lto-streamer-out.c	2017-10-23 17:25:54.180292158 +0100
@@ -2773,7 +2773,7 @@ lto_write_mode_table (void)
 	  bp_pack_value (&bp, m, 8);
 	  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
 	  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
-	  bp_pack_value (&bp, GET_MODE_PRECISION (m), 16);
+	  bp_pack_poly_value (&bp, GET_MODE_PRECISION (m), 16);
 	  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
 	  bp_pack_poly_value (&bp, GET_MODE_NUNITS (m), 16);
 	  switch (GET_MODE_CLASS (m))
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-10-23 17:25:30.702136080 +0100
+++ gcc/combine.c	2017-10-23 17:25:54.176292301 +0100
@@ -1703,7 +1703,7 @@ update_rsp_from_reg_equal (reg_stat_type
   if (rsp->sign_bit_copies != 1)
     {
       num = num_sign_bit_copies (SET_SRC (set), GET_MODE (x));
-      if (reg_equal && num != GET_MODE_PRECISION (GET_MODE (x)))
+      if (reg_equal && may_ne (num, GET_MODE_PRECISION (GET_MODE (x))))
 	{
 	  unsigned int numeq = num_sign_bit_copies (reg_equal, GET_MODE (x));
 	  if (num == 0 || numeq > num)
@@ -3938,16 +3938,20 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	   && ! (temp_expr = SET_DEST (XVECEXP (newpat, 0, 1)),
 		 (REG_P (temp_expr)
 		  && reg_stat[REGNO (temp_expr)].nonzero_bits != 0
-		  && GET_MODE_PRECISION (GET_MODE (temp_expr)) < BITS_PER_WORD
-		  && GET_MODE_PRECISION (GET_MODE (temp_expr)) < HOST_BITS_PER_INT
+		  && must_lt (GET_MODE_PRECISION (GET_MODE (temp_expr)),
+			      BITS_PER_WORD)
+		  && must_lt (GET_MODE_PRECISION (GET_MODE (temp_expr)),
+			      HOST_BITS_PER_INT)
 		  && (reg_stat[REGNO (temp_expr)].nonzero_bits
 		      != GET_MODE_MASK (word_mode))))
 	   && ! (GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) == SUBREG
 		 && (temp_expr = SUBREG_REG (SET_DEST (XVECEXP (newpat, 0, 1))),
 		     (REG_P (temp_expr)
 		      && reg_stat[REGNO (temp_expr)].nonzero_bits != 0
-		      && GET_MODE_PRECISION (GET_MODE (temp_expr)) < BITS_PER_WORD
-		      && GET_MODE_PRECISION (GET_MODE (temp_expr)) < HOST_BITS_PER_INT
+		      && must_lt (GET_MODE_PRECISION (GET_MODE (temp_expr)),
+				  BITS_PER_WORD)
+		      && must_lt (GET_MODE_PRECISION (GET_MODE (temp_expr)),
+				  HOST_BITS_PER_INT)
 		      && (reg_stat[REGNO (temp_expr)].nonzero_bits
 			  != GET_MODE_MASK (word_mode)))))
 	   && ! reg_overlap_mentioned_p (SET_DEST (XVECEXP (newpat, 0, 1)),
@@ -5115,8 +5119,9 @@ find_split_point (rtx *loc, rtx_insn *in
 	  break;
 	}
 
-      if (len && pos >= 0
-	  && pos + len <= GET_MODE_PRECISION (GET_MODE (inner))
+      if (len
+	  && known_subrange_p (pos, len,
+			       0, GET_MODE_PRECISION (GET_MODE (inner)))
 	  && is_a <scalar_int_mode> (GET_MODE (SET_SRC (x)), &mode))
 	{
 	  /* For unsigned, we have a choice of a shift followed by an
@@ -5982,8 +5987,9 @@ combine_simplify_rtx (rtx x, machine_mod
 	       && (UINTVAL (XEXP (XEXP (XEXP (x, 0), 0), 1))
 		   == (HOST_WIDE_INT_1U << (i + 1)) - 1))
 	      || (GET_CODE (XEXP (XEXP (x, 0), 0)) == ZERO_EXTEND
-		  && (GET_MODE_PRECISION (GET_MODE (XEXP (XEXP (XEXP (x, 0), 0), 0)))
-		      == (unsigned int) i + 1))))
+		  && must_eq ((GET_MODE_PRECISION
+			       (GET_MODE (XEXP (XEXP (XEXP (x, 0), 0), 0)))),
+			      (unsigned int) i + 1))))
 	return simplify_shift_const
 	  (NULL_RTX, ASHIFTRT, int_mode,
 	   simplify_shift_const (NULL_RTX, ASHIFT, int_mode,
@@ -7314,7 +7320,7 @@ expand_field_assignment (const_rtx x)
 {
   rtx inner;
   rtx pos;			/* Always counts from low bit.  */
-  int len;
+  int len, inner_len;
   rtx mask, cleared, masked;
   scalar_int_mode compute_mode;
 
@@ -7324,8 +7330,10 @@ expand_field_assignment (const_rtx x)
       if (GET_CODE (SET_DEST (x)) == STRICT_LOW_PART
 	  && GET_CODE (XEXP (SET_DEST (x), 0)) == SUBREG)
 	{
+	  rtx x0 = XEXP (SET_DEST (x), 0);
+	  if (!GET_MODE_PRECISION (GET_MODE (x0)).is_constant (&len))
+	    break;
 	  inner = SUBREG_REG (XEXP (SET_DEST (x), 0));
-	  len = GET_MODE_PRECISION (GET_MODE (XEXP (SET_DEST (x), 0)));
 	  pos = gen_int_mode (subreg_lsb (XEXP (SET_DEST (x), 0)),
 			      MAX_MODE_INT);
 	}
@@ -7333,33 +7341,30 @@ expand_field_assignment (const_rtx x)
 	       && CONST_INT_P (XEXP (SET_DEST (x), 1)))
 	{
 	  inner = XEXP (SET_DEST (x), 0);
+	  if (!GET_MODE_PRECISION (GET_MODE (inner)).is_constant (&inner_len))
+	    break;
+
 	  len = INTVAL (XEXP (SET_DEST (x), 1));
 	  pos = XEXP (SET_DEST (x), 2);
 
 	  /* A constant position should stay within the width of INNER.  */
-	  if (CONST_INT_P (pos)
-	      && INTVAL (pos) + len > GET_MODE_PRECISION (GET_MODE (inner)))
+	  if (CONST_INT_P (pos) && INTVAL (pos) + len > inner_len)
 	    break;
 
 	  if (BITS_BIG_ENDIAN)
 	    {
 	      if (CONST_INT_P (pos))
-		pos = GEN_INT (GET_MODE_PRECISION (GET_MODE (inner)) - len
-			       - INTVAL (pos));
+		pos = GEN_INT (inner_len - len - INTVAL (pos));
 	      else if (GET_CODE (pos) == MINUS
 		       && CONST_INT_P (XEXP (pos, 1))
-		       && (INTVAL (XEXP (pos, 1))
-			   == GET_MODE_PRECISION (GET_MODE (inner)) - len))
+		       && INTVAL (XEXP (pos, 1)) == inner_len - len)
 		/* If position is ADJUST - X, new position is X.  */
 		pos = XEXP (pos, 0);
 	      else
-		{
-		  HOST_WIDE_INT prec = GET_MODE_PRECISION (GET_MODE (inner));
-		  pos = simplify_gen_binary (MINUS, GET_MODE (pos),
-					     gen_int_mode (prec - len,
-							   GET_MODE (pos)),
-					     pos);
-		}
+		pos = simplify_gen_binary (MINUS, GET_MODE (pos),
+					   gen_int_mode (inner_len - len,
+							 GET_MODE (pos)),
+					   pos);
 	    }
 	}
 
@@ -7479,7 +7484,7 @@ make_extraction (machine_mode mode, rtx
 	     bits outside of is_mode, don't look through
 	     non-paradoxical SUBREGs.  See PR82192.  */
 	  || (pos_rtx == NULL_RTX
-	      && pos + len <= GET_MODE_PRECISION (is_mode))))
+	      && must_le (pos + len, GET_MODE_PRECISION (is_mode)))))
     {
       /* If going from (subreg:SI (mem:QI ...)) to (mem:QI ...),
 	 consider just the QI as the memory to extract from.
@@ -7510,7 +7515,7 @@ make_extraction (machine_mode mode, rtx
 	      bits outside of is_mode, don't look through
 	      TRUNCATE.  See PR82192.  */
 	   && pos_rtx == NULL_RTX
-	   && pos + len <= GET_MODE_PRECISION (is_mode))
+	   && must_le (pos + len, GET_MODE_PRECISION (is_mode)))
     inner = XEXP (inner, 0);
 
   inner_mode = GET_MODE (inner);
@@ -7557,11 +7562,12 @@ make_extraction (machine_mode mode, rtx
 
       if (MEM_P (inner))
 	{
-	  HOST_WIDE_INT offset;
+	  poly_int64 offset;
 
 	  /* POS counts from lsb, but make OFFSET count in memory order.  */
 	  if (BYTES_BIG_ENDIAN)
-	    offset = (GET_MODE_PRECISION (is_mode) - len - pos) / BITS_PER_UNIT;
+	    offset = bits_to_bytes_round_down (GET_MODE_PRECISION (is_mode)
+					       - len - pos);
 	  else
 	    offset = pos / BITS_PER_UNIT;
 
@@ -7653,7 +7659,7 @@ make_extraction (machine_mode mode, rtx
      other cases, we would only be going outside our object in cases when
      an original shift would have been undefined.  */
   if (MEM_P (inner)
-      && ((pos_rtx == 0 && pos + len > GET_MODE_PRECISION (is_mode))
+      && ((pos_rtx == 0 && may_gt (pos + len, GET_MODE_PRECISION (is_mode)))
 	  || (pos_rtx != 0 && len != 1)))
     return 0;
 
@@ -8132,8 +8138,10 @@ make_compound_operation_int (scalar_int_
 
 	  sub = XEXP (XEXP (x, 0), 0);
 	  machine_mode sub_mode = GET_MODE (sub);
+	  int sub_width;
 	  if ((REG_P (sub) || MEM_P (sub))
-	      && GET_MODE_PRECISION (sub_mode) < mode_width)
+	      && GET_MODE_PRECISION (sub_mode).is_constant (&sub_width)
+	      && sub_width < mode_width)
 	    {
 	      unsigned HOST_WIDE_INT mode_mask = GET_MODE_MASK (sub_mode);
 	      unsigned HOST_WIDE_INT mask;
@@ -8143,8 +8151,7 @@ make_compound_operation_int (scalar_int_
 	      if ((mask & mode_mask) == mode_mask)
 		{
 		  new_rtx = make_compound_operation (sub, next_code);
-		  new_rtx = make_extraction (mode, new_rtx, 0, 0,
-					     GET_MODE_PRECISION (sub_mode),
+		  new_rtx = make_extraction (mode, new_rtx, 0, 0, sub_width,
 					     1, 0, in_code == COMPARE);
 		}
 	    }
@@ -13215,7 +13222,7 @@ record_dead_and_set_regs_1 (rtx dest, co
       else if (GET_CODE (setter) == SET
 	       && GET_CODE (SET_DEST (setter)) == SUBREG
 	       && SUBREG_REG (SET_DEST (setter)) == dest
-	       && GET_MODE_PRECISION (GET_MODE (dest)) <= BITS_PER_WORD
+	       && must_le (GET_MODE_PRECISION (GET_MODE (dest)), BITS_PER_WORD)
 	       && subreg_lowpart_p (SET_DEST (setter)))
 	record_value_for_reg (dest, record_dead_insn,
 			      gen_lowpart (GET_MODE (dest),
@@ -13617,8 +13624,8 @@ get_last_value (const_rtx x)
 
   /* If fewer bits were set than what we are asked for now, we cannot use
      the value.  */
-  if (GET_MODE_PRECISION (rsp->last_set_mode)
-      < GET_MODE_PRECISION (GET_MODE (x)))
+  if (may_lt (GET_MODE_PRECISION (rsp->last_set_mode),
+	      GET_MODE_PRECISION (GET_MODE (x))))
     return 0;
 
   /* If the value has all its registers valid, return it.  */
Index: gcc/convert.c
===================================================================
--- gcc/convert.c	2017-09-15 14:47:33.181331910 +0100
+++ gcc/convert.c	2017-10-23 17:25:54.176292301 +0100
@@ -731,7 +731,7 @@ convert_to_integer_1 (tree type, tree ex
 	 type corresponding to its mode, then do a nop conversion
 	 to TYPE.  */
       else if (TREE_CODE (type) == ENUMERAL_TYPE
-	       || outprec != GET_MODE_PRECISION (TYPE_MODE (type)))
+	       || may_ne (outprec, GET_MODE_PRECISION (TYPE_MODE (type))))
 	{
 	  expr = convert (lang_hooks.types.type_for_mode
 			  (TYPE_MODE (type), TYPE_UNSIGNED (type)), expr);
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-10-23 17:16:50.359529762 +0100
+++ gcc/cse.c	2017-10-23 17:25:54.177292265 +0100
@@ -5231,8 +5231,9 @@ cse_insn (rtx_insn *insn)
 	      && CONST_INT_P (XEXP (SET_DEST (sets[i].rtl), 1))
 	      && CONST_INT_P (XEXP (SET_DEST (sets[i].rtl), 2))
 	      && REG_P (XEXP (SET_DEST (sets[i].rtl), 0))
-	      && (GET_MODE_PRECISION (GET_MODE (SET_DEST (sets[i].rtl)))
-		  >= INTVAL (XEXP (SET_DEST (sets[i].rtl), 1)))
+	      && (must_ge
+		  (GET_MODE_PRECISION (GET_MODE (SET_DEST (sets[i].rtl))),
+		   INTVAL (XEXP (SET_DEST (sets[i].rtl), 1))))
 	      && ((unsigned) INTVAL (XEXP (SET_DEST (sets[i].rtl), 1))
 		  + (unsigned) INTVAL (XEXP (SET_DEST (sets[i].rtl), 2))
 		  <= HOST_BITS_PER_WIDE_INT))
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:51.740379860 +0100
+++ gcc/expr.c	2017-10-23 17:25:54.178292230 +0100
@@ -11034,9 +11034,10 @@ expand_expr_real_1 (tree exp, rtx target
 	;
       /* If neither mode is BLKmode, and both modes are the same size
 	 then we can use gen_lowpart.  */
-      else if (mode != BLKmode && GET_MODE (op0) != BLKmode
-	       && (GET_MODE_PRECISION (mode)
-		   == GET_MODE_PRECISION (GET_MODE (op0)))
+      else if (mode != BLKmode
+	       && GET_MODE (op0) != BLKmode
+	       && must_eq (GET_MODE_PRECISION (mode),
+			   GET_MODE_PRECISION (GET_MODE (op0)))
 	       && !COMPLEX_MODE_P (GET_MODE (op0)))
 	{
 	  if (GET_CODE (op0) == SUBREG)
Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2017-10-23 17:25:42.597708494 +0100
+++ gcc/lra-constraints.c	2017-10-23 17:25:54.179292194 +0100
@@ -1555,7 +1555,8 @@ simplify_operand_subreg (int nop, machin
 	     missing important data from memory when the inner is wider than
 	     outer.  This rule only applies to modes that are no wider than
 	     a word.  */
-	  if (!(GET_MODE_PRECISION (mode) != GET_MODE_PRECISION (innermode)
+	  if (!(may_ne (GET_MODE_PRECISION (mode),
+			GET_MODE_PRECISION (innermode))
 		&& GET_MODE_SIZE (mode) <= UNITS_PER_WORD
 		&& GET_MODE_SIZE (innermode) <= UNITS_PER_WORD
 		&& WORD_REGISTER_OPERATIONS)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-10-23 17:25:48.620492005 +0100
+++ gcc/optabs-query.c	2017-10-23 17:25:54.180292158 +0100
@@ -592,7 +592,7 @@ can_atomic_load_p (machine_mode mode)
   /* If the size of the object is greater than word size on this target,
      then we assume that a load will not be atomic.  Also see
      expand_atomic_load.  */
-  return GET_MODE_PRECISION (mode) <= BITS_PER_WORD;
+  return must_le (GET_MODE_PRECISION (mode), BITS_PER_WORD);
 }
 
 /* Determine whether "1 << x" is relatively cheap in word_mode.  */
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 17:25:48.621491969 +0100
+++ gcc/optabs.c	2017-10-23 17:25:54.181292122 +0100
@@ -6416,7 +6416,7 @@ expand_atomic_load (rtx target, rtx mem,
      emulate a load with a compare-and-swap operation, but the store that
      doing this could result in would be incorrect if this is a volatile
      atomic load or targetting read-only-mapped memory.  */
-  if (GET_MODE_PRECISION (mode) > BITS_PER_WORD)
+  if (may_gt (GET_MODE_PRECISION (mode), BITS_PER_WORD))
     /* If there is no atomic load, leave the library call.  */
     return NULL_RTX;
 
@@ -6490,7 +6490,7 @@ expand_atomic_store (rtx mem, rtx val, e
 
   /* If the size of the object is greater than word size on this target,
      a default store will not be atomic.  */
-  if (GET_MODE_PRECISION (mode) > BITS_PER_WORD)
+  if (may_gt (GET_MODE_PRECISION (mode), BITS_PER_WORD))
     {
       /* If loads are atomic or we are called to provide a __sync builtin,
 	 we can try a atomic_exchange and throw away the result.  Otherwise,
Index: gcc/ree.c
===================================================================
--- gcc/ree.c	2017-10-23 11:41:25.865934266 +0100
+++ gcc/ree.c	2017-10-23 17:25:54.181292122 +0100
@@ -860,9 +860,9 @@ combine_reaching_defs (ext_cand *cand, c
 	 as destination register will not affect its reaching uses, which may
 	 read its value in a larger mode because DEF_INSN implicitly sets it
 	 in word mode.  */
-      const unsigned int prec
+      poly_int64 prec
 	= GET_MODE_PRECISION (GET_MODE (SET_DEST (*dest_sub_rtx)));
-      if (WORD_REGISTER_OPERATIONS && prec < BITS_PER_WORD)
+      if (WORD_REGISTER_OPERATIONS && must_lt (prec, BITS_PER_WORD))
 	{
 	  struct df_link *uses = get_uses (def_insn, src_reg);
 	  if (!uses)
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 17:18:57.862160702 +0100
+++ gcc/rtl.h	2017-10-23 17:25:54.182292086 +0100
@@ -3033,7 +3033,12 @@ extern poly_uint64 subreg_size_lowpart_o
 inline bool
 partial_subreg_p (machine_mode outermode, machine_mode innermode)
 {
-  return GET_MODE_PRECISION (outermode) < GET_MODE_PRECISION (innermode);
+  /* Modes involved in a subreg must be ordered.  In particular, we must
+     always know at compile time whether the subreg is paradoxical.  */
+  poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
+  poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
+  gcc_checking_assert (ordered_p (outer_prec, inner_prec));
+  return may_lt (outer_prec, inner_prec);
 }
 
 /* Likewise return true if X is a subreg that is smaller than the inner
@@ -3054,7 +3059,12 @@ partial_subreg_p (const_rtx x)
 inline bool
 paradoxical_subreg_p (machine_mode outermode, machine_mode innermode)
 {
-  return GET_MODE_PRECISION (outermode) > GET_MODE_PRECISION (innermode);
+  /* Modes involved in a subreg must be ordered.  In particular, we must
+     always know at compile time whether the subreg is paradoxical.  */
+  poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
+  poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
+  gcc_checking_assert (ordered_p (outer_prec, inner_prec));
+  return may_gt (outer_prec, inner_prec);
 }
 
 /* Return true if X is a paradoxical subreg, false otherwise.  */
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:25:48.622491933 +0100
+++ gcc/rtlanal.c	2017-10-23 17:25:54.182292086 +0100
@@ -4431,6 +4431,7 @@ nonzero_bits1 (const_rtx x, scalar_int_m
   unsigned HOST_WIDE_INT inner_nz;
   enum rtx_code code;
   machine_mode inner_mode;
+  unsigned int inner_width;
   scalar_int_mode xmode;
 
   unsigned int mode_width = GET_MODE_PRECISION (mode);
@@ -4735,8 +4736,9 @@ nonzero_bits1 (const_rtx x, scalar_int_m
 	 machines, we can compute this from which bits of the inner
 	 object might be nonzero.  */
       inner_mode = GET_MODE (SUBREG_REG (x));
-      if (GET_MODE_PRECISION (inner_mode) <= BITS_PER_WORD
-	  && GET_MODE_PRECISION (inner_mode) <= HOST_BITS_PER_WIDE_INT)
+      if (GET_MODE_PRECISION (inner_mode).is_constant (&inner_width)
+	  && inner_width <= BITS_PER_WORD
+	  && inner_width <= HOST_BITS_PER_WIDE_INT)
 	{
 	  nonzero &= cached_nonzero_bits (SUBREG_REG (x), mode,
 					  known_x, known_mode, known_ret);
@@ -4752,8 +4754,9 @@ nonzero_bits1 (const_rtx x, scalar_int_m
 		   ? val_signbit_known_set_p (inner_mode, nonzero)
 		   : extend_op != ZERO_EXTEND)
 	       || (!MEM_P (SUBREG_REG (x)) && !REG_P (SUBREG_REG (x))))
-	      && xmode_width > GET_MODE_PRECISION (inner_mode))
-	    nonzero |= (GET_MODE_MASK (xmode) & ~GET_MODE_MASK (inner_mode));
+	      && xmode_width > inner_width)
+	    nonzero
+	      |= (GET_MODE_MASK (GET_MODE (x)) & ~GET_MODE_MASK (inner_mode));
 	}
       break;
 
@@ -6068,8 +6071,9 @@ lsb_bitfield_op_p (rtx x)
       machine_mode mode = GET_MODE (XEXP (x, 0));
       HOST_WIDE_INT len = INTVAL (XEXP (x, 1));
       HOST_WIDE_INT pos = INTVAL (XEXP (x, 2));
+      poly_int64 remaining_bits = GET_MODE_PRECISION (mode) - len;
 
-      return (pos == (BITS_BIG_ENDIAN ? GET_MODE_PRECISION (mode) - len : 0));
+      return must_eq (pos, BITS_BIG_ENDIAN ? remaining_bits : 0);
     }
   return false;
 }
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 17:25:51.773378674 +0100
+++ gcc/tree.h	2017-10-23 17:25:54.183292050 +0100
@@ -5773,7 +5773,7 @@ struct builtin_structptr_type
 inline bool
 type_has_mode_precision_p (const_tree t)
 {
-  return TYPE_PRECISION (t) == GET_MODE_PRECISION (TYPE_MODE (t));
+  return must_eq (TYPE_PRECISION (t), GET_MODE_PRECISION (TYPE_MODE (t)));
 }
 
 #endif  /* GCC_TREE_H  */
Index: gcc/ubsan.c
===================================================================
--- gcc/ubsan.c	2017-10-23 17:18:47.669056745 +0100
+++ gcc/ubsan.c	2017-10-23 17:25:54.183292050 +0100
@@ -1583,7 +1583,8 @@ instrument_si_overflow (gimple_stmt_iter
      Also punt on bit-fields.  */
   if (!INTEGRAL_TYPE_P (lhsinner)
       || TYPE_OVERFLOW_WRAPS (lhsinner)
-      || GET_MODE_BITSIZE (TYPE_MODE (lhsinner)) != TYPE_PRECISION (lhsinner))
+      || may_ne (GET_MODE_BITSIZE (TYPE_MODE (lhsinner)),
+		 TYPE_PRECISION (lhsinner)))
     return;
 
   switch (code)
Index: gcc/ada/gcc-interface/misc.c
===================================================================
--- gcc/ada/gcc-interface/misc.c	2017-10-23 17:25:48.617492113 +0100
+++ gcc/ada/gcc-interface/misc.c	2017-10-23 17:25:54.174292373 +0100
@@ -1298,11 +1298,13 @@ enumerate_modes (void (*f) (const char *
 	  }
 
       /* If no predefined C types were found, register the mode itself.  */
-      int nunits;
-      if (!skip_p && GET_MODE_NUNITS (i).is_constant (&nunits))
+      int nunits, precision;
+      if (!skip_p
+	  && GET_MODE_NUNITS (i).is_constant (&nunits)
+	  && GET_MODE_PRECISION (i).is_constant (&precision))
 	f (GET_MODE_NAME (i), digs, complex_p,
 	   vector_p ? nunits : 0, float_rep,
-	   GET_MODE_PRECISION (i), GET_MODE_BITSIZE (i),
+	   precision, GET_MODE_BITSIZE (i),
 	   GET_MODE_ALIGNMENT (i));
     }
 }

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [106/nnn] poly_int: GET_MODE_BITSIZE
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (104 preceding siblings ...)
  2017-10-23 17:43 ` [104/nnn] poly_int: GET_MODE_PRECISION Richard Sandiford
@ 2017-10-23 17:43 ` Richard Sandiford
  2017-11-21  7:49   ` Jeff Law
  2017-10-23 17:48 ` [107/nnn] poly_int: GET_MODE_SIZE Richard Sandiford
  2017-10-24  9:25 ` [000/nnn] poly_int: representation of runtime offsets and sizes Eric Botcazou
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:43 UTC (permalink / raw)
  To: gcc-patches

This patch changes GET_MODE_BITSIZE from an unsigned short
to a poly_uint16.
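
For reference, here is a small caller-side sketch (illustrative only, not
part of the patch) of what the mostly-mechanical changes below look like:
equality and ordering tests on bit sizes become must_*/may_* queries, and
code that genuinely needs a compile-time value asks for one explicitly.
The function and helper names here are made up; must_eq, is_constant and
friends are the poly_int operations already used in the hunks that follow.

  /* Sketch only: the names are hypothetical, the operations are not.  */
  static void
  example_use_of_bitsize (machine_mode mode)
  {
    /* Comparisons must now say whether they want the "known" or the
       "possible" answer.  */
    if (must_eq (GET_MODE_BITSIZE (mode), BITS_PER_WORD))
      handle_known_word_width ();		/* hypothetical helper */

    /* Code that genuinely needs a compile-time bit count checks for one
       explicitly, as the combine.c and tree-if-conv.c hunks do.  */
    unsigned int bits;
    if (GET_MODE_BITSIZE (mode).is_constant (&bits))
      record_constant_width (bits);		/* hypothetical helper */
  }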


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* machmode.h (mode_to_bits): Return a poly_uint16 rather than an
	unsigned short.
	(GET_MODE_BITSIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
	or if measurement_type is not polynomial.
	* calls.c (shift_return_value): Treat GET_MODE_BITSIZE as polynomial.
	* combine.c (make_extraction): Likewise.
	* dse.c (find_shift_sequence): Likewise.
	* dwarf2out.c (mem_loc_descriptor): Likewise.
	* expmed.c (store_integral_bit_field, extract_bit_field_1): Likewise.
	(extract_bit_field, extract_low_bits): Likewise.
	* expr.c (convert_move, convert_modes, emit_move_insn_1): Likewise.
	(optimize_bitfield_assignment_op, expand_assignment): Likewise.
	(store_field, expand_expr_real_1): Likewise.
	* fold-const.c (optimize_bit_field_compare, merge_ranges): Likewise.
	* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
	* reload.c (find_reloads): Likewise.
	* reload1.c (alter_reg): Likewise.
	* stor-layout.c (bitwise_mode_for_mode, compute_record_mode): Likewise.
	* targhooks.c (default_secondary_memory_needed_mode): Likewise.
	* tree-if-conv.c (predicate_mem_writes): Likewise.
	* tree-ssa-strlen.c (handle_builtin_memcmp): Likewise.
	* tree-vect-patterns.c (adjust_bool_pattern): Likewise.
	* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
	* valtrack.c (dead_debug_insert_temp): Likewise.
	* varasm.c (mergeable_constant_section): Likewise.
	* config/sh/sh.h (LOCAL_ALIGNMENT): Use as_a <fixed_size_mode>.

gcc/ada/
	* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_BITSIZE
	as polynomial.

gcc/c-family/
	* c-ubsan.c (ubsan_instrument_shift): Treat GET_MODE_BITSIZE
	as polynomial.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-10-23 17:25:54.180292158 +0100
+++ gcc/machmode.h	2017-10-23 17:25:57.265181271 +0100
@@ -527,7 +527,7 @@ mode_to_bytes (machine_mode mode)
 
 /* Return the base GET_MODE_BITSIZE value for MODE.  */
 
-ALWAYS_INLINE unsigned short
+ALWAYS_INLINE poly_uint16
 mode_to_bits (machine_mode mode)
 {
   return mode_to_bytes (mode) * BITS_PER_UNIT;
@@ -600,7 +600,29 @@ #define GET_MODE_SIZE(MODE) (mode_to_byt
 
 /* Get the size in bits of an object of mode MODE.  */
 
-#define GET_MODE_BITSIZE(MODE) (mode_to_bits (MODE))
+#if ONLY_FIXED_SIZE_MODES
+#define GET_MODE_BITSIZE(MODE) ((unsigned short) mode_to_bits (MODE).coeffs[0])
+#else
+ALWAYS_INLINE poly_uint16
+GET_MODE_BITSIZE (machine_mode mode)
+{
+  return mode_to_bits (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_poly<typename T::measurement_type>::t
+GET_MODE_BITSIZE (const T &mode)
+{
+  return mode_to_bits (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_nonpoly<typename T::measurement_type>::t
+GET_MODE_BITSIZE (const T &mode)
+{
+  return mode_to_bits (mode).coeffs[0];
+}
+#endif
 
 /* Get the number of value bits of an object of mode MODE.  */
 
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 17:25:46.488568637 +0100
+++ gcc/calls.c	2017-10-23 17:25:57.257181559 +0100
@@ -2835,12 +2835,11 @@ check_sibcall_argument_overlap (rtx_insn
 bool
 shift_return_value (machine_mode mode, bool left_p, rtx value)
 {
-  HOST_WIDE_INT shift;
-
   gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
   machine_mode value_mode = GET_MODE (value);
-  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
-  if (shift == 0)
+  poly_int64 shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
+
+  if (known_zero (shift))
     return false;
 
   /* Use ashr rather than lshr for right shifts.  This is for the benefit
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-10-23 17:25:54.176292301 +0100
+++ gcc/combine.c	2017-10-23 17:25:57.258181523 +0100
@@ -7675,8 +7675,9 @@ make_extraction (machine_mode mode, rtx
      are the same as for a register operation, since at present we don't
      have named patterns for aligned memory structures.  */
   struct extraction_insn insn;
-  if (get_best_reg_extraction_insn (&insn, pattern,
-				    GET_MODE_BITSIZE (inner_mode), mode))
+  unsigned int inner_size;
+  if (GET_MODE_BITSIZE (inner_mode).is_constant (&inner_size)
+      && get_best_reg_extraction_insn (&insn, pattern, inner_size, mode))
     {
       wanted_inner_reg_mode = insn.struct_mode.require ();
       pos_mode = insn.pos_mode;
@@ -7712,9 +7713,11 @@ make_extraction (machine_mode mode, rtx
 	 If it's a MEM we need to recompute POS relative to that.
 	 However, if we're extracting from (or inserting into) a register,
 	 we want to recompute POS relative to wanted_inner_mode.  */
-      int width = (MEM_P (inner)
-		   ? GET_MODE_BITSIZE (is_mode)
-		   : GET_MODE_BITSIZE (wanted_inner_mode));
+      int width;
+      if (!MEM_P (inner))
+	width = GET_MODE_BITSIZE (wanted_inner_mode);
+      else if (!GET_MODE_BITSIZE (is_mode).is_constant (&width))
+	return NULL_RTX;
 
       if (pos_rtx == 0)
 	pos = width - len - pos;
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-10-23 17:16:50.360529627 +0100
+++ gcc/dse.c	2017-10-23 17:25:57.259181487 +0100
@@ -1728,7 +1728,7 @@ find_shift_sequence (poly_int64 access_s
 
       /* Try a wider mode if truncating the store mode to NEW_MODE
 	 requires a real instruction.  */
-      if (GET_MODE_BITSIZE (new_mode) < GET_MODE_BITSIZE (store_mode)
+      if (may_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (store_mode))
 	  && !TRULY_NOOP_TRUNCATION_MODES_P (new_mode, store_mode))
 	continue;
 
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 17:18:47.659057624 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:25:57.261181415 +0100
@@ -15339,7 +15339,8 @@ mem_loc_descriptor (rtx rtl, machine_mod
 	     We output CONST_DOUBLEs as blocks.  */
 	  if (mode == VOIDmode
 	      || (GET_MODE (rtl) == VOIDmode
-		  && GET_MODE_BITSIZE (mode) != HOST_BITS_PER_DOUBLE_INT))
+		  && may_ne (GET_MODE_BITSIZE (mode),
+			     HOST_BITS_PER_DOUBLE_INT)))
 	    break;
 	  type_die = base_type_for_mode (mode, SCALAR_INT_MODE_P (mode));
 	  if (type_die == NULL)
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 17:25:30.703136044 +0100
+++ gcc/expmed.c	2017-10-23 17:25:57.262181379 +0100
@@ -866,7 +866,7 @@ store_integral_bit_field (rtx op0, opt_s
   if (!MEM_P (op0)
       && !reverse
       && lowpart_bit_field_p (bitnum, bitsize, op0_mode.require ())
-      && bitsize == GET_MODE_BITSIZE (fieldmode)
+      && must_eq (bitsize, GET_MODE_BITSIZE (fieldmode))
       && optab_handler (movstrict_optab, fieldmode) != CODE_FOR_nothing)
     {
       struct expand_operand ops[2];
@@ -1637,9 +1637,10 @@ extract_bit_field_1 (rtx str_rtx, poly_u
       if (GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode))
 	{
 	  scalar_mode inner_mode = GET_MODE_INNER (tmode);
-	  unsigned int nunits = (GET_MODE_BITSIZE (GET_MODE (op0))
-				 / GET_MODE_UNIT_BITSIZE (tmode));
-	  if (!mode_for_vector (inner_mode, nunits).exists (&new_mode)
+	  poly_uint64 nunits;
+	  if (!multiple_p (GET_MODE_BITSIZE (GET_MODE (op0)),
+			   GET_MODE_UNIT_BITSIZE (tmode), &nunits)
+	      || !mode_for_vector (inner_mode, nunits).exists (&new_mode)
 	      || !VECTOR_MODE_P (new_mode)
 	      || GET_MODE_SIZE (new_mode) != GET_MODE_SIZE (GET_MODE (op0))
 	      || GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode)
@@ -2042,9 +2043,9 @@ extract_bit_field (rtx str_rtx, poly_uin
   machine_mode mode1;
 
   /* Handle -fstrict-volatile-bitfields in the cases where it applies.  */
-  if (GET_MODE_BITSIZE (GET_MODE (str_rtx)) > 0)
+  if (maybe_nonzero (GET_MODE_BITSIZE (GET_MODE (str_rtx))))
     mode1 = GET_MODE (str_rtx);
-  else if (target && GET_MODE_BITSIZE (GET_MODE (target)) > 0)
+  else if (target && maybe_nonzero (GET_MODE_BITSIZE (GET_MODE (target))))
     mode1 = GET_MODE (target);
   else
     mode1 = tmode;
@@ -2360,7 +2361,7 @@ extract_low_bits (machine_mode mode, mac
   if (GET_MODE_CLASS (mode) == MODE_CC || GET_MODE_CLASS (src_mode) == MODE_CC)
     return NULL_RTX;
 
-  if (GET_MODE_BITSIZE (mode) == GET_MODE_BITSIZE (src_mode)
+  if (must_eq (GET_MODE_BITSIZE (mode), GET_MODE_BITSIZE (src_mode))
       && targetm.modes_tieable_p (mode, src_mode))
     {
       rtx x = gen_lowpart_common (mode, src);
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:56.086223649 +0100
+++ gcc/expr.c	2017-10-23 17:25:57.263181343 +0100
@@ -245,7 +245,8 @@ convert_move (rtx to, rtx from, int unsi
 
   if (VECTOR_MODE_P (to_mode) || VECTOR_MODE_P (from_mode))
     {
-      gcc_assert (GET_MODE_BITSIZE (from_mode) == GET_MODE_BITSIZE (to_mode));
+      gcc_assert (must_eq (GET_MODE_BITSIZE (from_mode),
+			   GET_MODE_BITSIZE (to_mode)));
 
       if (VECTOR_MODE_P (to_mode))
 	from = simplify_gen_subreg (to_mode, from, GET_MODE (from), 0);
@@ -698,7 +699,8 @@ convert_modes (machine_mode mode, machin
      subreg operation.  */
   if (VECTOR_MODE_P (mode) && GET_MODE (x) == VOIDmode)
     {
-      gcc_assert (GET_MODE_BITSIZE (mode) == GET_MODE_BITSIZE (oldmode));
+      gcc_assert (must_eq (GET_MODE_BITSIZE (mode),
+			   GET_MODE_BITSIZE (oldmode)));
       return simplify_gen_subreg (mode, x, oldmode, 0);
     }
 
@@ -3677,7 +3679,8 @@ emit_move_insn_1 (rtx x, rtx y)
      only safe when simplify_subreg can convert MODE constants into integer
      constants.  At present, it can only do this reliably if the value
      fits within a HOST_WIDE_INT.  */
-  if (!CONSTANT_P (y) || GET_MODE_BITSIZE (mode) <= HOST_BITS_PER_WIDE_INT)
+  if (!CONSTANT_P (y)
+      || must_le (GET_MODE_BITSIZE (mode), HOST_BITS_PER_WIDE_INT))
     {
       rtx_insn *ret = emit_move_via_integer (mode, x, y, lra_in_progress);
 
@@ -4620,8 +4623,9 @@ optimize_bitfield_assignment_op (poly_ui
 				 machine_mode mode1, rtx str_rtx,
 				 tree to, tree src, bool reverse)
 {
+  /* str_mode is not guaranteed to be a scalar type.  */
   machine_mode str_mode = GET_MODE (str_rtx);
-  unsigned int str_bitsize = GET_MODE_BITSIZE (str_mode);
+  unsigned int str_bitsize;
   tree op0, op1;
   rtx value, result;
   optab binop;
@@ -4635,6 +4639,7 @@ optimize_bitfield_assignment_op (poly_ui
       || !pbitregion_start.is_constant (&bitregion_start)
       || !pbitregion_end.is_constant (&bitregion_end)
       || bitsize >= BITS_PER_WORD
+      || !GET_MODE_BITSIZE (str_mode).is_constant (&str_bitsize)
       || str_bitsize > BITS_PER_WORD
       || TREE_SIDE_EFFECTS (to)
       || TREE_THIS_VOLATILE (to))
@@ -5147,7 +5152,7 @@ expand_assignment (tree to, tree from, b
 	    }
 	  else
 	    {
-	      rtx temp = assign_stack_temp (GET_MODE (to_rtx),
+	      rtx temp = assign_stack_temp (to_mode,
 					    GET_MODE_SIZE (GET_MODE (to_rtx)));
 	      write_complex_part (temp, XEXP (to_rtx, 0), false);
 	      write_complex_part (temp, XEXP (to_rtx, 1), true);
@@ -6878,7 +6883,8 @@ store_field (rtx target, poly_int64 bits
 	{
 	  tree type = TREE_TYPE (exp);
 	  if (INTEGRAL_TYPE_P (type)
-	      && TYPE_PRECISION (type) < GET_MODE_BITSIZE (TYPE_MODE (type))
+	      && may_ne (TYPE_PRECISION (type),
+			 GET_MODE_BITSIZE (TYPE_MODE (type)))
 	      && must_eq (bitsize, TYPE_PRECISION (type)))
 	    {
 	      tree op = gimple_assign_rhs1 (nop_def);
@@ -10268,8 +10274,8 @@ expand_expr_real_1 (tree exp, rtx target
 	    if (known_zero (offset)
 	        && !reverse
 		&& tree_fits_uhwi_p (TYPE_SIZE (type))
-		&& (GET_MODE_BITSIZE (DECL_MODE (base))
-		    == tree_to_uhwi (TYPE_SIZE (type))))
+		&& must_eq (GET_MODE_BITSIZE (DECL_MODE (base)),
+			    tree_to_uhwi (TYPE_SIZE (type))))
 	      return expand_expr (build1 (VIEW_CONVERT_EXPR, type, base),
 				  target, tmode, modifier);
 	    if (TYPE_MODE (type) == BLKmode)
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 17:25:51.744379717 +0100
+++ gcc/fold-const.c	2017-10-23 17:25:57.264181307 +0100
@@ -4150,7 +4150,7 @@ optimize_bit_field_compare (location_t l
       || !known_size_p (plbitsize)
       || !plbitsize.is_constant (&lbitsize)
       || !plbitpos.is_constant (&lbitpos)
-      || lbitsize == GET_MODE_BITSIZE (lmode)
+      || must_eq (lbitsize, GET_MODE_BITSIZE (lmode))
       || offset != 0
       || TREE_CODE (linner) == PLACEHOLDER_EXPR
       || lvolatilep)
@@ -5275,8 +5275,9 @@ merge_ranges (int *pin_p, tree *plow, tr
 		switch (TREE_CODE (TREE_TYPE (low0)))
 		  {
 		  case ENUMERAL_TYPE:
-		    if (TYPE_PRECISION (TREE_TYPE (low0))
-			!= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (low0))))
+		    if (may_ne (TYPE_PRECISION (TREE_TYPE (low0)),
+				GET_MODE_BITSIZE
+				  (TYPE_MODE (TREE_TYPE (low0)))))
 		      break;
 		    /* FALLTHROUGH */
 		  case INTEGER_TYPE:
@@ -5298,8 +5299,9 @@ merge_ranges (int *pin_p, tree *plow, tr
 		switch (TREE_CODE (TREE_TYPE (high1)))
 		  {
 		  case ENUMERAL_TYPE:
-		    if (TYPE_PRECISION (TREE_TYPE (high1))
-			!= GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (high1))))
+		    if (may_ne (TYPE_PRECISION (TREE_TYPE (high1)),
+				GET_MODE_BITSIZE
+				  (TYPE_MODE (TREE_TYPE (high1)))))
 		      break;
 		    /* FALLTHROUGH */
 		  case INTEGER_TYPE:
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	2017-10-23 17:25:51.747379609 +0100
+++ gcc/gimple-fold.c	2017-10-23 17:25:57.265181271 +0100
@@ -3616,7 +3616,7 @@ optimize_atomic_compare_exchange_p (gimp
       /* Don't optimize floating point expected vars, VIEW_CONVERT_EXPRs
 	 might not preserve all the bits.  See PR71716.  */
       || SCALAR_FLOAT_TYPE_P (etype)
-      || TYPE_PRECISION (etype) != GET_MODE_BITSIZE (TYPE_MODE (etype)))
+      || may_ne (TYPE_PRECISION (etype), GET_MODE_BITSIZE (TYPE_MODE (etype))))
     return false;
 
   tree weak = gimple_call_arg (stmt, 3);
Index: gcc/reload.c
===================================================================
--- gcc/reload.c	2017-10-23 17:25:43.543674491 +0100
+++ gcc/reload.c	2017-10-23 17:25:57.266181235 +0100
@@ -3121,10 +3121,11 @@ find_reloads (rtx_insn *insn, int replac
 			   || (REG_P (operand)
 			       && REGNO (operand) >= FIRST_PSEUDO_REGISTER))
 			  && (WORD_REGISTER_OPERATIONS
-			      || ((GET_MODE_BITSIZE (GET_MODE (operand))
-				   < BIGGEST_ALIGNMENT)
-				  && paradoxical_subreg_p (operand_mode[i],
-							   GET_MODE (operand)))
+			      || (((may_lt
+				    (GET_MODE_BITSIZE (GET_MODE (operand)),
+				     BIGGEST_ALIGNMENT))
+				   && (paradoxical_subreg_p
+				       (operand_mode[i], GET_MODE (operand)))))
 			      || BYTES_BIG_ENDIAN
 			      || ((GET_MODE_SIZE (operand_mode[i])
 				   <= UNITS_PER_WORD)
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:25:44.492640380 +0100
+++ gcc/reload1.c	2017-10-23 17:25:57.267181199 +0100
@@ -2146,7 +2146,11 @@ alter_reg (int i, int from_reg, bool don
       unsigned int inherent_align = GET_MODE_ALIGNMENT (mode);
       machine_mode wider_mode = wider_subreg_mode (mode, reg_max_ref_mode[i]);
       poly_uint64 total_size = GET_MODE_SIZE (wider_mode);
-      unsigned int min_align = GET_MODE_BITSIZE (reg_max_ref_mode[i]);
+      /* ??? Seems strange to derive the minimum alignment from the size,
+	 but that's the traditional behavior.  For polynomial-size modes,
+	 the natural extension is to use the minimum possible size.  */
+      unsigned int min_align
+	= constant_lower_bound (GET_MODE_BITSIZE (reg_max_ref_mode[i]));
       poly_int64 adjust = 0;
 
       something_was_spilled = true;
Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c	2017-10-23 17:25:51.753379393 +0100
+++ gcc/stor-layout.c	2017-10-23 17:25:57.267181199 +0100
@@ -410,7 +410,6 @@ int_mode_for_mode (machine_mode mode)
 bitwise_mode_for_mode (machine_mode mode)
 {
   /* Quick exit if we already have a suitable mode.  */
-  unsigned int bitsize = GET_MODE_BITSIZE (mode);
   scalar_int_mode int_mode;
   if (is_a <scalar_int_mode> (mode, &int_mode)
       && GET_MODE_BITSIZE (int_mode) <= MAX_FIXED_MODE_SIZE)
@@ -419,6 +418,8 @@ bitwise_mode_for_mode (machine_mode mode
   /* Reuse the sanity checks from int_mode_for_mode.  */
   gcc_checking_assert ((int_mode_for_mode (mode), true));
 
+  poly_int64 bitsize = GET_MODE_BITSIZE (mode);
+
   /* Try to replace complex modes with complex modes.  In general we
      expect both components to be processed independently, so we only
      care whether there is a register for the inner mode.  */
@@ -433,7 +434,8 @@ bitwise_mode_for_mode (machine_mode mode
 
   /* Try to replace vector modes with vector modes.  Also try using vector
      modes if an integer mode would be too big.  */
-  if (VECTOR_MODE_P (mode) || bitsize > MAX_FIXED_MODE_SIZE)
+  if (VECTOR_MODE_P (mode)
+      || may_gt (bitsize, MAX_FIXED_MODE_SIZE))
     {
       machine_mode trial = mode;
       if ((GET_MODE_CLASS (trial) == MODE_VECTOR_INT
@@ -1771,7 +1773,7 @@ compute_record_mode (tree type)
      does not apply to unions.  */
   if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode
       && tree_fits_uhwi_p (TYPE_SIZE (type))
-      && GET_MODE_BITSIZE (mode) == tree_to_uhwi (TYPE_SIZE (type)))
+      && must_eq (GET_MODE_BITSIZE (mode), tree_to_uhwi (TYPE_SIZE (type))))
     ;
   else
     mode = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1).else_blk ();
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 17:25:51.753379393 +0100
+++ gcc/targhooks.c	2017-10-23 17:25:57.267181199 +0100
@@ -1143,7 +1143,7 @@ default_secondary_reload (bool in_p ATTR
 default_secondary_memory_needed_mode (machine_mode mode)
 {
   if (!targetm.lra_p ()
-      && GET_MODE_BITSIZE (mode) < BITS_PER_WORD
+      && must_lt (GET_MODE_BITSIZE (mode), BITS_PER_WORD)
       && INTEGRAL_MODE_P (mode))
     return mode_for_size (BITS_PER_WORD, GET_MODE_CLASS (mode), 0).require ();
   return mode;
Index: gcc/tree-if-conv.c
===================================================================
--- gcc/tree-if-conv.c	2017-10-23 11:41:25.512892753 +0100
+++ gcc/tree-if-conv.c	2017-10-23 17:25:57.268181163 +0100
@@ -2228,7 +2228,10 @@ predicate_mem_writes (loop_p loop)
 	      tree ref, addr, ptr, mask;
 	      gcall *new_stmt;
 	      gimple_seq stmts = NULL;
-	      int bitsize = GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (lhs)));
+	      machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+	      /* We checked before setting GF_PLF_2 that an equivalent
+		 integer mode exists.  */
+	      int bitsize = GET_MODE_BITSIZE (mode).to_constant ();
 	      ref = TREE_CODE (lhs) == SSA_NAME ? rhs : lhs;
 	      mark_addressable (ref);
 	      addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
Index: gcc/tree-ssa-strlen.c
===================================================================
--- gcc/tree-ssa-strlen.c	2017-10-23 17:17:01.436033953 +0100
+++ gcc/tree-ssa-strlen.c	2017-10-23 17:25:57.268181163 +0100
@@ -2132,7 +2132,7 @@ handle_builtin_memcmp (gimple_stmt_itera
 	  location_t loc = gimple_location (stmt2);
 	  tree type, off;
 	  type = build_nonstandard_integer_type (leni, 1);
-	  gcc_assert (GET_MODE_BITSIZE (TYPE_MODE (type)) == leni);
+	  gcc_assert (must_eq (GET_MODE_BITSIZE (TYPE_MODE (type)), leni));
 	  tree ptrtype = build_pointer_type_for_mode (char_type_node,
 						      ptr_mode, true);
 	  off = build_int_cst (ptrtype, 0);
Index: gcc/tree-vect-patterns.c
===================================================================
--- gcc/tree-vect-patterns.c	2017-10-23 17:25:51.763379034 +0100
+++ gcc/tree-vect-patterns.c	2017-10-23 17:25:57.268181163 +0100
@@ -3388,8 +3388,8 @@ adjust_bool_pattern (tree var, tree out_
       gcc_assert (TREE_CODE_CLASS (rhs_code) == tcc_comparison);
       if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE
 	  || !TYPE_UNSIGNED (TREE_TYPE (rhs1))
-	  || (TYPE_PRECISION (TREE_TYPE (rhs1))
-	      != GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs1)))))
+	  || may_ne (TYPE_PRECISION (TREE_TYPE (rhs1)),
+		     GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (rhs1)))))
 	{
 	  scalar_mode mode = SCALAR_TYPE_MODE (TREE_TYPE (rhs1));
 	  itype
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:25:51.767378890 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:25:57.269181128 +0100
@@ -3585,7 +3585,7 @@ vectorizable_simd_clone_call (gimple *st
 		  if (simd_clone_subparts (atype)
 		      < simd_clone_subparts (arginfo[i].vectype))
 		    {
-		      unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
+		      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (atype));
 		      k = (simd_clone_subparts (arginfo[i].vectype)
 			   / simd_clone_subparts (atype));
 		      gcc_assert ((k & (k - 1)) == 0);
@@ -3749,7 +3749,8 @@ vectorizable_simd_clone_call (gimple *st
 	  if (simd_clone_subparts (vectype) < nunits)
 	    {
 	      unsigned int k, l;
-	      unsigned int prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
+	      poly_uint64 prec = GET_MODE_BITSIZE (TYPE_MODE (vectype));
+	      poly_uint64 bytes = GET_MODE_SIZE (TYPE_MODE (vectype));
 	      k = nunits / simd_clone_subparts (vectype);
 	      gcc_assert ((k & (k - 1)) == 0);
 	      for (l = 0; l < k; l++)
@@ -3759,8 +3760,7 @@ vectorizable_simd_clone_call (gimple *st
 		    {
 		      t = build_fold_addr_expr (new_temp);
 		      t = build2 (MEM_REF, vectype, t,
-				  build_int_cst (TREE_TYPE (t),
-						 l * prec / BITS_PER_UNIT));
+				  build_int_cst (TREE_TYPE (t), l * bytes));
 		    }
 		  else
 		    t = build3 (BIT_FIELD_REF, vectype, new_temp,
Index: gcc/valtrack.c
===================================================================
--- gcc/valtrack.c	2017-10-23 17:16:50.376527466 +0100
+++ gcc/valtrack.c	2017-10-23 17:25:57.269181128 +0100
@@ -606,10 +606,13 @@ dead_debug_insert_temp (struct dead_debu
 	  usesp = &cur->next;
 	  *tailp = cur->next;
 	  cur->next = NULL;
+	  /* "may" rather than "must" because we want (for example)
+	     N V4SFs to win over plain V4SF even though N might be 1.  */
+	  rtx candidate = *DF_REF_REAL_LOC (cur->use);
 	  if (!reg
-	      || (GET_MODE_BITSIZE (GET_MODE (reg))
-		  < GET_MODE_BITSIZE (GET_MODE (*DF_REF_REAL_LOC (cur->use)))))
-	    reg = *DF_REF_REAL_LOC (cur->use);
+	      || may_lt (GET_MODE_BITSIZE (GET_MODE (reg)),
+			 GET_MODE_BITSIZE (GET_MODE (candidate))))
+	    reg = candidate;
 	}
       else
 	tailp = &(*tailp)->next;
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	2017-10-23 17:22:18.236826658 +0100
+++ gcc/varasm.c	2017-10-23 17:25:57.271181056 +0100
@@ -843,12 +843,10 @@ mergeable_constant_section (machine_mode
 			    unsigned HOST_WIDE_INT align ATTRIBUTE_UNUSED,
 			    unsigned int flags ATTRIBUTE_UNUSED)
 {
-  unsigned int modesize = GET_MODE_BITSIZE (mode);
-
   if (HAVE_GAS_SHF_MERGE && flag_merge_constants
       && mode != VOIDmode
       && mode != BLKmode
-      && modesize <= align
+      && must_le (GET_MODE_BITSIZE (mode), align)
       && align >= 8
       && align <= 256
       && (align & (align - 1)) == 0)
Index: gcc/config/sh/sh.h
===================================================================
--- gcc/config/sh/sh.h	2017-10-23 11:41:23.004847152 +0100
+++ gcc/config/sh/sh.h	2017-10-23 17:25:57.259181487 +0100
@@ -468,7 +468,9 @@ #define FASTEST_ALIGNMENT (32)
 #define LOCAL_ALIGNMENT(TYPE, ALIGN) \
   ((GET_MODE_CLASS (TYPE_MODE (TYPE)) == MODE_COMPLEX_INT \
     || GET_MODE_CLASS (TYPE_MODE (TYPE)) == MODE_COMPLEX_FLOAT) \
-   ? (unsigned) MIN (BIGGEST_ALIGNMENT, GET_MODE_BITSIZE (TYPE_MODE (TYPE))) \
+   ? (unsigned) MIN (BIGGEST_ALIGNMENT, \
+		     GET_MODE_BITSIZE (as_a <fixed_size_mode> \
+				       (TYPE_MODE (TYPE)))) \
    : (unsigned) DATA_ALIGNMENT(TYPE, ALIGN))
 
 /* Make arrays of chars word-aligned for the same reasons.  */
Index: gcc/ada/gcc-interface/misc.c
===================================================================
--- gcc/ada/gcc-interface/misc.c	2017-10-23 17:25:54.174292373 +0100
+++ gcc/ada/gcc-interface/misc.c	2017-10-23 17:25:57.256181595 +0100
@@ -1298,14 +1298,14 @@ enumerate_modes (void (*f) (const char *
 	  }
 
       /* If no predefined C types were found, register the mode itself.  */
-      int nunits, precision;
+      int nunits, precision, bitsize;
       if (!skip_p
 	  && GET_MODE_NUNITS (i).is_constant (&nunits)
-	  && GET_MODE_PRECISION (i).is_constant (&precision))
+	  && GET_MODE_PRECISION (i).is_constant (&precision)
+	  && GET_MODE_BITSIZE (i).is_constant (&bitsize))
 	f (GET_MODE_NAME (i), digs, complex_p,
 	   vector_p ? nunits : 0, float_rep,
-	   precision, GET_MODE_BITSIZE (i),
-	   GET_MODE_ALIGNMENT (i));
+	   precision, bitsize, GET_MODE_ALIGNMENT (i));
     }
 }
 
Index: gcc/c-family/c-ubsan.c
===================================================================
--- gcc/c-family/c-ubsan.c	2017-08-21 10:41:51.265103275 +0100
+++ gcc/c-family/c-ubsan.c	2017-10-23 17:25:57.256181595 +0100
@@ -132,7 +132,8 @@ ubsan_instrument_shift (location_t loc,
   /* If this is not a signed operation, don't perform overflow checks.
      Also punt on bit-fields.  */
   if (TYPE_OVERFLOW_WRAPS (type0)
-      || GET_MODE_BITSIZE (TYPE_MODE (type0)) != TYPE_PRECISION (type0)
+      || may_ne (GET_MODE_BITSIZE (TYPE_MODE (type0)),
+		 TYPE_PRECISION (type0))
       || !sanitize_flags_p (SANITIZE_SHIFT_BASE))
     ;
 

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [107/nnn] poly_int: GET_MODE_SIZE
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (105 preceding siblings ...)
  2017-10-23 17:43 ` [106/nnn] poly_int: GET_MODE_BITSIZE Richard Sandiford
@ 2017-10-23 17:48 ` Richard Sandiford
  2017-11-21  7:48   ` Jeff Law
  2017-10-24  9:25 ` [000/nnn] poly_int: representation of runtime offsets and sizes Eric Botcazou
  107 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-23 17:48 UTC (permalink / raw)
  To: gcc-patches

This patch changes GET_MODE_SIZE from unsigned short to poly_uint16.
The non-mechanical parts were handled by previous patches.
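
As with the previous patch, a short caller-side sketch (illustrative only,
not taken from the patch) of the typical pattern: sizes are held in
poly_int64/poly_uint64, ordering tests go through must_*/may_* queries, and
divisions either use exact_div or fall back to is_constant/to_constant when
only fixed-size modes can occur.  The function and helper names are
invented; the poly_int operations are the ones used in the hunks below.

  /* Sketch only: the names are hypothetical, the operations are not.  */
  static void
  example_use_of_size (machine_mode mode, unsigned int nregs)
  {
    poly_uint64 size = GET_MODE_SIZE (mode);

    /* Ordering tests become may_/must_ queries.  */
    if (must_le (size, UNITS_PER_WORD))
      handle_single_word_case ();		/* hypothetical helper */

    /* exact_div asserts that the division has no remainder, as in the
       caller-save.c hunk; code that can only cope with fixed-size modes
       uses is_constant or to_constant, as in the builtins.c hunk.  */
    poly_uint64 bytes_per_reg = exact_div (size, nregs);
    unsigned HOST_WIDE_INT const_size = size.to_constant ();
    use_sizes (bytes_per_reg, const_size);	/* hypothetical helper */
  }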


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* machmode.h (mode_size): Change from unsigned short to
	poly_uint16_pod.
	(mode_to_bytes): Return a poly_uint16 rather than an unsigned short.
	(GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
	or if measurement_type is not polynomial.
	(fixed_size_mode::includes_p): Check for constant-sized modes.
	* genmodes.c (emit_mode_size_inline): Make mode_size_inline
	return a poly_uint16 rather than an unsigned short.
	(emit_mode_size): Change the type of mode_size from unsigned short
	to poly_uint16_pod.  Use ZERO_COEFFS for the initializer.
	(emit_mode_adjustments): Cope with polynomial vector sizes.
	* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
	for GET_MODE_SIZE.
	* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
	for GET_MODE_SIZE.
	* auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial.
	* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise.
	* caller-save.c (setup_save_areas): Likewise.
	(replace_reg_with_saved_mem): Likewise.
	* calls.c (emit_library_call_value_1): Likewise.
	* combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise.
	* combine.c (simplify_set, make_extraction, simplify_shift_const_1)
	(gen_lowpart_for_combine): Likewise.
	* convert.c (convert_to_integer_1): Likewise.
	* cse.c (equiv_constant, cse_insn): Likewise.
	* cselib.c (autoinc_split, cselib_hash_rtx): Likewise.
	(cselib_subst_to_values): Likewise.
	* dce.c (word_dce_process_block): Likewise.
	* df-problems.c (df_word_lr_mark_ref): Likewise.
	* dwarf2cfi.c (init_one_dwarf_reg_size): Likewise.
	* dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor)
	(concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor)
	(rtl_for_decl_location): Likewise.
	* emit-rtl.c (gen_highpart, widen_memory_access): Likewise.
	* expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise.
	* expr.c (emit_group_load_1, clear_storage_hints): Likewise.
	(emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise.
	(expand_expr_real_1): Likewise.
	* function.c (assign_parm_setup_block_p, assign_parm_setup_block)
	(pad_below): Likewise.
	* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
	* gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise.
	* ira.c (get_subreg_tracking_sizes): Likewise.
	* ira-build.c (ira_create_allocno_objects): Likewise.
	* ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise.
	(ira_sort_regnos_for_alter_reg): Likewise.
	* ira-costs.c (record_operand_costs): Likewise.
	* lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn)
	(resolve_simple_move): Likewise.
	* lra-constraints.c (get_reload_reg, operands_match_p): Likewise.
	(process_addr_reg, simplify_operand_subreg, lra_constraints): Likewise.
	(CONST_POOL_OK_P): Reject variable-sized modes.
	* lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare)
	(add_pseudo_to_slot, lra_spill): Likewise.
	* omp-low.c (omp_clause_aligned_alignment): Likewise.
	* optabs-query.c (get_best_extraction_insn): Likewise.
	* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
	* optabs.c (expand_vec_perm, expand_vec_cond_expr): Likewise.
	(expand_mult_highpart, valid_multiword_target_p): Likewise.
	* recog.c (offsettable_address_addr_space_p): Likewise.
	* regcprop.c (maybe_mode_change): Likewise.
	* reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise.
	* regrename.c (build_def_use): Likewise.
	* regstat.c (dump_reg_info): Likewise.
	* reload.c (complex_word_subreg_p, push_reload, find_dummy_reload)
	(find_reloads, find_reloads_subreg_address): Likewise.
	* reload1.c (eliminate_regs_1): Likewise.
	* rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise.
	* simplify-rtx.c (avoid_constant_pool_reference): Likewise.
	(simplify_binary_operation_1, simplify_subreg): Likewise.
	* targhooks.c (default_function_arg_padding): Likewise.
	(default_hard_regno_nregs, default_class_max_nregs): Likewise.
	* tree-cfg.c (verify_gimple_assign_binary): Likewise.
	(verify_gimple_assign_ternary): Likewise.
	* tree-inline.c (estimate_move_cost): Likewise.
	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
	* tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise.
	(get_address_cost_ainc): Likewise.
	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
	(vect_supportable_dr_alignment): Likewise.
	* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
	(vectorizable_reduction): Likewise.
	* tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift)
	(vectorizable_operation, vectorizable_load): Likewise.
	* tree.c (build_same_sized_truth_vector_type): Likewise.
	* valtrack.c (cleanup_auto_inc_dec): Likewise.
	* var-tracking.c (emit_note_insn_var_location): Likewise.
	* config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>.
	(ADDR_VEC_ALIGN): Likewise.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-10-23 17:25:57.265181271 +0100
+++ gcc/machmode.h	2017-10-23 17:25:59.436103237 +0100
@@ -22,7 +22,7 @@ #define HAVE_MACHINE_MODES
 
 typedef opt_mode<machine_mode> opt_machine_mode;
 
-extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
+extern CONST_MODE_SIZE poly_uint16_pod mode_size[NUM_MACHINE_MODES];
 extern const poly_uint16_pod mode_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_inner[NUM_MACHINE_MODES];
 extern const poly_uint16_pod mode_nunits[NUM_MACHINE_MODES];
@@ -514,7 +514,7 @@ complex_mode::includes_p (machine_mode m
 
 /* Return the base GET_MODE_SIZE value for MODE.  */
 
-ALWAYS_INLINE unsigned short
+ALWAYS_INLINE poly_uint16
 mode_to_bytes (machine_mode mode)
 {
 #if GCC_VERSION >= 4001
@@ -596,7 +596,29 @@ mode_to_nunits (machine_mode mode)
 
 /* Get the size in bytes of an object of mode MODE.  */
 
-#define GET_MODE_SIZE(MODE) (mode_to_bytes (MODE))
+#if ONLY_FIXED_SIZE_MODES
+#define GET_MODE_SIZE(MODE) ((unsigned short) mode_to_bytes (MODE).coeffs[0])
+#else
+ALWAYS_INLINE poly_uint16
+GET_MODE_SIZE (machine_mode mode)
+{
+  return mode_to_bytes (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_poly<typename T::measurement_type>::t
+GET_MODE_SIZE (const T &mode)
+{
+  return mode_to_bytes (mode);
+}
+
+template<typename T>
+ALWAYS_INLINE typename if_nonpoly<typename T::measurement_type>::t
+GET_MODE_SIZE (const T &mode)
+{
+  return mode_to_bytes (mode).coeffs[0];
+}
+#endif
 
 /* Get the size in bits of an object of mode MODE.  */
 
@@ -761,9 +783,9 @@ #define GET_MODE_COMPLEX_MODE(MODE) ((ma
 /* Return true if MODE has a fixed size.  */
 
 inline bool
-fixed_size_mode::includes_p (machine_mode)
+fixed_size_mode::includes_p (machine_mode mode)
 {
-  return true;
+  return mode_to_bytes (mode).is_constant ();
 }
 
 /* Wrapper for mode arguments to target macros, so that if a target
Index: gcc/genmodes.c
===================================================================
--- gcc/genmodes.c	2017-10-23 17:25:54.178292230 +0100
+++ gcc/genmodes.c	2017-10-23 17:25:59.423103705 +0100
@@ -987,10 +987,10 @@ inline __attribute__((__always_inline__)
 #else\n\
 extern __inline__ __attribute__((__always_inline__, __gnu_inline__))\n\
 #endif\n\
-unsigned short\n\
+poly_uint16\n\
 mode_size_inline (machine_mode mode)\n\
 {\n\
-  extern %sunsigned short mode_size[NUM_MACHINE_MODES];\n\
+  extern %spoly_uint16_pod mode_size[NUM_MACHINE_MODES];\n\
   gcc_assert (mode >= 0 && mode < NUM_MACHINE_MODES);\n\
   switch (mode)\n\
     {\n", adj_bytesize ? "" : "const ");
@@ -1376,11 +1376,11 @@ emit_mode_size (void)
   int c;
   struct mode_data *m;
 
-  print_maybe_const_decl ("%sunsigned short", "mode_size",
+  print_maybe_const_decl ("%spoly_uint16_pod", "mode_size",
 			  "NUM_MACHINE_MODES", bytesize);
 
   for_all_modes (c, m)
-    tagged_printf ("%u", m->bytesize, m->name);
+    tagged_printf ("{ %u" ZERO_COEFFS " }", m->bytesize, m->name);
 
   print_closer ();
 }
@@ -1647,17 +1647,33 @@ emit_mode_adjustments (void)
 \nvoid\
 \ninit_adjust_machine_modes (void)\
 \n{\
-\n  size_t s ATTRIBUTE_UNUSED;");
+\n  poly_uint16 ps ATTRIBUTE_UNUSED;\n\
+  size_t s ATTRIBUTE_UNUSED;");
 
   /* Size adjustments must be propagated to all containing modes.
      A size adjustment forces us to recalculate the alignment too.  */
   for (a = adj_bytesize; a; a = a->next)
     {
-      printf ("\n  /* %s:%d */\n  s = %s;\n",
-	      a->file, a->line, a->adjustment);
-      printf ("  mode_size[E_%smode] = s;\n", a->mode->name);
-      printf ("  mode_unit_size[E_%smode] = s;\n", a->mode->name);
-      printf ("  mode_base_align[E_%smode] = s & (~s + 1);\n",
+      printf ("\n  /* %s:%d */\n", a->file, a->line);
+      switch (a->mode->cl)
+	{
+	case MODE_VECTOR_INT:
+	case MODE_VECTOR_FLOAT:
+	case MODE_VECTOR_FRACT:
+	case MODE_VECTOR_UFRACT:
+	case MODE_VECTOR_ACCUM:
+	case MODE_VECTOR_UACCUM:
+	  printf ("  ps = %s;\n", a->adjustment);
+	  printf ("  s = mode_unit_size[E_%smode];\n", a->mode->name);
+	  break;
+
+	default:
+	  printf ("  ps = s = %s;\n", a->adjustment);
+	  printf ("  mode_unit_size[E_%smode] = s;\n", a->mode->name);
+	  break;
+	}
+      printf ("  mode_size[E_%smode] = ps;\n", a->mode->name);
+      printf ("  mode_base_align[E_%smode] = known_alignment (ps);\n",
 	      a->mode->name);
 
       for (m = a->mode->contained; m; m = m->next_cont)
@@ -1678,11 +1694,12 @@ emit_mode_adjustments (void)
 	    case MODE_VECTOR_UFRACT:
 	    case MODE_VECTOR_ACCUM:
 	    case MODE_VECTOR_UACCUM:
-	      printf ("  mode_size[E_%smode] = %d*s;\n",
+	      printf ("  mode_size[E_%smode] = %d * ps;\n",
 		      m->name, m->ncomponents);
 	      printf ("  mode_unit_size[E_%smode] = s;\n", m->name);
-	      printf ("  mode_base_align[E_%smode] = (%d*s) & (~(%d*s)+1);\n",
-		      m->name, m->ncomponents, m->ncomponents);
+	      printf ("  mode_base_align[E_%smode]"
+		      " = known_alignment (%d * ps);\n",
+		      m->name, m->ncomponents);
 	      break;
 
 	    default:
Index: gcc/lto-streamer-in.c
===================================================================
--- gcc/lto-streamer-in.c	2017-10-23 17:25:54.179292194 +0100
+++ gcc/lto-streamer-in.c	2017-10-23 17:25:59.434103309 +0100
@@ -1604,7 +1604,7 @@ lto_input_mode_table (struct lto_file_de
     {
       enum mode_class mclass
 	= bp_unpack_enum (&bp, mode_class, MAX_MODE_CLASS);
-      unsigned int size = bp_unpack_value (&bp, 8);
+      poly_uint16 size = bp_unpack_poly_value (&bp, 16);
       poly_uint16 prec = bp_unpack_poly_value (&bp, 16);
       machine_mode inner = (machine_mode) bp_unpack_value (&bp, 8);
       poly_uint16 nunits = bp_unpack_poly_value (&bp, 16);
@@ -1638,7 +1638,7 @@ lto_input_mode_table (struct lto_file_de
 	     pass ? mr = (machine_mode) (mr + 1)
 		  : mr = GET_MODE_WIDER_MODE (mr).else_void ())
 	  if (GET_MODE_CLASS (mr) != mclass
-	      || GET_MODE_SIZE (mr) != size
+	      || may_ne (GET_MODE_SIZE (mr), size)
 	      || may_ne (GET_MODE_PRECISION (mr), prec)
 	      || (inner == m
 		  ? GET_MODE_INNER (mr) != mr
Index: gcc/lto-streamer-out.c
===================================================================
--- gcc/lto-streamer-out.c	2017-10-23 17:25:54.180292158 +0100
+++ gcc/lto-streamer-out.c	2017-10-23 17:25:59.435103273 +0100
@@ -2772,7 +2772,7 @@ lto_write_mode_table (void)
 	    continue;
 	  bp_pack_value (&bp, m, 8);
 	  bp_pack_enum (&bp, mode_class, MAX_MODE_CLASS, GET_MODE_CLASS (m));
-	  bp_pack_value (&bp, GET_MODE_SIZE (m), 8);
+	  bp_pack_poly_value (&bp, GET_MODE_SIZE (m), 16);
 	  bp_pack_poly_value (&bp, GET_MODE_PRECISION (m), 16);
 	  bp_pack_value (&bp, GET_MODE_INNER (m), 8);
 	  bp_pack_poly_value (&bp, GET_MODE_NUNITS (m), 16);
Index: gcc/auto-inc-dec.c
===================================================================
--- gcc/auto-inc-dec.c	2017-10-23 17:25:36.142940510 +0100
+++ gcc/auto-inc-dec.c	2017-10-23 17:25:59.396104675 +0100
@@ -601,7 +601,7 @@ try_merge (void)
     inc_insn.reg_res : mem_insn.reg0;
 
   /* The width of the mem being accessed.  */
-  int size = GET_MODE_SIZE (GET_MODE (mem));
+  poly_int64 size = GET_MODE_SIZE (GET_MODE (mem));
   rtx_insn *last_insn = NULL;
   machine_mode reg_mode = GET_MODE (inc_reg);
 
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	2017-10-23 17:25:41.647742640 +0100
+++ gcc/builtins.c	2017-10-23 17:25:59.397104639 +0100
@@ -5839,7 +5839,7 @@ expand_ifn_atomic_compare_exchange_into_
   for (z = 4; z < 6; z++)
     vec->quick_push (gimple_call_arg (call, z));
   /* At present we only have BUILT_IN_ATOMIC_COMPARE_EXCHANGE_{1,2,4,8,16}.  */
-  unsigned int bytes_log2 = exact_log2 (GET_MODE_SIZE (mode));
+  unsigned int bytes_log2 = exact_log2 (GET_MODE_SIZE (mode).to_constant ());
   gcc_assert (bytes_log2 < 5);
   built_in_function fncode
     = (built_in_function) ((int) BUILT_IN_ATOMIC_COMPARE_EXCHANGE_1
Index: gcc/caller-save.c
===================================================================
--- gcc/caller-save.c	2017-10-23 17:16:50.356530167 +0100
+++ gcc/caller-save.c	2017-10-23 17:25:59.397104639 +0100
@@ -607,9 +607,9 @@ setup_save_areas (void)
 		    break;
 		}
 	      if (k < 0
-		  && (GET_MODE_SIZE (regno_save_mode[regno][1])
-		      <= GET_MODE_SIZE (regno_save_mode
-					[saved_reg2->hard_regno][1])))
+		  && must_le (GET_MODE_SIZE (regno_save_mode[regno][1]),
+			      GET_MODE_SIZE (regno_save_mode
+					     [saved_reg2->hard_regno][1])))
 		{
 		  saved_reg->slot
 		    = adjust_address_nv
@@ -631,8 +631,8 @@ setup_save_areas (void)
 		  slot = prev_save_slots[j];
 		  if (slot == NULL_RTX)
 		    continue;
-		  if (GET_MODE_SIZE (regno_save_mode[regno][1])
-		      <= GET_MODE_SIZE (GET_MODE (slot))
+		  if (must_le (GET_MODE_SIZE (regno_save_mode[regno][1]),
+			       GET_MODE_SIZE (GET_MODE (slot)))
 		      && best_slot_num < 0)
 		    best_slot_num = j;
 		  if (GET_MODE (slot) == regno_save_mode[regno][1])
@@ -1147,7 +1147,7 @@ replace_reg_with_saved_mem (rtx *loc,
 	    machine_mode smode = save_mode[regno];
 	    gcc_assert (smode != VOIDmode);
 	    if (hard_regno_nregs (regno, smode) > 1)
-	      smode = mode_for_size (GET_MODE_SIZE (mode) / nregs,
+	      smode = mode_for_size (exact_div (GET_MODE_SIZE (mode), nregs),
 				     GET_MODE_CLASS (mode), 0).require ();
 	    XVECEXP (mem, 0, i) = gen_rtx_REG (smode, regno + i);
 	  }
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 17:25:57.257181559 +0100
+++ gcc/calls.c	2017-10-23 17:25:59.398104603 +0100
@@ -4496,7 +4496,7 @@ emit_library_call_value_1 (int retval, r
   rtx mem_value = 0;
   rtx valreg;
   int pcc_struct_value = 0;
-  int struct_value_size = 0;
+  poly_int64 struct_value_size = 0;
   int flags;
   int reg_parm_stack_space = 0;
   poly_int64 needed;
@@ -4735,7 +4735,7 @@ emit_library_call_value_1 (int retval, r
 	   end it should be padded.  */
 	argvec[count].locate.where_pad =
 	  BLOCK_REG_PADDING (mode, NULL_TREE,
-			     GET_MODE_SIZE (mode) <= UNITS_PER_WORD);
+			     must_le (GET_MODE_SIZE (mode), UNITS_PER_WORD));
 #endif
 
       targetm.calls.function_arg_advance (args_so_far, mode, (tree) 0, true);
@@ -4986,9 +4986,6 @@ emit_library_call_value_1 (int retval, r
       rtx val = argvec[argnum].value;
       rtx reg = argvec[argnum].reg;
       int partial = argvec[argnum].partial;
-#ifdef BLOCK_REG_PADDING
-      int size = 0;
-#endif
       
       /* Handle calls that pass values in multiple non-contiguous
 	 locations.  The PA64 has examples of this for library calls.  */
@@ -4998,19 +4995,19 @@ emit_library_call_value_1 (int retval, r
         {
 	  emit_move_insn (reg, val);
 #ifdef BLOCK_REG_PADDING
-	  size = GET_MODE_SIZE (argvec[argnum].mode);
+	  poly_int64 size = GET_MODE_SIZE (argvec[argnum].mode);
 
 	  /* Copied from load_register_parameters.  */
 
 	  /* Handle case where we have a value that needs shifting
 	     up to the msb.  eg. a QImode value and we're padding
 	     upward on a BYTES_BIG_ENDIAN machine.  */
-	  if (size < UNITS_PER_WORD
+	  if (must_lt (size, UNITS_PER_WORD)
 	      && (argvec[argnum].locate.where_pad
 		  == (BYTES_BIG_ENDIAN ? PAD_UPWARD : PAD_DOWNWARD)))
 	    {
 	      rtx x;
-	      int shift = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
+	      poly_int64 shift = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
 
 	      /* Assigning REG here rather than a temp makes CALL_FUSAGE
 		 report the whole reg as used.  Strictly speaking, the
Index: gcc/combine-stack-adj.c
===================================================================
--- gcc/combine-stack-adj.c	2017-09-21 11:53:16.788928404 +0100
+++ gcc/combine-stack-adj.c	2017-10-23 17:25:59.398104603 +0100
@@ -622,11 +622,11 @@ combine_stack_adjustments_for_block (bas
 	  if (MEM_P (dest)
 	      && ((STACK_GROWS_DOWNWARD
 		   ? (GET_CODE (XEXP (dest, 0)) == PRE_DEC
-		      && last_sp_adjust
-			 == (HOST_WIDE_INT) GET_MODE_SIZE (GET_MODE (dest)))
+		      && must_eq (last_sp_adjust,
+				  GET_MODE_SIZE (GET_MODE (dest))))
 		   : (GET_CODE (XEXP (dest, 0)) == PRE_INC
-		      && last_sp_adjust
-		         == -(HOST_WIDE_INT) GET_MODE_SIZE (GET_MODE (dest))))
+		      && must_eq (-last_sp_adjust,
+				  GET_MODE_SIZE (GET_MODE (dest)))))
 		  || ((STACK_GROWS_DOWNWARD
 		       ? last_sp_adjust >= 0 : last_sp_adjust <= 0)
 		      && GET_CODE (XEXP (dest, 0)) == PRE_MODIFY
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-10-23 17:25:57.258181523 +0100
+++ gcc/combine.c	2017-10-23 17:25:59.400104531 +0100
@@ -6902,10 +6902,10 @@ simplify_set (rtx x)
 
   if (GET_CODE (src) == SUBREG && subreg_lowpart_p (src)
       && !OBJECT_P (SUBREG_REG (src))
-      && (((GET_MODE_SIZE (GET_MODE (src)) + (UNITS_PER_WORD - 1))
-	   / UNITS_PER_WORD)
-	  == ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))
-	       + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD))
+      && (known_equal_after_align_up
+	  (GET_MODE_SIZE (GET_MODE (src)),
+	   GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))),
+	   UNITS_PER_WORD))
       && (WORD_REGISTER_OPERATIONS || !paradoxical_subreg_p (src))
       && ! (REG_P (dest) && REGNO (dest) < FIRST_PSEUDO_REGISTER
 	    && !REG_CAN_CHANGE_MODE_P (REGNO (dest),
@@ -7741,7 +7741,7 @@ make_extraction (machine_mode mode, rtx
       && ! mode_dependent_address_p (XEXP (inner, 0), MEM_ADDR_SPACE (inner))
       && ! MEM_VOLATILE_P (inner))
     {
-      int offset = 0;
+      poly_int64 offset = 0;
 
       /* The computations below will be correct if the machine is big
 	 endian in both bits and bytes or little endian in bits and bytes.
@@ -10437,8 +10437,6 @@ simplify_shift_const_1 (enum rtx_code co
   machine_mode mode = result_mode;
   machine_mode shift_mode;
   scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
-  unsigned int mode_words
-    = (GET_MODE_SIZE (mode) + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD;
   /* We form (outer_op (code varop count) (outer_const)).  */
   enum rtx_code outer_op = UNKNOWN;
   HOST_WIDE_INT outer_const = 0;
@@ -10619,9 +10617,8 @@ simplify_shift_const_1 (enum rtx_code co
 	  if (subreg_lowpart_p (varop)
 	      && is_int_mode (GET_MODE (SUBREG_REG (varop)), &inner_mode)
 	      && GET_MODE_SIZE (inner_mode) > GET_MODE_SIZE (int_varop_mode)
-	      && (unsigned int) ((GET_MODE_SIZE (inner_mode)
-				  + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD)
-		 == mode_words
+	      && (CEIL (GET_MODE_SIZE (inner_mode), UNITS_PER_WORD)
+		  == CEIL (GET_MODE_SIZE (int_mode), UNITS_PER_WORD))
 	      && GET_MODE_CLASS (int_varop_mode) == MODE_INT)
 	    {
 	      varop = SUBREG_REG (varop);
@@ -11593,8 +11590,6 @@ recog_for_combine (rtx *pnewpat, rtx_ins
 gen_lowpart_for_combine (machine_mode omode, rtx x)
 {
   machine_mode imode = GET_MODE (x);
-  unsigned int osize = GET_MODE_SIZE (omode);
-  unsigned int isize = GET_MODE_SIZE (imode);
   rtx result;
 
   if (omode == imode)
@@ -11602,8 +11597,9 @@ gen_lowpart_for_combine (machine_mode om
 
   /* We can only support MODE being wider than a word if X is a
      constant integer or has a mode the same size.  */
-  if (GET_MODE_SIZE (omode) > UNITS_PER_WORD
-      && ! (CONST_SCALAR_INT_P (x) || isize == osize))
+  if (may_gt (GET_MODE_SIZE (omode), UNITS_PER_WORD)
+      && ! (CONST_SCALAR_INT_P (x)
+	    || must_eq (GET_MODE_SIZE (imode), GET_MODE_SIZE (omode))))
     goto fail;
 
   /* X might be a paradoxical (subreg (mem)).  In that case, gen_lowpart
@@ -11620,8 +11616,6 @@ gen_lowpart_for_combine (machine_mode om
 
       if (imode == omode)
 	return x;
-
-      isize = GET_MODE_SIZE (imode);
     }
 
   result = gen_lowpart_common (omode, x);
Index: gcc/convert.c
===================================================================
--- gcc/convert.c	2017-10-23 17:25:54.176292301 +0100
+++ gcc/convert.c	2017-10-23 17:25:59.400104531 +0100
@@ -916,13 +916,15 @@ convert_to_integer_1 (tree type, tree ex
 	    }
 
 	  CASE_CONVERT:
-	    /* Don't introduce a "can't convert between vector values of
-	       different size" error.  */
-	    if (TREE_CODE (TREE_TYPE (TREE_OPERAND (expr, 0))) == VECTOR_TYPE
-		&& (GET_MODE_SIZE (TYPE_MODE
-				   (TREE_TYPE (TREE_OPERAND (expr, 0))))
-		    != GET_MODE_SIZE (TYPE_MODE (type))))
-	      break;
+	    {
+	      tree argtype = TREE_TYPE (TREE_OPERAND (expr, 0));
+	      /* Don't introduce a "can't convert between vector values
+		 of different size" error.  */
+	      if (TREE_CODE (argtype) == VECTOR_TYPE
+		  && may_ne (GET_MODE_SIZE (TYPE_MODE (argtype)),
+			     GET_MODE_SIZE (TYPE_MODE (type))))
+		break;
+	    }
 	    /* If truncating after truncating, might as well do all at once.
 	       If truncating after extending, we may get rid of wasted work.  */
 	    return convert (type, get_unwidened (TREE_OPERAND (expr, 0), type));
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-10-23 17:25:54.177292265 +0100
+++ gcc/cse.c	2017-10-23 17:25:59.402104460 +0100
@@ -3807,8 +3807,8 @@ equiv_constant (rtx x)
 
       /* If we didn't and if doing so makes sense, see if we previously
 	 assigned a constant value to the enclosing word mode SUBREG.  */
-      if (GET_MODE_SIZE (mode) < GET_MODE_SIZE (word_mode)
-	  && GET_MODE_SIZE (word_mode) < GET_MODE_SIZE (imode))
+      if (must_lt (GET_MODE_SIZE (mode), UNITS_PER_WORD)
+	  && must_lt (UNITS_PER_WORD, GET_MODE_SIZE (imode)))
 	{
 	  poly_int64 byte = (SUBREG_BYTE (x)
 			     - subreg_lowpart_offset (mode, word_mode));
@@ -5986,9 +5986,10 @@ cse_insn (rtx_insn *insn)
 	   already entered SRC and DEST of the SET in the table.  */
 
 	if (GET_CODE (dest) == SUBREG
-	    && (((GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest))) - 1)
-		 / UNITS_PER_WORD)
-		== (GET_MODE_SIZE (GET_MODE (dest)) - 1) / UNITS_PER_WORD)
+	    && (known_equal_after_align_down
+		(GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest))) - 1,
+		 GET_MODE_SIZE (GET_MODE (dest)) - 1,
+		 UNITS_PER_WORD))
 	    && !partial_subreg_p (dest)
 	    && sets[i].src_elt != 0)
 	  {
Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c	2017-10-23 17:16:50.359529762 +0100
+++ gcc/cselib.c	2017-10-23 17:25:59.403104424 +0100
@@ -805,14 +805,14 @@ autoinc_split (rtx x, rtx *off, machine_
       if (memmode == VOIDmode)
 	return x;
 
-      *off = GEN_INT (-GET_MODE_SIZE (memmode));
+      *off = gen_int_mode (-GET_MODE_SIZE (memmode), GET_MODE (x));
       return XEXP (x, 0);
 
     case PRE_INC:
       if (memmode == VOIDmode)
 	return x;
 
-      *off = GEN_INT (GET_MODE_SIZE (memmode));
+      *off = gen_int_mode (GET_MODE_SIZE (memmode), GET_MODE (x));
       return XEXP (x, 0);
 
     case PRE_MODIFY:
@@ -1068,6 +1068,7 @@ rtx_equal_for_cselib_1 (rtx x, rtx y, ma
 cselib_hash_rtx (rtx x, int create, machine_mode memmode)
 {
   cselib_val *e;
+  poly_int64 offset;
   int i, j;
   enum rtx_code code;
   const char *fmt;
@@ -1203,14 +1204,15 @@ cselib_hash_rtx (rtx x, int create, mach
     case PRE_INC:
       /* We can't compute these without knowing the MEM mode.  */
       gcc_assert (memmode != VOIDmode);
-      i = GET_MODE_SIZE (memmode);
+      offset = GET_MODE_SIZE (memmode);
       if (code == PRE_DEC)
-	i = -i;
+	offset = -offset;
       /* Adjust the hash so that (mem:MEMMODE (pre_* (reg))) hashes
 	 like (mem:MEMMODE (plus (reg) (const_int I))).  */
       hash += (unsigned) PLUS - (unsigned)code
 	+ cselib_hash_rtx (XEXP (x, 0), create, memmode)
-	+ cselib_hash_rtx (GEN_INT (i), create, memmode);
+	+ cselib_hash_rtx (gen_int_mode (offset, GET_MODE (x)),
+			   create, memmode);
       return hash ? hash : 1 + (unsigned) PLUS;
 
     case PRE_MODIFY:
@@ -1871,6 +1873,7 @@ cselib_subst_to_values (rtx x, machine_m
   struct elt_list *l;
   rtx copy = x;
   int i;
+  poly_int64 offset;
 
   switch (code)
     {
@@ -1907,11 +1910,11 @@ cselib_subst_to_values (rtx x, machine_m
     case PRE_DEC:
     case PRE_INC:
       gcc_assert (memmode != VOIDmode);
-      i = GET_MODE_SIZE (memmode);
+      offset = GET_MODE_SIZE (memmode);
       if (code == PRE_DEC)
-	i = -i;
+	offset = -offset;
       return cselib_subst_to_values (plus_constant (GET_MODE (x),
-						    XEXP (x, 0), i),
+						    XEXP (x, 0), offset),
 				     memmode);
 
     case PRE_MODIFY:
Index: gcc/dce.c
===================================================================
--- gcc/dce.c	2017-10-23 17:11:40.377197950 +0100
+++ gcc/dce.c	2017-10-23 17:25:59.403104424 +0100
@@ -884,8 +884,8 @@ word_dce_process_block (basic_block bb,
 	df_ref use;
 	FOR_EACH_INSN_USE (use, insn)
 	  if (DF_REF_REGNO (use) >= FIRST_PSEUDO_REGISTER
-	      && (GET_MODE_SIZE (GET_MODE (DF_REF_REAL_REG (use)))
-		  == 2 * UNITS_PER_WORD)
+	      && must_eq (GET_MODE_SIZE (GET_MODE (DF_REF_REAL_REG (use))),
+			  2 * UNITS_PER_WORD)
 	      && !bitmap_bit_p (local_live, 2 * DF_REF_REGNO (use))
 	      && !bitmap_bit_p (local_live, 2 * DF_REF_REGNO (use) + 1))
 	    dead_debug_add (&debug, use, DF_REF_REGNO (use));
Index: gcc/df-problems.c
===================================================================
--- gcc/df-problems.c	2017-08-30 16:28:10.773201395 +0100
+++ gcc/df-problems.c	2017-10-23 17:25:59.405104352 +0100
@@ -2815,7 +2815,7 @@ df_word_lr_mark_ref (df_ref ref, bool is
   regno = REGNO (reg);
   reg_mode = GET_MODE (reg);
   if (regno < FIRST_PSEUDO_REGISTER
-      || GET_MODE_SIZE (reg_mode) != 2 * UNITS_PER_WORD)
+      || may_ne (GET_MODE_SIZE (reg_mode), 2 * UNITS_PER_WORD))
     return true;
 
   if (GET_CODE (orig_reg) == SUBREG
Index: gcc/dwarf2cfi.c
===================================================================
--- gcc/dwarf2cfi.c	2017-10-23 17:18:57.858161053 +0100
+++ gcc/dwarf2cfi.c	2017-10-23 17:25:59.405104352 +0100
@@ -270,8 +270,8 @@ void init_one_dwarf_reg_size (int regno,
   const unsigned int rnum = DWARF2_FRAME_REG_OUT (dnum, 1);
   const unsigned int dcol = DWARF_REG_TO_UNWIND_COLUMN (rnum);
   
-  const HOST_WIDE_INT slotoffset = dcol * GET_MODE_SIZE (slotmode);
-  const HOST_WIDE_INT regsize = GET_MODE_SIZE (regmode);
+  poly_int64 slotoffset = dcol * GET_MODE_SIZE (slotmode);
+  poly_int64 regsize = GET_MODE_SIZE (regmode);
 
   init_state->processed_regno[regno] = true;
 
@@ -285,7 +285,8 @@ void init_one_dwarf_reg_size (int regno,
       init_state->wrote_return_column = true;
     }
 
-  if (slotoffset < 0)
+  /* ??? When is this true?  Should it be a test based on DCOL instead?  */
+  if (may_lt (slotoffset, 0))
     return;
 
   emit_move_insn (adjust_address (table, slotmode, slotoffset),
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 17:25:57.261181415 +0100
+++ gcc/dwarf2out.c	2017-10-23 17:25:59.418103884 +0100
@@ -13164,7 +13164,10 @@ multiple_reg_loc_descriptor (rtx rtl, rt
       gcc_assert ((unsigned) DBX_REGISTER_NUMBER (reg) == dbx_reg_number (rtl));
       nregs = REG_NREGS (rtl);
 
-      size = GET_MODE_SIZE (GET_MODE (rtl)) / nregs;
+      /* At present we only track constant-sized pieces.  */
+      if (!GET_MODE_SIZE (GET_MODE (rtl)).is_constant (&size))
+	return NULL;
+      size /= nregs;
 
       loc_result = NULL;
       while (nregs--)
@@ -13184,7 +13187,9 @@ multiple_reg_loc_descriptor (rtx rtl, rt
 
   gcc_assert (GET_CODE (regs) == PARALLEL);
 
-  size = GET_MODE_SIZE (GET_MODE (XVECEXP (regs, 0, 0)));
+  /* At present we only track constant-sized pieces.  */
+  if (!GET_MODE_SIZE (GET_MODE (XVECEXP (regs, 0, 0))).is_constant (&size))
+    return NULL;
   loc_result = NULL;
 
   for (i = 0; i < XVECLEN (regs, 0); ++i)
@@ -14765,7 +14770,7 @@ mem_loc_descriptor (rtx rtl, machine_mod
       if (is_a <scalar_int_mode> (mode, &int_mode)
 	  && is_a <scalar_int_mode> (GET_MODE (inner), &inner_mode)
 	  ? GET_MODE_SIZE (int_mode) <= GET_MODE_SIZE (inner_mode)
-	  : GET_MODE_SIZE (mode) == GET_MODE_SIZE (GET_MODE (inner)))
+	  : must_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (GET_MODE (inner))))
 	{
 	  dw_die_ref type_die;
 	  dw_loc_descr_ref cvt;
@@ -14781,8 +14786,7 @@ mem_loc_descriptor (rtx rtl, machine_mod
 	      mem_loc_result = NULL;
 	      break;
 	    }
-	  if (GET_MODE_SIZE (mode)
-	      != GET_MODE_SIZE (GET_MODE (inner)))
+	  if (may_ne (GET_MODE_SIZE (mode), GET_MODE_SIZE (GET_MODE (inner))))
 	    cvt = new_loc_descr (dwarf_OP (DW_OP_convert), 0, 0);
 	  else
 	    cvt = new_loc_descr (dwarf_OP (DW_OP_reinterpret), 0, 0);
@@ -14943,15 +14947,17 @@ mem_loc_descriptor (rtx rtl, machine_mod
 	    {
 	      dw_die_ref type_die;
 	      dw_loc_descr_ref deref;
+	      HOST_WIDE_INT size;
 
 	      if (dwarf_strict && dwarf_version < 5)
 		return NULL;
+	      if (!GET_MODE_SIZE (mode).is_constant (&size))
+		return NULL;
 	      type_die
 		= base_type_for_mode (mode, SCALAR_INT_MODE_P (mode));
 	      if (type_die == NULL)
 		return NULL;
-	      deref = new_loc_descr (dwarf_OP (DW_OP_deref_type),
-				     GET_MODE_SIZE (mode), 0);
+	      deref = new_loc_descr (dwarf_OP (DW_OP_deref_type), size, 0);
 	      deref->dw_loc_oprnd2.val_class = dw_val_class_die_ref;
 	      deref->dw_loc_oprnd2.v.val_die_ref.die = type_die;
 	      deref->dw_loc_oprnd2.v.val_die_ref.external = 0;
@@ -15703,6 +15709,12 @@ mem_loc_descriptor (rtx rtl, machine_mod
 static dw_loc_descr_ref
 concat_loc_descriptor (rtx x0, rtx x1, enum var_init_status initialized)
 {
+  /* At present we only track constant-sized pieces.  */
+  unsigned int size0, size1;
+  if (!GET_MODE_SIZE (GET_MODE (x0)).is_constant (&size0)
+      || !GET_MODE_SIZE (GET_MODE (x1)).is_constant (&size1))
+    return 0;
+
   dw_loc_descr_ref cc_loc_result = NULL;
   dw_loc_descr_ref x0_ref
     = loc_descriptor (x0, VOIDmode, VAR_INIT_STATUS_INITIALIZED);
@@ -15713,10 +15725,10 @@ concat_loc_descriptor (rtx x0, rtx x1, e
     return 0;
 
   cc_loc_result = x0_ref;
-  add_loc_descr_op_piece (&cc_loc_result, GET_MODE_SIZE (GET_MODE (x0)));
+  add_loc_descr_op_piece (&cc_loc_result, size0);
 
   add_loc_descr (&cc_loc_result, x1_ref);
-  add_loc_descr_op_piece (&cc_loc_result, GET_MODE_SIZE (GET_MODE (x1)));
+  add_loc_descr_op_piece (&cc_loc_result, size1);
 
   if (initialized == VAR_INIT_STATUS_UNINITIALIZED)
     add_loc_descr (&cc_loc_result, new_loc_descr (DW_OP_GNU_uninit, 0, 0));
@@ -15733,18 +15745,23 @@ concatn_loc_descriptor (rtx concatn, enu
   unsigned int i;
   dw_loc_descr_ref cc_loc_result = NULL;
   unsigned int n = XVECLEN (concatn, 0);
+  unsigned int size;
 
   for (i = 0; i < n; ++i)
     {
       dw_loc_descr_ref ref;
       rtx x = XVECEXP (concatn, 0, i);
 
+      /* At present we only track constant-sized pieces.  */
+      if (!GET_MODE_SIZE (GET_MODE (x)).is_constant (&size))
+	return NULL;
+
       ref = loc_descriptor (x, VOIDmode, VAR_INIT_STATUS_INITIALIZED);
       if (ref == NULL)
 	return NULL;
 
       add_loc_descr (&cc_loc_result, ref);
-      add_loc_descr_op_piece (&cc_loc_result, GET_MODE_SIZE (GET_MODE (x)));
+      add_loc_descr_op_piece (&cc_loc_result, size);
     }
 
   if (cc_loc_result && initialized == VAR_INIT_STATUS_UNINITIALIZED)
@@ -15863,7 +15880,7 @@ loc_descriptor (rtx rtl, machine_mode mo
 	rtvec par_elems = XVEC (rtl, 0);
 	int num_elem = GET_NUM_ELEM (par_elems);
 	machine_mode mode;
-	int i;
+	int i, size;
 
 	/* Create the first one, so we have something to add to.  */
 	loc_result = loc_descriptor (XEXP (RTVEC_ELT (par_elems, 0), 0),
@@ -15871,7 +15888,10 @@ loc_descriptor (rtx rtl, machine_mode mo
 	if (loc_result == NULL)
 	  return NULL;
 	mode = GET_MODE (XEXP (RTVEC_ELT (par_elems, 0), 0));
-	add_loc_descr_op_piece (&loc_result, GET_MODE_SIZE (mode));
+	/* At present we only track constant-sized pieces.  */
+	if (!GET_MODE_SIZE (mode).is_constant (&size))
+	  return NULL;
+	add_loc_descr_op_piece (&loc_result, size);
 	for (i = 1; i < num_elem; i++)
 	  {
 	    dw_loc_descr_ref temp;
@@ -15882,7 +15902,10 @@ loc_descriptor (rtx rtl, machine_mode mo
 	      return NULL;
 	    add_loc_descr (&loc_result, temp);
 	    mode = GET_MODE (XEXP (RTVEC_ELT (par_elems, i), 0));
-	    add_loc_descr_op_piece (&loc_result, GET_MODE_SIZE (mode));
+	    /* At present we only track constant-sized pieces.  */
+	    if (!GET_MODE_SIZE (mode).is_constant (&size))
+	      return NULL;
+	    add_loc_descr_op_piece (&loc_result, size);
 	  }
       }
       break;
@@ -19098,7 +19121,7 @@ rtl_for_decl_location (tree decl)
 	    rtl = DECL_INCOMING_RTL (decl);
 	  else if ((rtl == NULL_RTX || is_pseudo_reg (rtl))
 		   && SCALAR_INT_MODE_P (dmode)
-		   && GET_MODE_SIZE (dmode) <= GET_MODE_SIZE (pmode)
+		   && must_le (GET_MODE_SIZE (dmode), GET_MODE_SIZE (pmode))
 		   && DECL_INCOMING_RTL (decl))
 	    {
 	      rtx inc = DECL_INCOMING_RTL (decl);
@@ -19139,12 +19162,12 @@ rtl_for_decl_location (tree decl)
 	       /* Big endian correction check.  */
 	       && BYTES_BIG_ENDIAN
 	       && TYPE_MODE (TREE_TYPE (decl)) != GET_MODE (rtl)
-	       && (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (decl)))
-		   < UNITS_PER_WORD))
+	       && must_lt (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (decl))),
+			   UNITS_PER_WORD))
 	{
 	  machine_mode addr_mode = get_address_mode (rtl);
-	  int offset = (UNITS_PER_WORD
-			- GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (decl))));
+	  poly_int64 offset = (UNITS_PER_WORD
+			       - GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (decl))));
 
 	  rtl = gen_rtx_MEM (TYPE_MODE (TREE_TYPE (decl)),
 			     plus_constant (addr_mode, XEXP (rtl, 0), offset));
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 17:25:48.618492077 +0100
+++ gcc/emit-rtl.c	2017-10-23 17:25:59.418103884 +0100
@@ -1635,13 +1635,13 @@ gen_lowpart_common (machine_mode mode, r
 rtx
 gen_highpart (machine_mode mode, rtx x)
 {
-  unsigned int msize = GET_MODE_SIZE (mode);
+  poly_uint64 msize = GET_MODE_SIZE (mode);
   rtx result;
 
   /* This case loses if X is a subreg.  To catch bugs early,
      complain if an invalid MODE is used even in other cases.  */
-  gcc_assert (msize <= UNITS_PER_WORD
-	      || msize == (unsigned int) GET_MODE_UNIT_SIZE (GET_MODE (x)));
+  gcc_assert (must_le (msize, (unsigned int) UNITS_PER_WORD)
+	      || must_eq (msize, GET_MODE_UNIT_SIZE (GET_MODE (x))));
 
   result = simplify_gen_subreg (mode, x, GET_MODE (x),
 				subreg_highpart_offset (mode, GET_MODE (x)));
@@ -2603,7 +2603,7 @@ replace_equiv_address_nv (rtx memref, rt
 widen_memory_access (rtx memref, machine_mode mode, poly_int64 offset)
 {
   rtx new_rtx = adjust_address_1 (memref, mode, offset, 1, 1, 0, 0);
-  unsigned int size = GET_MODE_SIZE (mode);
+  poly_uint64 size = GET_MODE_SIZE (mode);
 
   /* If there are no changes, just return the original memory reference.  */
   if (new_rtx == memref)
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 17:25:57.262181379 +0100
+++ gcc/expmed.c	2017-10-23 17:25:59.419103849 +0100
@@ -1631,7 +1631,7 @@ extract_bit_field_1 (rtx str_rtx, poly_u
       && !MEM_P (op0)
       && VECTOR_MODE_P (tmode)
       && must_eq (bitsize, GET_MODE_SIZE (tmode))
-      && GET_MODE_SIZE (GET_MODE (op0)) > GET_MODE_SIZE (tmode))
+      && may_gt (GET_MODE_SIZE (GET_MODE (op0)), GET_MODE_SIZE (tmode)))
     {
       machine_mode new_mode = GET_MODE (op0);
       if (GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode))
@@ -1642,7 +1642,8 @@ extract_bit_field_1 (rtx str_rtx, poly_u
 			   GET_MODE_UNIT_BITSIZE (tmode), &nunits)
 	      || !mode_for_vector (inner_mode, nunits).exists (&new_mode)
 	      || !VECTOR_MODE_P (new_mode)
-	      || GET_MODE_SIZE (new_mode) != GET_MODE_SIZE (GET_MODE (op0))
+	      || may_ne (GET_MODE_SIZE (new_mode),
+			 GET_MODE_SIZE (GET_MODE (op0)))
 	      || GET_MODE_INNER (new_mode) != GET_MODE_INNER (tmode)
 	      || !targetm.vector_mode_supported_p (new_mode))
 	    new_mode = VOIDmode;
@@ -1698,8 +1699,8 @@ extract_bit_field_1 (rtx str_rtx, poly_u
 	new_mode = MIN_MODE_VECTOR_INT;
 
       FOR_EACH_MODE_FROM (new_mode, new_mode)
-	if (GET_MODE_SIZE (new_mode) == GET_MODE_SIZE (GET_MODE (op0))
-	    && GET_MODE_UNIT_SIZE (new_mode) == GET_MODE_SIZE (tmode)
+	if (must_eq (GET_MODE_SIZE (new_mode), GET_MODE_SIZE (GET_MODE (op0)))
+	    && must_eq (GET_MODE_UNIT_SIZE (new_mode), GET_MODE_SIZE (tmode))
 	    && targetm.vector_mode_supported_p (new_mode))
 	  break;
       if (new_mode != VOIDmode)
@@ -1757,7 +1758,7 @@ extract_bit_field_1 (rtx str_rtx, poly_u
 	}
       else
 	{
-	  HOST_WIDE_INT size = GET_MODE_SIZE (GET_MODE (op0));
+	  poly_int64 size = GET_MODE_SIZE (GET_MODE (op0));
 	  rtx mem = assign_stack_temp (GET_MODE (op0), size);
 	  emit_move_insn (mem, op0);
 	  op0 = adjust_bitfield_address_size (mem, BLKmode, 0, size);
@@ -1857,7 +1858,8 @@ extract_integral_bit_field (rtx op0, opt
       /* The mode must be fixed-size, since extract_bit_field_1 handles
 	 extractions from variable-sized objects before calling this
 	 function.  */
-      unsigned int target_size = GET_MODE_SIZE (GET_MODE (target));
+      unsigned int target_size
+	= GET_MODE_SIZE (GET_MODE (target)).to_constant ();
       last = get_last_insn ();
       for (i = 0; i < nwords; i++)
 	{
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 17:25:57.263181343 +0100
+++ gcc/expr.c	2017-10-23 17:25:59.420103813 +0100
@@ -2237,7 +2237,7 @@ emit_group_load_1 (rtx *tmps, rtx dst, r
       else if (VECTOR_MODE_P (GET_MODE (dst))
 	       && REG_P (src))
 	{
-	  int slen = GET_MODE_SIZE (GET_MODE (src));
+	  poly_uint64 slen = GET_MODE_SIZE (GET_MODE (src));
 	  rtx mem;
 
 	  mem = assign_stack_temp (GET_MODE (src), slen);
@@ -2967,7 +2967,7 @@ clear_storage_hints (rtx object, rtx siz
      just move a zero.  Otherwise, do this a piece at a time.  */
   if (mode != BLKmode
       && CONST_INT_P (size)
-      && INTVAL (size) == (HOST_WIDE_INT) GET_MODE_SIZE (mode))
+      && must_eq (INTVAL (size), GET_MODE_SIZE (mode)))
     {
       rtx zero = CONST0_RTX (mode);
       if (zero != NULL)
@@ -3505,7 +3505,7 @@ emit_move_complex (machine_mode mode, rt
 	 existing block move logic.  */
       if (MEM_P (x) && MEM_P (y))
 	{
-	  emit_block_move (x, y, GEN_INT (GET_MODE_SIZE (mode)),
+	  emit_block_move (x, y, gen_int_mode (GET_MODE_SIZE (mode), Pmode),
 			   BLOCK_OP_NO_LIBCALL);
 	  return get_last_insn ();
 	}
@@ -3570,9 +3570,12 @@ emit_move_multi_word (machine_mode mode,
   rtx_insn *seq;
   rtx inner;
   bool need_clobber;
-  int i;
+  int i, mode_size;
 
-  gcc_assert (GET_MODE_SIZE (mode) >= UNITS_PER_WORD);
+  /* This function can only handle cases where the number of words is
+     known at compile time.  */
+  mode_size = GET_MODE_SIZE (mode).to_constant ();
+  gcc_assert (mode_size >= UNITS_PER_WORD);
 
   /* If X is a push on the stack, do the push now and replace
      X with a reference to the stack pointer.  */
@@ -3591,9 +3594,7 @@ emit_move_multi_word (machine_mode mode,
   start_sequence ();
 
   need_clobber = false;
-  for (i = 0;
-       i < (GET_MODE_SIZE (mode) + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD;
-       i++)
+  for (i = 0; i < CEIL (mode_size, UNITS_PER_WORD); i++)
     {
       rtx xpart = operand_subword (x, i, 1, mode);
       rtx ypart;
@@ -4315,7 +4316,7 @@ emit_push_insn (rtx x, machine_mode mode
 	  /* A value is to be stored in an insufficiently aligned
 	     stack slot; copy via a suitably aligned slot if
 	     necessary.  */
-	  size = GEN_INT (GET_MODE_SIZE (mode));
+	  size = gen_int_mode (GET_MODE_SIZE (mode), Pmode);
 	  if (!MEM_P (xinner))
 	    {
 	      temp = assign_temp (type, 1, 1);
@@ -4471,9 +4472,10 @@ emit_push_insn (rtx x, machine_mode mode
     }
   else if (partial > 0)
     {
-      /* Scalar partly in registers.  */
-
-      int size = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+      /* Scalar partly in registers.  This case is only supported
+	 for fixed-width modes.  */
+      int size = GET_MODE_SIZE (mode).to_constant ();
+      size /= UNITS_PER_WORD;
       int i;
       int not_stack;
       /* # bytes of start of argument
@@ -11140,10 +11142,13 @@ expand_expr_real_1 (tree exp, rtx target
 		  gcc_assert (!TREE_ADDRESSABLE (exp));
 
 		  if (GET_MODE (op0) == BLKmode)
-		    emit_block_move (new_with_op0_mode, op0,
-				     GEN_INT (GET_MODE_SIZE (mode)),
-				     (modifier == EXPAND_STACK_PARM
-				      ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
+		    {
+		      rtx size_rtx = gen_int_mode (mode_size, Pmode);
+		      emit_block_move (new_with_op0_mode, op0, size_rtx,
+				       (modifier == EXPAND_STACK_PARM
+					? BLOCK_OP_CALL_PARM
+					: BLOCK_OP_NORMAL));
+		    }
 		  else
 		    emit_move_insn (new_with_op0_mode, op0);
 
Index: gcc/function.c
===================================================================
--- gcc/function.c	2017-10-23 17:19:01.410170292 +0100
+++ gcc/function.c	2017-10-23 17:25:59.422103741 +0100
@@ -2875,7 +2875,7 @@ assign_parm_setup_block_p (struct assign
   /* Only assign_parm_setup_block knows how to deal with register arguments
      that are padded at the least significant end.  */
   if (REG_P (data->entry_parm)
-      && GET_MODE_SIZE (data->promoted_mode) < UNITS_PER_WORD
+      && must_lt (GET_MODE_SIZE (data->promoted_mode), UNITS_PER_WORD)
       && (BLOCK_REG_PADDING (data->passed_mode, data->passed_type, 1)
 	  == (BYTES_BIG_ENDIAN ? PAD_UPWARD : PAD_DOWNWARD)))
     return true;
@@ -2938,7 +2938,7 @@ assign_parm_setup_block (struct assign_p
       SET_DECL_ALIGN (parm, MAX (DECL_ALIGN (parm), BITS_PER_WORD));
       stack_parm = assign_stack_local (BLKmode, size_stored,
 				       DECL_ALIGN (parm));
-      if (GET_MODE_SIZE (GET_MODE (entry_parm)) == size)
+      if (must_eq (GET_MODE_SIZE (GET_MODE (entry_parm)), size))
 	PUT_MODE (stack_parm, GET_MODE (entry_parm));
       set_mem_attributes (stack_parm, parm, 1);
     }
@@ -4339,8 +4339,10 @@ pad_to_arg_alignment (struct args_size *
 pad_below (struct args_size *offset_ptr, machine_mode passed_mode, tree sizetree)
 {
   unsigned int align = PARM_BOUNDARY / BITS_PER_UNIT;
-  if (passed_mode != BLKmode)
-    offset_ptr->constant += -GET_MODE_SIZE (passed_mode) & (align - 1);
+  int misalign;
+  if (passed_mode != BLKmode
+      && known_misalignment (GET_MODE_SIZE (passed_mode), align, &misalign))
+    offset_ptr->constant += -misalign & (align - 1);
   else
     {
       if (TREE_CODE (sizetree) != INTEGER_CST
Index: gcc/gimple-fold.c
===================================================================
--- gcc/gimple-fold.c	2017-10-23 17:25:57.265181271 +0100
+++ gcc/gimple-fold.c	2017-10-23 17:25:59.425103633 +0100
@@ -3632,7 +3632,7 @@ optimize_atomic_compare_exchange_p (gimp
       && optab_handler (sync_compare_and_swap_optab, mode) == CODE_FOR_nothing)
     return false;
 
-  if (int_size_in_bytes (etype) != GET_MODE_SIZE (mode))
+  if (may_ne (int_size_in_bytes (etype), GET_MODE_SIZE (mode)))
     return false;
 
   return true;
Index: gcc/gimple-ssa-store-merging.c
===================================================================
--- gcc/gimple-ssa-store-merging.c	2017-10-23 17:18:46.178187802 +0100
+++ gcc/gimple-ssa-store-merging.c	2017-10-23 17:25:59.425103633 +0100
@@ -1328,8 +1328,9 @@ lhs_valid_for_store_merging_p (tree lhs)
 static bool
 rhs_valid_for_store_merging_p (tree rhs)
 {
-  return native_encode_expr (rhs, NULL,
-			     GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (rhs)))) != 0;
+  unsigned HOST_WIDE_INT size;
+  return (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (rhs))).is_constant (&size)
+	  && native_encode_expr (rhs, NULL, size) != 0);
 }
 
 /* Entry point for the pass.  Go over each basic block recording chains of
Index: gcc/ira.c
===================================================================
--- gcc/ira.c	2017-10-23 17:18:53.834514759 +0100
+++ gcc/ira.c	2017-10-23 17:25:59.430103453 +0100
@@ -4049,9 +4049,9 @@ get_subreg_tracking_sizes (rtx x, HOST_W
 			   HOST_WIDE_INT *inner_size, HOST_WIDE_INT *start)
 {
   rtx reg = regno_reg_rtx[REGNO (SUBREG_REG (x))];
-  *outer_size = GET_MODE_SIZE (GET_MODE (x));
-  *inner_size = GET_MODE_SIZE (GET_MODE (reg));
-  return SUBREG_BYTE (x).is_constant (start);
+  return (GET_MODE_SIZE (GET_MODE (x)).is_constant (outer_size)
+	  && GET_MODE_SIZE (GET_MODE (reg)).is_constant (inner_size)
+	  && SUBREG_BYTE (x).is_constant (start));
 }
 
 /* Init LIVE_SUBREGS[ALLOCNUM] and LIVE_SUBREGS_USED[ALLOCNUM] for
Index: gcc/ira-build.c
===================================================================
--- gcc/ira-build.c	2017-08-30 16:26:42.813124082 +0100
+++ gcc/ira-build.c	2017-10-23 17:25:59.426103597 +0100
@@ -566,7 +566,7 @@ ira_create_allocno_objects (ira_allocno_
   int n = ira_reg_class_max_nregs[aclass][mode];
   int i;
 
-  if (GET_MODE_SIZE (mode) != 2 * UNITS_PER_WORD || n != 2)
+  if (n != 2 || may_ne (GET_MODE_SIZE (mode), n * UNITS_PER_WORD))
     n = 1;
 
   ALLOCNO_NUM_OBJECTS (a) = n;
Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c	2017-10-23 17:20:48.204761416 +0100
+++ gcc/ira-color.c	2017-10-23 17:25:59.428103525 +0100
@@ -3940,7 +3940,8 @@ coalesced_pseudo_reg_slot_compare (const
 			     regno_max_ref_mode[regno1]);
   mode2 = wider_subreg_mode (PSEUDO_REGNO_MODE (regno2),
 			     regno_max_ref_mode[regno2]);
-  if ((diff = GET_MODE_SIZE (mode2) - GET_MODE_SIZE (mode1)) != 0)
+  if ((diff = compare_sizes_for_sort (GET_MODE_SIZE (mode2),
+				      GET_MODE_SIZE (mode1))) != 0)
     return diff;
   return regno1 - regno2;
 }
@@ -4229,9 +4230,10 @@ ira_sort_regnos_for_alter_reg (int *pseu
 	      machine_mode mode = wider_subreg_mode
 		(PSEUDO_REGNO_MODE (ALLOCNO_REGNO (a)),
 		 reg_max_ref_mode[ALLOCNO_REGNO (a)]);
-	      fprintf (ira_dump_file, " a%dr%d(%d,%d)",
-		       ALLOCNO_NUM (a), ALLOCNO_REGNO (a), ALLOCNO_FREQ (a),
-		       GET_MODE_SIZE (mode));
+	      fprintf (ira_dump_file, " a%dr%d(%d,",
+		       ALLOCNO_NUM (a), ALLOCNO_REGNO (a), ALLOCNO_FREQ (a));
+	      print_dec (GET_MODE_SIZE (mode), ira_dump_file, SIGNED);
+	      fprintf (ira_dump_file, ")");
 	    }
 
 	  if (a == allocno)
Index: gcc/ira-costs.c
===================================================================
--- gcc/ira-costs.c	2017-10-02 09:10:57.866601895 +0100
+++ gcc/ira-costs.c	2017-10-23 17:25:59.428103525 +0100
@@ -1368,12 +1368,12 @@ record_operand_costs (rtx_insn *insn, en
       rtx src = SET_SRC (set);
 
       if (GET_CODE (dest) == SUBREG
-	  && (GET_MODE_SIZE (GET_MODE (dest))
-	      == GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest)))))
+	  && must_eq (GET_MODE_SIZE (GET_MODE (dest)),
+		      GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest)))))
 	dest = SUBREG_REG (dest);
       if (GET_CODE (src) == SUBREG
-	  && (GET_MODE_SIZE (GET_MODE (src))
-	      == GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
+	  && must_eq (GET_MODE_SIZE (GET_MODE (src)),
+		      GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
 	src = SUBREG_REG (src);
       if (REG_P (src) && REG_P (dest)
 	  && find_regno_note (insn, REG_DEAD, REGNO (src))
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	2017-10-23 17:16:50.370528277 +0100
+++ gcc/lower-subreg.c	2017-10-23 17:25:59.431103417 +0100
@@ -110,7 +110,8 @@ #define choices \
 interesting_mode_p (machine_mode mode, unsigned int *bytes,
 		    unsigned int *words)
 {
-  *bytes = GET_MODE_SIZE (mode);
+  if (!GET_MODE_SIZE (mode).is_constant (bytes))
+    return false;
   *words = CEIL (*bytes, UNITS_PER_WORD);
   return true;
 }
@@ -667,8 +668,8 @@ simplify_gen_subreg_concatn (machine_mod
     {
       rtx op2;
 
-      if ((GET_MODE_SIZE (GET_MODE (op))
-	   == GET_MODE_SIZE (GET_MODE (SUBREG_REG (op))))
+      if (must_eq (GET_MODE_SIZE (GET_MODE (op)),
+		   GET_MODE_SIZE (GET_MODE (SUBREG_REG (op))))
 	  && known_zero (SUBREG_BYTE (op)))
 	return simplify_gen_subreg_concatn (outermode, SUBREG_REG (op),
 					    GET_MODE (SUBREG_REG (op)), byte);
@@ -869,8 +870,7 @@ resolve_simple_move (rtx set, rtx_insn *
   if (GET_CODE (src) == SUBREG
       && resolve_reg_p (SUBREG_REG (src))
       && (maybe_nonzero (SUBREG_BYTE (src))
-	  || (GET_MODE_SIZE (orig_mode)
-	      != GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))))
+	  || may_ne (orig_size, GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))))
     {
       real_dest = dest;
       dest = gen_reg_rtx (orig_mode);
@@ -884,8 +884,7 @@ resolve_simple_move (rtx set, rtx_insn *
   if (GET_CODE (dest) == SUBREG
       && resolve_reg_p (SUBREG_REG (dest))
       && (maybe_nonzero (SUBREG_BYTE (dest))
-	  || (GET_MODE_SIZE (orig_mode)
-	      != GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest))))))
+	  || may_ne (orig_size, GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest))))))
     {
       rtx reg, smove;
       rtx_insn *minsn;
Index: gcc/lra-constraints.c
===================================================================
--- gcc/lra-constraints.c	2017-10-23 17:25:54.179292194 +0100
+++ gcc/lra-constraints.c	2017-10-23 17:25:59.433103345 +0100
@@ -591,7 +591,8 @@ get_reload_reg (enum op_type type, machi
 	      {
 		if (in_subreg_p)
 		  continue;
-		if (GET_MODE_SIZE (GET_MODE (reg)) < GET_MODE_SIZE (mode))
+		if (may_lt (GET_MODE_SIZE (GET_MODE (reg)),
+			    GET_MODE_SIZE (mode)))
 		  continue;
 		reg = lowpart_subreg (mode, reg, GET_MODE (reg));
 		if (reg == NULL_RTX || GET_CODE (reg) != SUBREG)
@@ -827,6 +828,7 @@ #define CONST_POOL_OK_P(MODE, X)		\
   ((MODE) != VOIDmode				\
    && CONSTANT_P (X)				\
    && GET_CODE (X) != HIGH			\
+   && GET_MODE_SIZE (MODE).is_constant ()	\
    && !targetm.cannot_force_const_mem (MODE, X))
 
 /* True if C is a non-empty register class that has too few registers
@@ -1394,7 +1396,7 @@ process_addr_reg (rtx *loc, bool check_o
 	 -fno-split-wide-types specified.  */
       if (!REG_P (reg)
 	  || in_class_p (reg, cl, &new_class)
-	  || GET_MODE_SIZE (mode) <= GET_MODE_SIZE (ptr_mode))
+	  || must_le (GET_MODE_SIZE (mode), GET_MODE_SIZE (ptr_mode)))
        loc = &SUBREG_REG (*loc);
     }
 
@@ -1557,8 +1559,8 @@ simplify_operand_subreg (int nop, machin
 	     a word.  */
 	  if (!(may_ne (GET_MODE_PRECISION (mode),
 			GET_MODE_PRECISION (innermode))
-		&& GET_MODE_SIZE (mode) <= UNITS_PER_WORD
-		&& GET_MODE_SIZE (innermode) <= UNITS_PER_WORD
+		&& must_le (GET_MODE_SIZE (mode), UNITS_PER_WORD)
+		&& must_le (GET_MODE_SIZE (innermode), UNITS_PER_WORD)
 		&& WORD_REGISTER_OPERATIONS)
 	      && (!(MEM_ALIGN (subst) < GET_MODE_ALIGNMENT (mode)
 		    && targetm.slow_unaligned_access (mode, MEM_ALIGN (subst)))
@@ -4729,8 +4731,8 @@ lra_constraints (bool first_p)
 		/* Prevent access beyond equivalent memory for
 		   paradoxical subregs.  */
 		|| (MEM_P (x)
-		    && (GET_MODE_SIZE (lra_reg_info[i].biggest_mode)
-			> GET_MODE_SIZE (GET_MODE (x))))
+		    && may_gt (GET_MODE_SIZE (lra_reg_info[i].biggest_mode),
+			       GET_MODE_SIZE (GET_MODE (x))))
 		|| (pic_offset_table_rtx
 		    && ((CONST_POOL_OK_P (PSEUDO_REGNO_MODE (i), x)
 			 && (targetm.preferred_reload_class
Index: gcc/lra-spills.c
===================================================================
--- gcc/lra-spills.c	2017-10-23 17:16:50.371528142 +0100
+++ gcc/lra-spills.c	2017-10-23 17:25:59.434103309 +0100
@@ -107,7 +107,7 @@ struct slot
   /* Maximum alignment required by all users of the slot.  */
   unsigned int align;
   /* Maximum size required by all users of the slot.  */
-  HOST_WIDE_INT size;
+  poly_int64 size;
   /* Memory representing the all stack slot.  It can be different from
      memory representing a pseudo belonging to give stack slot because
      pseudo can be placed in a part of the corresponding stack slot.
@@ -132,10 +132,10 @@ assign_mem_slot (int i)
 {
   rtx x = NULL_RTX;
   machine_mode mode = GET_MODE (regno_reg_rtx[i]);
-  HOST_WIDE_INT inherent_size = PSEUDO_REGNO_BYTES (i);
+  poly_int64 inherent_size = PSEUDO_REGNO_BYTES (i);
   machine_mode wider_mode
     = wider_subreg_mode (mode, lra_reg_info[i].biggest_mode);
-  HOST_WIDE_INT total_size = GET_MODE_SIZE (wider_mode);
+  poly_int64 total_size = GET_MODE_SIZE (wider_mode);
   poly_int64 adjust = 0;
 
   lra_assert (regno_reg_rtx[i] != NULL_RTX && REG_P (regno_reg_rtx[i])
@@ -191,16 +191,15 @@ pseudo_reg_slot_compare (const void *v1p
   const int regno1 = *(const int *) v1p;
   const int regno2 = *(const int *) v2p;
   int diff, slot_num1, slot_num2;
-  int total_size1, total_size2;
 
   slot_num1 = pseudo_slots[regno1].slot_num;
   slot_num2 = pseudo_slots[regno2].slot_num;
   if ((diff = slot_num1 - slot_num2) != 0)
     return (frame_pointer_needed
 	    || (!FRAME_GROWS_DOWNWARD) == STACK_GROWS_DOWNWARD ? diff : -diff);
-  total_size1 = GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode);
-  total_size2 = GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode);
-  if ((diff = total_size2 - total_size1) != 0)
+  poly_int64 total_size1 = GET_MODE_SIZE (lra_reg_info[regno1].biggest_mode);
+  poly_int64 total_size2 = GET_MODE_SIZE (lra_reg_info[regno2].biggest_mode);
+  if ((diff = compare_sizes_for_sort (total_size2, total_size1)) != 0)
     return diff;
   return regno1 - regno2;
 }
@@ -315,7 +314,8 @@ add_pseudo_to_slot (int regno, int slot_
 					 lra_reg_info[regno].biggest_mode);
   unsigned int align = spill_slot_alignment (mode);
   slots[slot_num].align = MAX (slots[slot_num].align, align);
-  slots[slot_num].size = MAX (slots[slot_num].size, GET_MODE_SIZE (mode));
+  slots[slot_num].size = upper_bound (slots[slot_num].size,
+				      GET_MODE_SIZE (mode));
 
   if (slots[slot_num].regno < 0)
     {
@@ -580,8 +580,10 @@ lra_spill (void)
     {
       for (i = 0; i < slots_num; i++)
 	{
-	  fprintf (lra_dump_file, "  Slot %d regnos (width = %d):", i,
-		   GET_MODE_SIZE (GET_MODE (slots[i].mem)));
+	  fprintf (lra_dump_file, "  Slot %d regnos (width = ", i);
+	  print_dec (GET_MODE_SIZE (GET_MODE (slots[i].mem)),
+		     lra_dump_file, SIGNED);
+	  fprintf (lra_dump_file, "):");
 	  for (curr_regno = slots[i].regno;;
 	       curr_regno = pseudo_slots[curr_regno].next - pseudo_slots)
 	    {
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	2017-10-23 17:22:32.723227539 +0100
+++ gcc/omp-low.c	2017-10-23 17:25:59.438103166 +0100
@@ -3474,7 +3474,8 @@ omp_clause_aligned_alignment (tree claus
 	tree type = lang_hooks.types.type_for_mode (mode, 1);
 	if (type == NULL_TREE || TYPE_MODE (type) != mode)
 	  continue;
-	unsigned int nelts = GET_MODE_SIZE (vmode) / GET_MODE_SIZE (mode);
+	poly_uint64 nelts = exact_div (GET_MODE_SIZE (vmode),
+				       GET_MODE_SIZE (mode));
 	type = build_vector_type (type, nelts);
 	if (TYPE_MODE (type) != vmode)
 	  continue;
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-10-23 17:25:54.180292158 +0100
+++ gcc/optabs-query.c	2017-10-23 17:25:59.439103130 +0100
@@ -212,7 +212,7 @@ get_best_extraction_insn (extraction_ins
 	  FOR_EACH_MODE_FROM (mode_iter, mode)
 	    {
 	      mode = mode_iter.require ();
-	      if (GET_MODE_SIZE (mode) > GET_MODE_SIZE (field_mode)
+	      if (may_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (field_mode))
 		  || TRULY_NOOP_TRUNCATION_MODES_P (insn->field_mode,
 						    field_mode))
 		break;
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-10-23 17:25:48.620492005 +0100
+++ gcc/optabs-tree.c	2017-10-23 17:25:59.439103130 +0100
@@ -337,7 +337,7 @@ expand_vec_cond_expr_p (tree value_type,
 			       TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
     return true;
 
-  if (GET_MODE_SIZE (value_mode) != GET_MODE_SIZE (cmp_op_mode)
+  if (may_ne (GET_MODE_SIZE (value_mode), GET_MODE_SIZE (cmp_op_mode))
       || may_ne (GET_MODE_NUNITS (value_mode), GET_MODE_NUNITS (cmp_op_mode)))
     return false;
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 17:25:54.181292122 +0100
+++ gcc/optabs.c	2017-10-23 17:25:59.441103058 +0100
@@ -5456,13 +5456,12 @@ expand_vec_perm (machine_mode mode, rtx
   if (!target || GET_MODE (target) != mode)
     target = gen_reg_rtx (mode);
 
-  w = GET_MODE_SIZE (mode);
   u = GET_MODE_UNIT_SIZE (mode);
 
   /* Set QIMODE to a different vector mode with byte elements.
      If no such mode, or if MODE already has byte elements, use VOIDmode.  */
   if (GET_MODE_INNER (mode) == QImode
-      || !mode_for_vector (QImode, w).exists (&qimode)
+      || !mode_for_vector (QImode, GET_MODE_SIZE (mode)).exists (&qimode)
       || !VECTOR_MODE_P (qimode))
     qimode = VOIDmode;
 
@@ -5518,7 +5517,7 @@ expand_vec_perm (machine_mode mode, rtx
 	}
 
       /* Fall back to a constant byte-based permutation.  */
-      if (qimode != VOIDmode)
+      if (qimode != VOIDmode && GET_MODE_SIZE (mode).is_constant (&w))
 	{
 	  vec = rtvec_alloc (w);
 	  for (i = 0; i < e; ++i)
@@ -5565,6 +5564,9 @@ expand_vec_perm (machine_mode mode, rtx
 
   if (sel_qi == NULL)
     {
+      if (!GET_MODE_SIZE (mode).is_constant (&w))
+	return NULL_RTX;
+
       /* Multiply each element by its byte size.  */
       machine_mode selmode = GET_MODE (sel);
       if (u == 2)
@@ -5686,7 +5688,7 @@ expand_vec_cond_expr (tree vec_cond_type
   unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
 
 
-  gcc_assert (GET_MODE_SIZE (mode) == GET_MODE_SIZE (cmp_op_mode)
+  gcc_assert (must_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
 	      && must_eq (GET_MODE_NUNITS (mode),
 			  GET_MODE_NUNITS (cmp_op_mode)));
 
@@ -5815,7 +5817,7 @@ expand_mult_highpart (machine_mode mode,
   wmode = insn_data[icode].operand[0].mode;
   gcc_checking_assert (must_eq (2 * GET_MODE_NUNITS (wmode),
 				GET_MODE_NUNITS (mode)));
-  gcc_checking_assert (GET_MODE_SIZE (wmode) == GET_MODE_SIZE (mode));
+  gcc_checking_assert (must_eq (GET_MODE_SIZE (wmode), GET_MODE_SIZE (mode)));
 
   create_output_operand (&eops[0], gen_reg_rtx (wmode), wmode);
   create_input_operand (&eops[1], op0, mode);
@@ -6953,10 +6955,12 @@ insn_operand_matches (enum insn_code ico
 valid_multiword_target_p (rtx target)
 {
   machine_mode mode;
-  int i;
+  int i, size;
 
   mode = GET_MODE (target);
-  for (i = 0; i < GET_MODE_SIZE (mode); i += UNITS_PER_WORD)
+  if (!GET_MODE_SIZE (mode).is_constant (&size))
+    return false;
+  for (i = 0; i < size; i += UNITS_PER_WORD)
     if (!validate_subreg (word_mode, mode, target, i))
       return false;
   return true;
Index: gcc/recog.c
===================================================================
--- gcc/recog.c	2017-10-23 17:25:38.242865029 +0100
+++ gcc/recog.c	2017-10-23 17:25:59.442103022 +0100
@@ -1945,7 +1945,7 @@ offsettable_address_addr_space_p (int st
   int (*addressp) (machine_mode, rtx, addr_space_t) =
     (strictp ? strict_memory_address_addr_space_p
 	     : memory_address_addr_space_p);
-  unsigned int mode_sz = GET_MODE_SIZE (mode);
+  poly_int64 mode_sz = GET_MODE_SIZE (mode);
 
   if (CONSTANT_ADDRESS_P (y))
     return 1;
@@ -1967,7 +1967,7 @@ offsettable_address_addr_space_p (int st
      Clearly that depends on the situation in which it's being used.
      However, the current situation in which we test 0xffffffff is
      less than ideal.  Caveat user.  */
-  if (mode_sz == 0)
+  if (known_zero (mode_sz))
     mode_sz = BIGGEST_ALIGNMENT / BITS_PER_UNIT;
 
   /* If the expression contains a constant term,
@@ -1998,7 +1998,7 @@ offsettable_address_addr_space_p (int st
      go inside a LO_SUM here, so we do so as well.  */
   if (GET_CODE (y) == LO_SUM
       && mode != BLKmode
-      && mode_sz <= GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT)
+      && must_le (mode_sz, GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT))
     z = gen_rtx_LO_SUM (address_mode, XEXP (y, 0),
 			plus_constant (address_mode, XEXP (y, 1),
 				       mode_sz - 1));
Index: gcc/regcprop.c
===================================================================
--- gcc/regcprop.c	2017-10-23 17:16:50.372528007 +0100
+++ gcc/regcprop.c	2017-10-23 17:25:59.443102986 +0100
@@ -406,8 +406,11 @@ maybe_mode_change (machine_mode orig_mod
     {
       int copy_nregs = hard_regno_nregs (copy_regno, copy_mode);
       int use_nregs = hard_regno_nregs (copy_regno, new_mode);
-      int copy_offset
-	= GET_MODE_SIZE (copy_mode) / copy_nregs * (copy_nregs - use_nregs);
+      poly_uint64 bytes_per_reg;
+      if (!can_div_trunc_p (GET_MODE_SIZE (copy_mode),
+			    copy_nregs, &bytes_per_reg))
+	return NULL_RTX;
+      poly_uint64 copy_offset = bytes_per_reg * (copy_nregs - use_nregs);
       poly_uint64 offset
 	= subreg_size_lowpart_offset (GET_MODE_SIZE (new_mode) + copy_offset,
 				      GET_MODE_SIZE (orig_mode));
Index: gcc/reginfo.c
===================================================================
--- gcc/reginfo.c	2017-10-23 17:25:30.704136008 +0100
+++ gcc/reginfo.c	2017-10-23 17:25:59.443102986 +0100
@@ -631,14 +631,16 @@ choose_hard_reg_mode (unsigned int regno
 
   /* We first look for the largest integer mode that can be validly
      held in REGNO.  If none, we look for the largest floating-point mode.
-     If we still didn't find a valid mode, try CCmode.  */
+     If we still didn't find a valid mode, try CCmode.
 
+     The tests use may_gt rather than must_gt because we want (for example)
+     N V4SFs to win over plain V4SF even though N might be 1.  */
   FOR_EACH_MODE_IN_CLASS (mode, MODE_INT)
     if (hard_regno_nregs (regno, mode) == nregs
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
 	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
-	&& GET_MODE_SIZE (mode) > GET_MODE_SIZE (found_mode))
+	&& may_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_FLOAT)
@@ -646,7 +648,7 @@ choose_hard_reg_mode (unsigned int regno
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
 	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
-	&& GET_MODE_SIZE (mode) > GET_MODE_SIZE (found_mode))
+	&& may_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_FLOAT)
@@ -654,7 +656,7 @@ choose_hard_reg_mode (unsigned int regno
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
 	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
-	&& GET_MODE_SIZE (mode) > GET_MODE_SIZE (found_mode))
+	&& may_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
   FOR_EACH_MODE_IN_CLASS (mode, MODE_VECTOR_INT)
@@ -662,7 +664,7 @@ choose_hard_reg_mode (unsigned int regno
 	&& targetm.hard_regno_mode_ok (regno, mode)
 	&& (!call_saved
 	    || !targetm.hard_regno_call_part_clobbered (regno, mode))
-	&& GET_MODE_SIZE (mode) > GET_MODE_SIZE (found_mode))
+	&& may_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (found_mode)))
       found_mode = mode;
 
   if (found_mode != VOIDmode)
@@ -1299,8 +1301,8 @@ record_subregs_of_mode (rtx subreg, bool
 	 The size of the outer mode must ordered wrt the size of the
 	 inner mode's registers, since otherwise we wouldn't know at
 	 compile time how many registers the outer mode occupies.  */
-      poly_uint64 size = MAX (REGMODE_NATURAL_SIZE (shape.inner_mode),
-			      GET_MODE_SIZE (shape.outer_mode));
+      poly_uint64 size = ordered_max (REGMODE_NATURAL_SIZE (shape.inner_mode),
+				      GET_MODE_SIZE (shape.outer_mode));
       gcc_checking_assert (must_lt (size, GET_MODE_SIZE (shape.inner_mode)));
       if (must_ge (shape.offset, size))
 	shape.offset -= size;
Index: gcc/regrename.c
===================================================================
--- gcc/regrename.c	2017-09-12 14:28:56.397825010 +0100
+++ gcc/regrename.c	2017-10-23 17:25:59.444102950 +0100
@@ -1697,9 +1697,11 @@ build_def_use (basic_block bb)
 		     not already tracking such a reg, we won't start here,
 		     and we must instead make sure to make the operand visible
 		     to the machinery that tracks hard registers.  */
+		  machine_mode i_mode = recog_data.operand_mode[i];
+		  machine_mode matches_mode = recog_data.operand_mode[matches];
 		  if (matches >= 0
-		      && (GET_MODE_SIZE (recog_data.operand_mode[i])
-			  != GET_MODE_SIZE (recog_data.operand_mode[matches]))
+		      && may_ne (GET_MODE_SIZE (i_mode),
+				 GET_MODE_SIZE (matches_mode))
 		      && !verify_reg_in_set (op, &live_in_chains))
 		    {
 		      untracked_operands |= 1 << i;
Index: gcc/regstat.c
===================================================================
--- gcc/regstat.c	2017-02-23 19:54:03.000000000 +0000
+++ gcc/regstat.c	2017-10-23 17:25:59.444102950 +0100
@@ -436,8 +436,12 @@ dump_reg_info (FILE *file)
       else if (REG_N_CALLS_CROSSED (i))
 	fprintf (file, "; crosses %d calls", REG_N_CALLS_CROSSED (i));
       if (regno_reg_rtx[i] != NULL
-	  && PSEUDO_REGNO_BYTES (i) != UNITS_PER_WORD)
-	fprintf (file, "; %d bytes", PSEUDO_REGNO_BYTES (i));
+	  && may_ne (PSEUDO_REGNO_BYTES (i), UNITS_PER_WORD))
+	{
+	  fprintf (file, "; ");
+	  print_dec (PSEUDO_REGNO_BYTES (i), file, SIGNED);
+	  fprintf (file, " bytes");
+	}
 
       rclass = reg_preferred_class (i);
       altclass = reg_alternate_class (i);
Index: gcc/reload.c
===================================================================
--- gcc/reload.c	2017-10-23 17:25:57.266181235 +0100
+++ gcc/reload.c	2017-10-23 17:25:59.446102878 +0100
@@ -823,9 +823,11 @@ find_reusable_reload (rtx *p_in, rtx out
 complex_word_subreg_p (machine_mode outer_mode, rtx reg)
 {
   machine_mode inner_mode = GET_MODE (reg);
-  return (GET_MODE_SIZE (outer_mode) <= UNITS_PER_WORD
-	  && GET_MODE_SIZE (inner_mode) > UNITS_PER_WORD
-	  && GET_MODE_SIZE (inner_mode) / UNITS_PER_WORD != REG_NREGS (reg));
+  poly_uint64 reg_words = REG_NREGS (reg) * UNITS_PER_WORD;
+  return (must_le (GET_MODE_SIZE (outer_mode), UNITS_PER_WORD)
+	  && may_gt (GET_MODE_SIZE (inner_mode), UNITS_PER_WORD)
+	  && !known_equal_after_align_up (GET_MODE_SIZE (inner_mode),
+					  reg_words, UNITS_PER_WORD));
 }
 
 /* Return true if X is a SUBREG that will need reloading of its SUBREG_REG
@@ -1061,7 +1063,7 @@ push_reload (rtx in, rtx out, rtx *inloc
 		&& REGNO (SUBREG_REG (in)) >= FIRST_PSEUDO_REGISTER)
 	       || MEM_P (SUBREG_REG (in)))
 	      && (paradoxical_subreg_p (inmode, GET_MODE (SUBREG_REG (in)))
-		  || (GET_MODE_SIZE (inmode) <= UNITS_PER_WORD
+		  || (must_le (GET_MODE_SIZE (inmode), UNITS_PER_WORD)
 		      && is_a <scalar_int_mode> (GET_MODE (SUBREG_REG (in)),
 						 &inner_mode)
 		      && GET_MODE_SIZE (inner_mode) <= UNITS_PER_WORD
@@ -1069,9 +1071,10 @@ push_reload (rtx in, rtx out, rtx *inloc
 		      && LOAD_EXTEND_OP (inner_mode) != UNKNOWN)
 		  || (WORD_REGISTER_OPERATIONS
 		      && partial_subreg_p (inmode, GET_MODE (SUBREG_REG (in)))
-		      && ((GET_MODE_SIZE (inmode) - 1) / UNITS_PER_WORD ==
-			  ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (in))) - 1)
-			   / UNITS_PER_WORD)))))
+		      && (known_equal_after_align_down
+			  (GET_MODE_SIZE (inmode) - 1,
+			   GET_MODE_SIZE (GET_MODE (SUBREG_REG (in))) - 1,
+			   UNITS_PER_WORD)))))
 	  || (REG_P (SUBREG_REG (in))
 	      && REGNO (SUBREG_REG (in)) < FIRST_PSEUDO_REGISTER
 	      /* The case where out is nonzero
@@ -1099,7 +1102,8 @@ push_reload (rtx in, rtx out, rtx *inloc
 	  && MEM_P (in))
 	/* This is supposed to happen only for paradoxical subregs made by
 	   combine.c.  (SUBREG (MEM)) isn't supposed to occur other ways.  */
-	gcc_assert (GET_MODE_SIZE (GET_MODE (in)) <= GET_MODE_SIZE (inmode));
+	gcc_assert (must_le (GET_MODE_SIZE (GET_MODE (in)),
+			     GET_MODE_SIZE (inmode)));
 
       inmode = GET_MODE (in);
     }
@@ -1158,16 +1162,17 @@ push_reload (rtx in, rtx out, rtx *inloc
 	      && (paradoxical_subreg_p (outmode, GET_MODE (SUBREG_REG (out)))
 		  || (WORD_REGISTER_OPERATIONS
 		      && partial_subreg_p (outmode, GET_MODE (SUBREG_REG (out)))
-		      && ((GET_MODE_SIZE (outmode) - 1) / UNITS_PER_WORD ==
-			  ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (out))) - 1)
-			   / UNITS_PER_WORD)))))
+		      && (known_equal_after_align_down
+			  (GET_MODE_SIZE (outmode) - 1,
+			   GET_MODE_SIZE (GET_MODE (SUBREG_REG (out))) - 1,
+			   UNITS_PER_WORD)))))
 	  || (REG_P (SUBREG_REG (out))
 	      && REGNO (SUBREG_REG (out)) < FIRST_PSEUDO_REGISTER
 	      /* The case of a word mode subreg
 		 is handled differently in the following statement.  */
-	      && ! (GET_MODE_SIZE (outmode) <= UNITS_PER_WORD
-		    && (GET_MODE_SIZE (GET_MODE (SUBREG_REG (out)))
-		        > UNITS_PER_WORD))
+	      && ! (must_le (GET_MODE_SIZE (outmode), UNITS_PER_WORD)
+		    && may_gt (GET_MODE_SIZE (GET_MODE (SUBREG_REG (out))),
+			       UNITS_PER_WORD))
 	      && !targetm.hard_regno_mode_ok (subreg_regno (out), outmode))
 	  || (secondary_reload_class (0, rclass, outmode, out) != NO_REGS
 	      && (secondary_reload_class (0, rclass, GET_MODE (SUBREG_REG (out)),
@@ -1185,8 +1190,8 @@ push_reload (rtx in, rtx out, rtx *inloc
       outloc = &SUBREG_REG (out);
       out = *outloc;
       gcc_assert (WORD_REGISTER_OPERATIONS || !MEM_P (out)
-		  || GET_MODE_SIZE (GET_MODE (out))
-		     <= GET_MODE_SIZE (outmode));
+		  || must_le (GET_MODE_SIZE (GET_MODE (out)),
+			      GET_MODE_SIZE (outmode)));
       outmode = GET_MODE (out);
     }
 
@@ -1593,13 +1598,13 @@ push_reload (rtx in, rtx out, rtx *inloc
 	       What's going on here.  */
 	    && (in != out
 		|| (GET_CODE (in) == SUBREG
-		    && (((GET_MODE_SIZE (GET_MODE (in)) + (UNITS_PER_WORD - 1))
-			 / UNITS_PER_WORD)
-			== ((GET_MODE_SIZE (GET_MODE (SUBREG_REG (in)))
-			     + (UNITS_PER_WORD - 1)) / UNITS_PER_WORD))))
+		    && (known_equal_after_align_up
+			(GET_MODE_SIZE (GET_MODE (in)),
+			 GET_MODE_SIZE (GET_MODE (SUBREG_REG (in))),
+			 UNITS_PER_WORD))))
 	    /* Make sure the operand fits in the reg that dies.  */
-	    && (GET_MODE_SIZE (rel_mode)
-		<= GET_MODE_SIZE (GET_MODE (XEXP (note, 0))))
+	    && must_le (GET_MODE_SIZE (rel_mode),
+			GET_MODE_SIZE (GET_MODE (XEXP (note, 0))))
 	    && targetm.hard_regno_mode_ok (regno, inmode)
 	    && targetm.hard_regno_mode_ok (regno, outmode))
 	  {
@@ -1937,9 +1942,9 @@ find_dummy_reload (rtx real_in, rtx real
 
   /* If operands exceed a word, we can't use either of them
      unless they have the same size.  */
-  if (GET_MODE_SIZE (outmode) != GET_MODE_SIZE (inmode)
-      && (GET_MODE_SIZE (outmode) > UNITS_PER_WORD
-	  || GET_MODE_SIZE (inmode) > UNITS_PER_WORD))
+  if (may_ne (GET_MODE_SIZE (outmode), GET_MODE_SIZE (inmode))
+      && (may_gt (GET_MODE_SIZE (outmode), UNITS_PER_WORD)
+	  || may_gt (GET_MODE_SIZE (inmode), UNITS_PER_WORD)))
     return 0;
 
   /* Note that {in,out}_offset are needed only when 'in' or 'out'
@@ -2885,8 +2890,8 @@ find_reloads (rtx_insn *insn, int replac
 	  if (replace
 	      && MEM_P (op)
 	      && REG_P (reg)
-	      && (GET_MODE_SIZE (GET_MODE (reg))
-		  >= GET_MODE_SIZE (GET_MODE (op)))
+	      && must_ge (GET_MODE_SIZE (GET_MODE (reg)),
+			  GET_MODE_SIZE (GET_MODE (op)))
 	      && reg_equiv_constant (REGNO (reg)) == 0)
 	    set_unique_reg_note (emit_insn_before (gen_rtx_USE (VOIDmode, reg),
 						   insn),
@@ -3127,8 +3132,8 @@ find_reloads (rtx_insn *insn, int replac
 				   && (paradoxical_subreg_p
 				       (operand_mode[i], GET_MODE (operand)))))
 			      || BYTES_BIG_ENDIAN
-			      || ((GET_MODE_SIZE (operand_mode[i])
-				   <= UNITS_PER_WORD)
+			      || (must_le (GET_MODE_SIZE (operand_mode[i]),
+					   UNITS_PER_WORD)
 				  && (is_a <scalar_int_mode>
 				      (GET_MODE (operand), &inner_mode))
 				  && (GET_MODE_SIZE (inner_mode)
@@ -3625,7 +3630,7 @@ find_reloads (rtx_insn *insn, int replac
 
 	      if (! win && ! did_match
 		  && this_alternative[i] != NO_REGS
-		  && GET_MODE_SIZE (operand_mode[i]) <= UNITS_PER_WORD
+		  && must_le (GET_MODE_SIZE (operand_mode[i]), UNITS_PER_WORD)
 		  && reg_class_size [(int) preferred_class[i]] > 0
 		  && ! small_register_class_p (preferred_class[i]))
 		{
@@ -6146,8 +6151,9 @@ find_reloads_subreg_address (rtx x, int
 
   if (WORD_REGISTER_OPERATIONS
       && partial_subreg_p (outer_mode, inner_mode)
-      && ((GET_MODE_SIZE (outer_mode) - 1) / UNITS_PER_WORD
-          == (GET_MODE_SIZE (inner_mode) - 1) / UNITS_PER_WORD))
+      && known_equal_after_align_down (GET_MODE_SIZE (outer_mode) - 1,
+				       GET_MODE_SIZE (inner_mode) - 1,
+				       UNITS_PER_WORD))
     return NULL;
 
   /* Since we don't attempt to handle paradoxical subregs, we can just
Index: gcc/reload1.c
===================================================================
--- gcc/reload1.c	2017-10-23 17:25:57.267181199 +0100
+++ gcc/reload1.c	2017-10-23 17:25:59.449102770 +0100
@@ -2829,8 +2829,8 @@ eliminate_regs_1 (rtx x, machine_mode me
 
       if (new_rtx != SUBREG_REG (x))
 	{
-	  int x_size = GET_MODE_SIZE (GET_MODE (x));
-	  int new_size = GET_MODE_SIZE (GET_MODE (new_rtx));
+	  poly_int64 x_size = GET_MODE_SIZE (GET_MODE (x));
+	  poly_int64 new_size = GET_MODE_SIZE (GET_MODE (new_rtx));
 
 	  if (MEM_P (new_rtx)
 	      && ((partial_subreg_p (GET_MODE (x), GET_MODE (new_rtx))
@@ -2842,9 +2842,10 @@ eliminate_regs_1 (rtx x, machine_mode me
 		      So if the number of words is the same, preserve the
 		      subreg so that push_reload can see it.  */
 		   && !(WORD_REGISTER_OPERATIONS
-			&& (x_size - 1) / UNITS_PER_WORD
-			   == (new_size -1 ) / UNITS_PER_WORD))
-		  || x_size == new_size)
+			&& known_equal_after_align_down (x_size - 1,
+							 new_size - 1,
+							 UNITS_PER_WORD)))
+		  || must_eq (x_size, new_size))
 	      )
 	    return adjust_address_nv (new_rtx, GET_MODE (x), SUBREG_BYTE (x));
 	  else if (insn && GET_CODE (insn) == DEBUG_INSN)
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-10-23 17:25:54.182292086 +0100
+++ gcc/rtlanal.c	2017-10-23 17:25:59.450102734 +0100
@@ -3346,7 +3346,7 @@ for_each_inc_dec_find_inc_dec (rtx mem,
     case PRE_INC:
     case POST_INC:
       {
-	int size = GET_MODE_SIZE (GET_MODE (mem));
+	poly_int64 size = GET_MODE_SIZE (GET_MODE (mem));
 	rtx r1 = XEXP (x, 0);
 	rtx c = gen_int_mode (size, GET_MODE (r1));
 	return fn (mem, x, r1, r1, c, data);
@@ -3355,7 +3355,7 @@ for_each_inc_dec_find_inc_dec (rtx mem,
     case PRE_DEC:
     case POST_DEC:
       {
-	int size = GET_MODE_SIZE (GET_MODE (mem));
+	poly_int64 size = GET_MODE_SIZE (GET_MODE (mem));
 	rtx r1 = XEXP (x, 0);
 	rtx c = gen_int_mode (-size, GET_MODE (r1));
 	return fn (mem, x, r1, r1, c, data);
@@ -4194,7 +4194,7 @@ rtx_cost (rtx x, machine_mode mode, enum
 
   /* A size N times larger than UNITS_PER_WORD likely needs N times as
      many insns, taking N times as long.  */
-  factor = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+  factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
   if (factor == 0)
     factor = 1;
 
@@ -4225,7 +4225,7 @@ rtx_cost (rtx x, machine_mode mode, enum
       /* A SET doesn't have a mode, so let's look at the SET_DEST to get
 	 the mode for the factor.  */
       mode = GET_MODE (SET_DEST (x));
-      factor = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
+      factor = estimated_poly_value (GET_MODE_SIZE (mode)) / UNITS_PER_WORD;
       if (factor == 0)
 	factor = 1;
       /* FALLTHRU */
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 17:25:48.622491933 +0100
+++ gcc/simplify-rtx.c	2017-10-23 17:25:59.453102626 +0100
@@ -263,7 +263,7 @@ avoid_constant_pool_reference (rtx x)
          If that fails we have no choice but to return the original memory.  */
       if (offset == 0 && cmode == GET_MODE (x))
 	return c;
-      else if (offset >= 0 && offset < GET_MODE_SIZE (cmode))
+      else if (known_in_range_p (offset, 0, GET_MODE_SIZE (cmode)))
         {
           rtx tem = simplify_subreg (GET_MODE (x), c, cmode, offset);
           if (tem && CONSTANT_P (tem))
@@ -3813,13 +3813,13 @@ simplify_binary_operation_1 (enum rtx_co
 	  && GET_CODE (trueop0) == VEC_CONCAT)
 	{
 	  rtx vec = trueop0;
-	  int offset = INTVAL (XVECEXP (trueop1, 0, 0)) * GET_MODE_SIZE (mode);
+	  offset = INTVAL (XVECEXP (trueop1, 0, 0)) * GET_MODE_SIZE (mode);
 
 	  /* Try to find the element in the VEC_CONCAT.  */
 	  while (GET_MODE (vec) != mode
 		 && GET_CODE (vec) == VEC_CONCAT)
 	    {
-	      HOST_WIDE_INT vec_size;
+	      poly_int64 vec_size;
 
 	      if (CONST_INT_P (XEXP (vec, 0)))
 	        {
@@ -3834,13 +3834,15 @@ simplify_binary_operation_1 (enum rtx_co
 	      else
 	        vec_size = GET_MODE_SIZE (GET_MODE (XEXP (vec, 0)));
 
-	      if (offset < vec_size)
+	      if (must_lt (offset, vec_size))
 		vec = XEXP (vec, 0);
-	      else
+	      else if (must_ge (offset, vec_size))
 		{
 		  offset -= vec_size;
 		  vec = XEXP (vec, 1);
 		}
+	      else
+		break;
 	      vec = avoid_constant_pool_reference (vec);
 	    }
 
@@ -3909,8 +3911,9 @@ simplify_binary_operation_1 (enum rtx_co
 				      : GET_MODE_INNER (mode));
 
 	gcc_assert (VECTOR_MODE_P (mode));
-	gcc_assert (GET_MODE_SIZE (op0_mode) + GET_MODE_SIZE (op1_mode)
-		    == GET_MODE_SIZE (mode));
+	gcc_assert (must_eq (GET_MODE_SIZE (op0_mode)
+			     + GET_MODE_SIZE (op1_mode),
+			     GET_MODE_SIZE (mode)));
 
 	if (VECTOR_MODE_P (op0_mode))
 	  gcc_assert (GET_MODE_INNER (mode)
@@ -6199,10 +6202,12 @@ simplify_subreg (machine_mode outermode,
   gcc_assert (GET_MODE (op) == innermode
 	      || GET_MODE (op) == VOIDmode);
 
-  if (!multiple_p (byte, GET_MODE_SIZE (outermode)))
+  poly_uint64 outersize = GET_MODE_SIZE (outermode);
+  if (!multiple_p (byte, outersize))
     return NULL_RTX;
 
-  if (may_ge (byte, GET_MODE_SIZE (innermode)))
+  poly_uint64 innersize = GET_MODE_SIZE (innermode);
+  if (may_ge (byte, innersize))
     return NULL_RTX;
 
   if (outermode == innermode && known_zero (byte))
@@ -6247,6 +6252,7 @@ simplify_subreg (machine_mode outermode,
   if (GET_CODE (op) == SUBREG)
     {
       machine_mode innermostmode = GET_MODE (SUBREG_REG (op));
+      poly_uint64 innermostsize = GET_MODE_SIZE (innermostmode);
       rtx newx;
 
       if (outermode == innermostmode
@@ -6264,12 +6270,10 @@ simplify_subreg (machine_mode outermode,
       /* See whether resulting subreg will be paradoxical.  */
       if (!paradoxical_subreg_p (outermode, innermostmode))
 	{
-	  /* In nonparadoxical subregs we can't handle negative offsets.  */
-	  if (may_lt (final_offset, 0))
-	    return NULL_RTX;
 	  /* Bail out in case resulting subreg would be incorrect.  */
-	  if (!multiple_p (final_offset, GET_MODE_SIZE (outermode))
-	      || may_ge (final_offset, GET_MODE_SIZE (innermostmode)))
+	  if (may_lt (final_offset, 0)
+	      || may_ge (poly_uint64 (final_offset), innermostsize)
+	      || !multiple_p (final_offset, outersize))
 	    return NULL_RTX;
 	}
       else
@@ -6294,9 +6298,8 @@ simplify_subreg (machine_mode outermode,
 	  if (SUBREG_PROMOTED_VAR_P (op)
 	      && SUBREG_PROMOTED_SIGN (op) >= 0
 	      && GET_MODE_CLASS (outermode) == MODE_INT
-	      && IN_RANGE (GET_MODE_SIZE (outermode),
-			   GET_MODE_SIZE (innermode),
-			   GET_MODE_SIZE (innermostmode))
+	      && must_ge (outersize, innersize)
+	      && must_le (outersize, innermostsize)
 	      && subreg_lowpart_p (newx))
 	    {
 	      SUBREG_PROMOTED_VAR_P (newx) = 1;
@@ -6346,7 +6349,7 @@ simplify_subreg (machine_mode outermode,
          have instruction to move the whole thing.  */
       && (! MEM_VOLATILE_P (op)
 	  || ! have_insn_for (SET, innermode))
-      && GET_MODE_SIZE (outermode) <= GET_MODE_SIZE (GET_MODE (op)))
+      && must_le (outersize, innersize))
     return adjust_address_nv (op, outermode, byte);
 
   /* Handle complex or vector values represented as CONCAT or VEC_CONCAT
@@ -6354,14 +6357,13 @@ simplify_subreg (machine_mode outermode,
   if (GET_CODE (op) == CONCAT
       || GET_CODE (op) == VEC_CONCAT)
     {
-      unsigned int part_size;
       poly_uint64 final_offset;
       rtx part, res;
 
       machine_mode part_mode = GET_MODE (XEXP (op, 0));
       if (part_mode == VOIDmode)
 	part_mode = GET_MODE_INNER (GET_MODE (op));
-      part_size = GET_MODE_SIZE (part_mode);
+      poly_uint64 part_size = GET_MODE_SIZE (part_mode);
       if (must_lt (byte, part_size))
 	{
 	  part = XEXP (op, 0);
@@ -6375,7 +6377,7 @@ simplify_subreg (machine_mode outermode,
       else
 	return NULL_RTX;
 
-      if (may_gt (final_offset + GET_MODE_SIZE (outermode), part_size))
+      if (may_gt (final_offset + outersize, part_size))
 	return NULL_RTX;
 
       part_mode = GET_MODE (part);
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 17:25:57.267181199 +0100
+++ gcc/targhooks.c	2017-10-23 17:25:59.454102590 +0100
@@ -759,7 +759,9 @@ default_function_arg_padding (machine_mo
       size = int_size_in_bytes (type);
     }
   else
-    size = GET_MODE_SIZE (mode);
+    /* Targets with variable-sized modes must override this hook
+       and handle variable-sized modes explicitly.  */
+    size = GET_MODE_SIZE (mode).to_constant ();
 
   if (size < (PARM_BOUNDARY / BITS_PER_UNIT))
     return PAD_DOWNWARD;
@@ -1484,7 +1486,9 @@ default_addr_space_convert (rtx op ATTRI
 unsigned int
 default_hard_regno_nregs (unsigned int, machine_mode mode)
 {
-  return CEIL (GET_MODE_SIZE (mode), UNITS_PER_WORD);
+  /* Targets with variable-sized modes must provide their own definition
+     of this hook.  */
+  return CEIL (GET_MODE_SIZE (mode).to_constant (), UNITS_PER_WORD);
 }
 
 bool
@@ -1810,7 +1814,10 @@ default_class_max_nregs (reg_class_t rcl
   return (unsigned char) CLASS_MAX_NREGS ((enum reg_class) rclass,
 					  MACRO_MODE (mode));
 #else
-  return ((GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD);
+  /* Targets with variable-sized modes must provide their own definition
+     of this hook.  */
+  unsigned int size = GET_MODE_SIZE (mode).to_constant ();
+  return (size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
 #endif
 }
 
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-10-23 17:25:51.756379285 +0100
+++ gcc/tree-cfg.c	2017-10-23 17:25:59.456102519 +0100
@@ -4033,8 +4033,8 @@ verify_gimple_assign_binary (gassign *st
 		 || (!INTEGRAL_TYPE_P (lhs_type)
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
 	    || !useless_type_conversion_p (lhs_type, rhs2_type)
-	    || (GET_MODE_SIZE (element_mode (rhs2_type))
-		< 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
+	    || may_lt (GET_MODE_SIZE (element_mode (rhs2_type)),
+		       2 * GET_MODE_SIZE (element_mode (rhs1_type))))
           {
             error ("type mismatch in widening sum reduction");
             debug_generic_expr (lhs_type);
@@ -4053,8 +4053,8 @@ verify_gimple_assign_binary (gassign *st
         if (TREE_CODE (rhs1_type) != VECTOR_TYPE
             || TREE_CODE (lhs_type) != VECTOR_TYPE
 	    || !types_compatible_p (rhs1_type, rhs2_type)
-            || (GET_MODE_SIZE (element_mode (lhs_type))
-		!= 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
+            || may_ne (GET_MODE_SIZE (element_mode (lhs_type)),
+		       2 * GET_MODE_SIZE (element_mode (rhs1_type))))
           {
             error ("type mismatch in vector widening multiplication");
             debug_generic_expr (lhs_type);
@@ -4087,8 +4087,8 @@ verify_gimple_assign_binary (gassign *st
 		 || (INTEGRAL_TYPE_P (TREE_TYPE (rhs1_type))
 		     == INTEGRAL_TYPE_P (TREE_TYPE (lhs_type))))
 	    || !types_compatible_p (rhs1_type, rhs2_type)
-            || (GET_MODE_SIZE (element_mode (rhs1_type))
-		!= 2 * GET_MODE_SIZE (element_mode (lhs_type))))
+            || may_ne (GET_MODE_SIZE (element_mode (rhs1_type)),
+		       2 * GET_MODE_SIZE (element_mode (lhs_type))))
           {
             error ("type mismatch in vector pack expression");
             debug_generic_expr (lhs_type);
@@ -4385,8 +4385,8 @@ verify_gimple_assign_ternary (gassign *s
 		     && !SCALAR_FLOAT_TYPE_P (lhs_type))))
 	    || !types_compatible_p (rhs1_type, rhs2_type)
 	    || !useless_type_conversion_p (lhs_type, rhs3_type)
-	    || (GET_MODE_SIZE (element_mode (rhs3_type))
-		< 2 * GET_MODE_SIZE (element_mode (rhs1_type))))
+	    || may_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
+		       2 * GET_MODE_SIZE (element_mode (rhs1_type))))
           {
             error ("type mismatch in dot product reduction");
             debug_generic_expr (lhs_type);
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-10-23 17:11:39.942366965 +0100
+++ gcc/tree-inline.c	2017-10-23 17:25:59.458102447 +0100
@@ -3880,10 +3880,11 @@ estimate_move_cost (tree type, bool ARG_
   if (TREE_CODE (type) == VECTOR_TYPE)
     {
       scalar_mode inner = SCALAR_TYPE_MODE (TREE_TYPE (type));
-      machine_mode simd
-	= targetm.vectorize.preferred_simd_mode (inner);
-      int simd_mode_size = GET_MODE_SIZE (simd);
-      return ((GET_MODE_SIZE (TYPE_MODE (type)) + simd_mode_size - 1)
+      machine_mode simd = targetm.vectorize.preferred_simd_mode (inner);
+      int orig_mode_size
+	= estimated_poly_value (GET_MODE_SIZE (TYPE_MODE (type)));
+      int simd_mode_size = estimated_poly_value (GET_MODE_SIZE (simd));
+      return ((orig_mode_size + simd_mode_size - 1)
 	      / simd_mode_size);
     }
 
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-10-23 17:25:51.756379285 +0100
+++ gcc/tree-ssa-forwprop.c	2017-10-23 17:25:59.459102411 +0100
@@ -1988,8 +1988,8 @@ simplify_vector_constructor (gimple_stmt
 	  op1 = gimple_assign_rhs1 (def_stmt);
 	  if (conv_code == ERROR_MARK)
 	    {
-	      if (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (elt->value)))
-		  != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1))))
+	      if (may_ne (GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (elt->value))),
+			  GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1)))))
 		return false;
 	      conv_code = code;
 	    }
@@ -2062,8 +2062,8 @@ simplify_vector_constructor (gimple_stmt
 	= build_vector_type (build_nonstandard_integer_type (elem_size, 1),
 			     nelts);
       if (GET_MODE_CLASS (TYPE_MODE (mask_type)) != MODE_VECTOR_INT
-	  || GET_MODE_SIZE (TYPE_MODE (mask_type))
-	     != GET_MODE_SIZE (TYPE_MODE (type)))
+	  || may_ne (GET_MODE_SIZE (TYPE_MODE (mask_type)),
+		     GET_MODE_SIZE (TYPE_MODE (type))))
 	return false;
       auto_vec<tree, 32> mask_elts (nelts);
       for (i = 0; i < nelts; i++)
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	2017-10-23 17:22:22.298641645 +0100
+++ gcc/tree-ssa-loop-ivopts.c	2017-10-23 17:25:59.461102339 +0100
@@ -3151,10 +3151,10 @@ add_autoinc_candidates (struct ivopts_da
   mem_mode = TYPE_MODE (TREE_TYPE (*use->op_p));
   if (((USE_LOAD_PRE_INCREMENT (mem_mode)
 	|| USE_STORE_PRE_INCREMENT (mem_mode))
-       && GET_MODE_SIZE (mem_mode) == cstepi)
+       && must_eq (GET_MODE_SIZE (mem_mode), cstepi))
       || ((USE_LOAD_PRE_DECREMENT (mem_mode)
 	   || USE_STORE_PRE_DECREMENT (mem_mode))
-	  && GET_MODE_SIZE (mem_mode) == -cstepi))
+	  && must_eq (GET_MODE_SIZE (mem_mode), -cstepi)))
     {
       enum tree_code code = MINUS_EXPR;
       tree new_base;
@@ -3173,10 +3173,10 @@ add_autoinc_candidates (struct ivopts_da
     }
   if (((USE_LOAD_POST_INCREMENT (mem_mode)
 	|| USE_STORE_POST_INCREMENT (mem_mode))
-       && GET_MODE_SIZE (mem_mode) == cstepi)
+       && must_eq (GET_MODE_SIZE (mem_mode), cstepi))
       || ((USE_LOAD_POST_DECREMENT (mem_mode)
 	   || USE_STORE_POST_DECREMENT (mem_mode))
-	  && GET_MODE_SIZE (mem_mode) == -cstepi))
+	  && must_eq (GET_MODE_SIZE (mem_mode), -cstepi)))
     {
       add_candidate_1 (data, base, step, important, IP_AFTER_USE, use,
 		       use->stmt);
@@ -4298,7 +4298,7 @@ get_address_cost_ainc (poly_int64 ainc_s
       ainc_cost_data_list[idx] = data;
     }
 
-  HOST_WIDE_INT msize = GET_MODE_SIZE (mem_mode);
+  poly_int64 msize = GET_MODE_SIZE (mem_mode);
   if (known_zero (ainc_offset) && must_eq (msize, ainc_step))
     return comp_cost (data->costs[AINC_POST_INC], 0);
   if (known_zero (ainc_offset) && must_eq (msize, -ainc_step))
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-10-23 17:25:51.758379213 +0100
+++ gcc/tree-vect-data-refs.c	2017-10-23 17:25:59.463102267 +0100
@@ -2138,11 +2138,22 @@ vect_enhance_data_refs_alignment (loop_v
               vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
               gcc_assert (vectype);
 
+	      /* At present we don't support versioning for alignment
+		 with variable VF, since there's no guarantee that the
+		 VF is a power of two.  We could relax this if we added
+		 a way of enforcing a power-of-two size.  */
+	      unsigned HOST_WIDE_INT size;
+	      if (!GET_MODE_SIZE (TYPE_MODE (vectype)).is_constant (&size))
+		{
+		  do_versioning = false;
+		  break;
+		}
+
               /* The rightmost bits of an aligned address must be zeros.
                  Construct the mask needed for this test.  For example,
                  GET_MODE_SIZE for the vector mode V4SI is 16 bytes so the
                  mask must be 15 = 0xf. */
-              mask = GET_MODE_SIZE (TYPE_MODE (vectype)) - 1;
+	      mask = size - 1;
 
               /* FORNOW: use the same mask to test all potentially unaligned
                  references in the loop.  The vectorizer currently supports
@@ -6013,8 +6024,8 @@ vect_supportable_dr_alignment (struct da
 	    ;
 	  else if (!loop_vinfo
 		   || (nested_in_vect_loop
-		       && (TREE_INT_CST_LOW (DR_STEP (dr))
-			   != GET_MODE_SIZE (TYPE_MODE (vectype)))))
+		       && may_ne (TREE_INT_CST_LOW (DR_STEP (dr)),
+				  GET_MODE_SIZE (TYPE_MODE (vectype)))))
 	    return dr_explicit_realign;
 	  else
 	    return dr_explicit_realign_optimized;
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-10-23 17:25:51.761379105 +0100
+++ gcc/tree-vect-loop.c	2017-10-23 17:25:59.465102195 +0100
@@ -524,8 +524,8 @@ vect_determine_vectorization_factor (loo
 	      return false;
 	    }
 
-	  if ((GET_MODE_SIZE (TYPE_MODE (vectype))
-	       != GET_MODE_SIZE (TYPE_MODE (vf_vectype))))
+	  if (may_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
+		      GET_MODE_SIZE (TYPE_MODE (vf_vectype))))
 	    {
 	      if (dump_enabled_p ())
 		{
@@ -6076,7 +6076,7 @@ vectorizable_reduction (gimple *stmt, gi
           if (dump_enabled_p ())
             dump_printf (MSG_NOTE, "op not supported by target.\n");
 
-          if (GET_MODE_SIZE (vec_mode) != UNITS_PER_WORD
+          if (may_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
 	      || !vect_worthwhile_without_simd_p (loop_vinfo, code))
             return false;
 
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-10-23 17:25:57.269181128 +0100
+++ gcc/tree-vect-stmts.c	2017-10-23 17:25:59.468102087 +0100
@@ -4766,8 +4766,8 @@ vectorizable_assignment (gimple *stmt, g
        || code == VIEW_CONVERT_EXPR)
       && (!vectype_in
 	  || may_ne (TYPE_VECTOR_SUBPARTS (vectype_in), nunits)
-	  || (GET_MODE_SIZE (TYPE_MODE (vectype))
-	      != GET_MODE_SIZE (TYPE_MODE (vectype_in)))))
+	  || may_ne (GET_MODE_SIZE (TYPE_MODE (vectype)),
+		     GET_MODE_SIZE (TYPE_MODE (vectype_in)))))
     return false;
 
   /* We do not handle bit-precision changes.  */
@@ -5137,7 +5137,7 @@ vectorizable_shift (gimple *stmt, gimple
         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                          "op not supported by target.\n");
       /* Check only during analysis.  */
-      if (GET_MODE_SIZE (vec_mode) != UNITS_PER_WORD
+      if (may_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
 	  || (!vec_stmt
 	      && !vect_worthwhile_without_simd_p (vinfo, code)))
         return false;
@@ -5459,7 +5459,7 @@ vectorizable_operation (gimple *stmt, gi
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
                          "op not supported by target.\n");
       /* Check only during analysis.  */
-      if (GET_MODE_SIZE (vec_mode) != UNITS_PER_WORD
+      if (may_ne (GET_MODE_SIZE (vec_mode), UNITS_PER_WORD)
 	  || (!vec_stmt && !vect_worthwhile_without_simd_p (vinfo, code)))
         return false;
       if (dump_enabled_p ())
@@ -7456,7 +7456,8 @@ vectorizable_load (gimple *stmt, gimple_
      nested within an outer-loop that is being vectorized.  */
 
   if (nested_in_vect_loop
-      && (DR_STEP_ALIGNMENT (dr) % GET_MODE_SIZE (TYPE_MODE (vectype))) != 0)
+      && !multiple_p (DR_STEP_ALIGNMENT (dr),
+		      GET_MODE_SIZE (TYPE_MODE (vectype))))
     {
       gcc_assert (alignment_support_scheme != dr_explicit_realign_optimized);
       compute_in_loop = true;
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 17:25:51.771378746 +0100
+++ gcc/tree.c	2017-10-23 17:25:59.471101979 +0100
@@ -10622,9 +10622,9 @@ build_same_sized_truth_vector_type (tree
   if (VECTOR_BOOLEAN_TYPE_P (vectype))
     return vectype;
 
-  unsigned HOST_WIDE_INT size = GET_MODE_SIZE (TYPE_MODE (vectype));
+  poly_uint64 size = GET_MODE_SIZE (TYPE_MODE (vectype));
 
-  if (!size)
+  if (known_zero (size))
     size = tree_to_uhwi (TYPE_SIZE_UNIT (vectype));
 
   return build_truth_vector_type (TYPE_VECTOR_SUBPARTS (vectype), size);
Index: gcc/valtrack.c
===================================================================
--- gcc/valtrack.c	2017-10-23 17:25:57.269181128 +0100
+++ gcc/valtrack.c	2017-10-23 17:25:59.472101944 +0100
@@ -94,13 +94,15 @@ cleanup_auto_inc_dec (rtx src, machine_m
 
     case PRE_INC:
     case PRE_DEC:
-      gcc_assert (mem_mode != VOIDmode && mem_mode != BLKmode);
-      return gen_rtx_PLUS (GET_MODE (x),
-			   cleanup_auto_inc_dec (XEXP (x, 0), mem_mode),
-			   gen_int_mode (code == PRE_INC
-					 ? GET_MODE_SIZE (mem_mode)
-					 : -GET_MODE_SIZE (mem_mode),
-					 GET_MODE (x)));
+      {
+	gcc_assert (mem_mode != VOIDmode && mem_mode != BLKmode);
+	poly_int64 offset = GET_MODE_SIZE (mem_mode);
+	if (code == PRE_DEC)
+	  offset = -offset;
+	return gen_rtx_PLUS (GET_MODE (x),
+			     cleanup_auto_inc_dec (XEXP (x, 0), mem_mode),
+			     gen_int_mode (offset, GET_MODE (x)));
+      }
 
     case POST_INC:
     case POST_DEC:
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c	2017-10-23 17:25:40.610779914 +0100
+++ gcc/var-tracking.c	2017-10-23 17:25:59.475101836 +0100
@@ -8685,7 +8685,7 @@ emit_note_insn_var_location (variable **
     {
       machine_mode mode, wider_mode;
       rtx loc2;
-      HOST_WIDE_INT offset;
+      HOST_WIDE_INT offset, size, wider_size;
 
       if (i == 0 && var->onepart)
 	{
@@ -8740,7 +8740,14 @@ emit_note_insn_var_location (variable **
       mode = GET_MODE (var->var_part[i].cur_loc);
       if (mode == VOIDmode && var->onepart)
 	mode = DECL_MODE (decl);
-      last_limit = offsets[n_var_parts] + GET_MODE_SIZE (mode);
+      /* We only track subparts of constant-sized objects, since at present
+	 there's no representation for polynomial pieces.  */
+      if (!GET_MODE_SIZE (mode).is_constant (&size))
+	{
+	  complete = false;
+	  continue;
+	}
+      last_limit = offsets[n_var_parts] + size;
 
       /* Attempt to merge adjacent registers or memory.  */
       for (j = i + 1; j < var->n_var_parts; j++)
@@ -8748,6 +8755,7 @@ emit_note_insn_var_location (variable **
 	  break;
       if (j < var->n_var_parts
 	  && GET_MODE_WIDER_MODE (mode).exists (&wider_mode)
+	  && GET_MODE_SIZE (wider_mode).is_constant (&wider_size)
 	  && var->var_part[j].cur_loc
 	  && mode == GET_MODE (var->var_part[j].cur_loc)
 	  && (REG_P (loc[n_var_parts]) || MEM_P (loc[n_var_parts]))
@@ -8785,14 +8793,12 @@ emit_note_insn_var_location (variable **
 	      if ((REG_P (XEXP (loc[n_var_parts], 0))
 		   && rtx_equal_p (XEXP (loc[n_var_parts], 0),
 				   XEXP (XEXP (loc2, 0), 0))
-		   && INTVAL (XEXP (XEXP (loc2, 0), 1))
-		      == GET_MODE_SIZE (mode))
+		   && INTVAL (XEXP (XEXP (loc2, 0), 1)) == size)
 		  || (GET_CODE (XEXP (loc[n_var_parts], 0)) == PLUS
 		      && CONST_INT_P (XEXP (XEXP (loc[n_var_parts], 0), 1))
 		      && rtx_equal_p (XEXP (XEXP (loc[n_var_parts], 0), 0),
 				      XEXP (XEXP (loc2, 0), 0))
-		      && INTVAL (XEXP (XEXP (loc[n_var_parts], 0), 1))
-			 + GET_MODE_SIZE (mode)
+		      && INTVAL (XEXP (XEXP (loc[n_var_parts], 0), 1)) + size
 			 == INTVAL (XEXP (XEXP (loc2, 0), 1))))
 		new_loc = adjust_address_nv (loc[n_var_parts],
 					     wider_mode, 0);
@@ -8802,7 +8808,7 @@ emit_note_insn_var_location (variable **
 	    {
 	      loc[n_var_parts] = new_loc;
 	      mode = wider_mode;
-	      last_limit = offsets[n_var_parts] + GET_MODE_SIZE (mode);
+	      last_limit = offsets[n_var_parts] + wider_size;
 	      i = j;
 	    }
 	}
Index: gcc/config/arc/arc.h
===================================================================
--- gcc/config/arc/arc.h	2017-10-23 11:41:22.922632005 +0100
+++ gcc/config/arc/arc.h	2017-10-23 17:25:59.400104531 +0100
@@ -1297,7 +1297,8 @@ #define ASM_OUTPUT_CASE_END(FILE, NUM, J
   do                                                    \
     {                                                   \
       if (GET_CODE (PATTERN (JUMPTABLE)) == ADDR_DIFF_VEC \
-	  && ((GET_MODE_SIZE (GET_MODE (PATTERN (JUMPTABLE))) \
+	  && ((GET_MODE_SIZE (as_a <scalar_int_mode>	\
+			      (GET_MODE (PATTERN (JUMPTABLE)))) \
 	       * XVECLEN (PATTERN (JUMPTABLE), 1) + 1)	\
 	      & 2))					\
       arc_toggle_unalign ();				\
@@ -1408,7 +1409,8 @@ #define CASE_VECTOR_SHORTEN_MODE_1(MIN_O
  : SImode)
 
 #define ADDR_VEC_ALIGN(VEC_INSN) \
-  (exact_log2 (GET_MODE_SIZE (GET_MODE (PATTERN (VEC_INSN)))))
+  (exact_log2 (GET_MODE_SIZE (as_a <scalar_int_mode> \
+			      (GET_MODE (PATTERN (VEC_INSN))))))
 #undef ASM_OUTPUT_BEFORE_CASE_LABEL
 #define ASM_OUTPUT_BEFORE_CASE_LABEL(FILE, PREFIX, NUM, TABLE) \
   ASM_OUTPUT_ALIGN ((FILE), ADDR_VEC_ALIGN (TABLE));

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [081/nnn] poly_int: brig vector elements
  2017-10-23 17:33 ` [081/nnn] poly_int: brig vector elements Richard Sandiford
@ 2017-10-24  7:10   ` Pekka Jääskeläinen
  0 siblings, 0 replies; 302+ messages in thread
From: Pekka Jääskeläinen @ 2017-10-24  7:10 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford, Martin Jambor

Hi Richard,

Indeed, HSAIL doesn't currently support variable-length vectors.
If it ever does, wider changes will be needed anyway.

So, this patch LGTM.

Pekka,
An HSA/BRIG maintainer


On Mon, Oct 23, 2017 at 7:32 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a brig-specific wrapper around TYPE_VECTOR_SUBPARTS,
> since presumably it will never need to support variable vector lengths.
>
>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/brig/
>         * brigfrontend/brig-util.h (gccbrig_type_vector_subparts): New
>         function.
>         * brigfrontend/brig-basic-inst-handler.cc
>         (brig_basic_inst_handler::build_shuffle): Use it instead of
>         TYPE_VECTOR_SUBPARTS.
>         (brig_basic_inst_handler::build_unpack): Likewise.
>         (brig_basic_inst_handler::build_pack): Likewise.
>         (brig_basic_inst_handler::build_unpack_lo_or_hi): Likewise.
>         (brig_basic_inst_handler::operator ()): Likewise.
>         (brig_basic_inst_handler::build_lower_element_broadcast): Likewise.
>         * brigfrontend/brig-code-entry-handler.cc
>         (brig_code_entry_handler::get_tree_cst_for_hsa_operand): Likewise.
>         (brig_code_entry_handler::get_comparison_result_type): Likewise.
>         (brig_code_entry_handler::expand_or_call_builtin): Likewise.
>
> Index: gcc/brig/brigfrontend/brig-util.h
> ===================================================================
> --- gcc/brig/brigfrontend/brig-util.h   2017-10-02 09:10:56.960755788 +0100
> +++ gcc/brig/brigfrontend/brig-util.h   2017-10-23 17:22:46.882758777 +0100
> @@ -76,4 +76,12 @@ bool gccbrig_might_be_host_defined_var_p
>  /* From hsa.h.  */
>  bool hsa_type_packed_p (BrigType16_t type);
>
> +/* Return the number of elements in a VECTOR_TYPE.  BRIG does not support
> +   variable-length vectors.  */
> +inline unsigned HOST_WIDE_INT
> +gccbrig_type_vector_subparts (const_tree type)
> +{
> +  return TYPE_VECTOR_SUBPARTS (type);
> +}
> +
>  #endif
> Index: gcc/brig/brigfrontend/brig-basic-inst-handler.cc
> ===================================================================
> --- gcc/brig/brigfrontend/brig-basic-inst-handler.cc    2017-08-10 14:36:07.092506123 +0100
> +++ gcc/brig/brigfrontend/brig-basic-inst-handler.cc    2017-10-23 17:22:46.882758777 +0100
> @@ -97,9 +97,10 @@ brig_basic_inst_handler::build_shuffle (
>       output elements can originate from any input element.  */
>    vec<constructor_elt, va_gc> *mask_offset_vals = NULL;
>
> +  unsigned int element_count = gccbrig_type_vector_subparts (arith_type);
> +
>    vec<constructor_elt, va_gc> *input_mask_vals = NULL;
> -  size_t input_mask_element_size
> -    = exact_log2 (TYPE_VECTOR_SUBPARTS (arith_type));
> +  size_t input_mask_element_size = exact_log2 (element_count);
>
>    /* Unpack the tightly packed mask elements to BIT_FIELD_REFs
>       from which to construct the mask vector as understood by
> @@ -109,7 +110,7 @@ brig_basic_inst_handler::build_shuffle (
>    tree mask_element_type
>      = build_nonstandard_integer_type (input_mask_element_size, true);
>
> -  for (size_t i = 0; i < TYPE_VECTOR_SUBPARTS (arith_type); ++i)
> +  for (size_t i = 0; i < element_count; ++i)
>      {
>        tree mask_element
>         = build3 (BIT_FIELD_REF, mask_element_type, mask_operand,
> @@ -119,17 +120,15 @@ brig_basic_inst_handler::build_shuffle (
>        mask_element = convert (element_type, mask_element);
>
>        tree offset;
> -      if (i < TYPE_VECTOR_SUBPARTS (arith_type) / 2)
> +      if (i < element_count / 2)
>         offset = build_int_cst (element_type, 0);
>        else
> -       offset
> -         = build_int_cst (element_type, TYPE_VECTOR_SUBPARTS (arith_type));
> +       offset = build_int_cst (element_type, element_count);
>
>        CONSTRUCTOR_APPEND_ELT (mask_offset_vals, NULL_TREE, offset);
>        CONSTRUCTOR_APPEND_ELT (input_mask_vals, NULL_TREE, mask_element);
>      }
> -  tree mask_vec_type
> -    = build_vector_type (element_type, TYPE_VECTOR_SUBPARTS (arith_type));
> +  tree mask_vec_type = build_vector_type (element_type, element_count);
>
>    tree mask_vec = build_constructor (mask_vec_type, input_mask_vals);
>    tree offset_vec = build_constructor (mask_vec_type, mask_offset_vals);
> @@ -158,7 +157,8 @@ brig_basic_inst_handler::build_unpack (t
>    vec<constructor_elt, va_gc> *input_mask_vals = NULL;
>    vec<constructor_elt, va_gc> *and_mask_vals = NULL;
>
> -  size_t element_count = TYPE_VECTOR_SUBPARTS (TREE_TYPE (operands[0]));
> +  size_t element_count
> +    = gccbrig_type_vector_subparts (TREE_TYPE (operands[0]));
>    tree vec_type = build_vector_type (element_type, element_count);
>
>    for (size_t i = 0; i < element_count; ++i)
> @@ -213,7 +213,7 @@ brig_basic_inst_handler::build_pack (tre
>       TODO: Reuse this for implementing 'bitinsert'
>       without a builtin call.  */
>
> -  size_t ecount = TYPE_VECTOR_SUBPARTS (TREE_TYPE (operands[0]));
> +  size_t ecount = gccbrig_type_vector_subparts (TREE_TYPE (operands[0]));
>    size_t vecsize = int_size_in_bytes (TREE_TYPE (operands[0])) * BITS_PER_UNIT;
>    tree wide_type = build_nonstandard_integer_type (vecsize, 1);
>
> @@ -275,9 +275,10 @@ brig_basic_inst_handler::build_unpack_lo
>  {
>    tree element_type = get_unsigned_int_type (TREE_TYPE (arith_type));
>    tree mask_vec_type
> -    = build_vector_type (element_type, TYPE_VECTOR_SUBPARTS (arith_type));
> +    = build_vector_type (element_type,
> +                        gccbrig_type_vector_subparts (arith_type));
>
> -  size_t element_count = TYPE_VECTOR_SUBPARTS (arith_type);
> +  size_t element_count = gccbrig_type_vector_subparts (arith_type);
>    vec<constructor_elt, va_gc> *input_mask_vals = NULL;
>
>    size_t offset = (brig_opcode == BRIG_OPCODE_UNPACKLO) ? 0 : element_count / 2;
> @@ -600,8 +601,8 @@ brig_basic_inst_handler::operator () (co
>         }
>
>        size_t promoted_type_size = int_size_in_bytes (promoted_type) * 8;
> -
> -      for (size_t i = 0; i < TYPE_VECTOR_SUBPARTS (arith_type); ++i)
> +      size_t element_count = gccbrig_type_vector_subparts (arith_type);
> +      for (size_t i = 0; i < element_count; ++i)
>         {
>           tree operand0 = convert (promoted_type, operand0_elements.at (i));
>           tree operand1 = convert (promoted_type, operand1_elements.at (i));
> @@ -708,7 +709,8 @@ brig_basic_inst_handler::build_lower_ele
>    tree element_type = TREE_TYPE (TREE_TYPE (vec_operand));
>    size_t esize = 8 * int_size_in_bytes (element_type);
>
> -  size_t element_count = TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec_operand));
> +  size_t element_count
> +    = gccbrig_type_vector_subparts (TREE_TYPE (vec_operand));
>    tree mask_inner_type = build_nonstandard_integer_type (esize, 1);
>    vec<constructor_elt, va_gc> *constructor_vals = NULL;
>
> Index: gcc/brig/brigfrontend/brig-code-entry-handler.cc
> ===================================================================
> --- gcc/brig/brigfrontend/brig-code-entry-handler.cc    2017-10-02 09:10:56.960755788 +0100
> +++ gcc/brig/brigfrontend/brig-code-entry-handler.cc    2017-10-23 17:22:46.882758777 +0100
> @@ -641,7 +641,8 @@ brig_code_entry_handler::get_tree_cst_fo
>         {
>           /* In case of vector type elements (or sole vectors),
>              create a vector ctor.  */
> -         size_t element_count = TYPE_VECTOR_SUBPARTS (tree_element_type);
> +         size_t element_count
> +           = gccbrig_type_vector_subparts (tree_element_type);
>           if (bytes_left < scalar_element_size * element_count)
>             fatal_error (UNKNOWN_LOCATION,
>                          "Not enough bytes left for the initializer "
> @@ -844,7 +845,7 @@ brig_code_entry_handler::get_comparison_
>        size_t element_size = int_size_in_bytes (TREE_TYPE (source_type));
>        return build_vector_type
>         (build_nonstandard_boolean_type (element_size * BITS_PER_UNIT),
> -        TYPE_VECTOR_SUBPARTS (source_type));
> +        gccbrig_type_vector_subparts (source_type));
>      }
>    else
>      return gccbrig_tree_type_for_hsa_type (BRIG_TYPE_B1);
> @@ -949,7 +950,8 @@ brig_code_entry_handler::expand_or_call_
>
>        tree_stl_vec result_elements;
>
> -      for (size_t i = 0; i < TYPE_VECTOR_SUBPARTS (arith_type); ++i)
> +      size_t element_count = gccbrig_type_vector_subparts (arith_type);
> +      for (size_t i = 0; i < element_count; ++i)
>         {
>           tree_stl_vec call_operands;
>           if (operand0_elements.size () > 0)

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
  2017-10-23 17:42 ` [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS Richard Sandiford
@ 2017-10-24  9:06   ` Richard Biener
  2017-10-24  9:40     ` Richard Sandiford
  2017-12-06  2:31   ` Jeff Law
  1 sibling, 1 reply; 302+ messages in thread
From: Richard Biener @ 2017-10-24  9:06 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
> encoded in the 10-bit precision field and was previously always stored
> as a simple log2 value.  The challenge was to use these 10 bits to
> encode the number of elements in variable-length vectors, so that
> we didn't need to increase the size of the tree.
>
> In practice the number of vector elements should always have the form
> N + N * X (where X is the runtime value), and as for constant-length
> vectors, N must be a power of 2 (even though X itself might not be).
> The patch therefore uses the low bit to select between constant-length
> and variable-length and uses the upper 9 bits to encode log2(N).
> Targets without variable-length vectors continue to use the old scheme.
>
> A new valid_vector_subparts_p function tests whether a given number
> of elements can be encoded.  This is false for the vector modes that
> represent an LD3 or ST3 vector triple (which we want to treat as arrays
> of vectors rather than single vectors).
>
> Most of the patch is mechanical; previous patches handled the changes
> that weren't entirely straightforward.

One comment, without actually reviewing the may/must stuff (I'll comment
on that elsewhere).

You split the 10 bits into 9 and 1; wouldn't it be more efficient to use the
lower 8 bits for the log2 value of N and either of the two remaining bits
for the flag?  That way the 8 bits for the shift amount can eventually be
accessed in a more efficient way.
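
As a rough, untested sketch of the two layouts ("prec" stands for the raw
10-bit type_common.precision value; the helper names are made up purely
for illustration):

  /* Layout in the patch: flag in bit 0, log2(N) in the bits above it.  */
  static inline unsigned int log2_nunits_patch (unsigned int prec)
  { return prec >> 1; }
  static inline bool variable_length_patch_p (unsigned int prec)
  { return (prec & 1) != 0; }

  /* Alternative layout: log2(N) in the low 8 bits, flag in bit 8
     (or bit 9), so the shift amount sits in a byte-aligned subfield.  */
  static inline unsigned int log2_nunits_alt (unsigned int prec)
  { return prec & 0xff; }
  static inline bool variable_length_alt_p (unsigned int prec)
  { return (prec & 0x100) != 0; }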

I guess you'd need to compare the code generation of the TYPE_VECTOR_SUBPARTS
accessor on aarch64 / x86_64.

Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
have variable-length vector modes?

Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle
>         polynomial numbers of units.
>         (SET_TYPE_VECTOR_SUBPARTS): Likewise.
>         (valid_vector_subparts_p): New function.
>         (build_vector_type): Remove temporary shim and take the number
>         of units as a poly_uint64 rather than an int.
>         (build_opaque_vector_type): Take the number of units as a
>         poly_uint64 rather than an int.
>         * tree.c (build_vector): Handle polynomial TYPE_VECTOR_SUBPARTS.
>         (build_vector_from_ctor, type_hash_canon_hash): Likewise.
>         (type_cache_hasher::equal, uniform_vector_p): Likewise.
>         (vector_type_mode): Likewise.
>         (build_vector_from_val): If the number of units isn't constant,
>         use build_vec_duplicate_cst for constant operands and
>         VEC_DUPLICATE_EXPR otherwise.
>         (make_vector_type): Remove temporary is_constant ().
>         (build_vector_type, build_opaque_vector_type): Take the number of
>         units as a poly_uint64 rather than an int.
>         * cfgexpand.c (expand_debug_expr): Handle polynomial
>         TYPE_VECTOR_SUBPARTS.
>         * expr.c (count_type_elements, store_constructor): Likewise.
>         * fold-const.c (const_binop, const_unop, fold_convert_const)
>         (operand_equal_p, fold_view_convert_expr, fold_vec_perm)
>         (fold_ternary_loc, fold_relational_const): Likewise.
>         (native_interpret_vector): Likewise.  Change the size from an
>         int to an unsigned int.
>         * gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial
>         TYPE_VECTOR_SUBPARTS.
>         (gimple_fold_indirect_ref, gimple_build_vector): Likewise.
>         (gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when
>         duplicating a non-constant operand into a variable-length vector.
>         * match.pd: Handle polynomial TYPE_VECTOR_SUBPARTS.
>         * omp-simd-clone.c (simd_clone_subparts): Likewise.
>         * print-tree.c (print_node): Likewise.
>         * stor-layout.c (layout_type): Likewise.
>         * targhooks.c (default_builtin_vectorization_cost): Likewise.
>         * tree-cfg.c (verify_gimple_comparison): Likewise.
>         (verify_gimple_assign_binary): Likewise.
>         (verify_gimple_assign_ternary): Likewise.
>         (verify_gimple_assign_single): Likewise.
>         * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
>         * tree-vect-data-refs.c (vect_permute_store_chain): Likewise.
>         (vect_grouped_load_supported, vect_permute_load_chain): Likewise.
>         (vect_shift_permute_load_chain): Likewise.
>         * tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise.
>         (expand_vector_condition, optimize_vector_constructor): Likewise.
>         (lower_vec_perm, get_compute_type): Likewise.
>         * tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
>         (get_initial_defs_for_reduction, vect_transform_loop): Likewise.
>         * tree-vect-patterns.c (vect_recog_bool_pattern): Likewise.
>         (vect_recog_mask_conversion_pattern): Likewise.
>         * tree-vect-slp.c (vect_supported_load_permutation_p): Likewise.
>         (vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise.
>         * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
>         (get_group_load_store_type, vectorizable_mask_load_store): Likewise.
>         (vectorizable_bswap, simd_clone_subparts, vectorizable_assignment)
>         (vectorizable_shift, vectorizable_operation, vectorizable_store)
>         (vect_gen_perm_mask_any, vectorizable_load, vect_is_simple_cond)
>         (vectorizable_comparison, supportable_widening_operation): Likewise.
>         (supportable_narrowing_operation): Likewise.
>
> gcc/ada/
>         * gcc-interface/utils.c (gnat_types_compatible_p): Handle
>         polynomial TYPE_VECTOR_SUBPARTS.
>
> gcc/brig/
>         * brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle
>         polynomial TYPE_VECTOR_SUBPARTS.
>         * brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise.
>
> gcc/c-family/
>         * c-common.c (vector_types_convertible_p, c_build_vec_perm_expr)
>         (convert_vector_to_array_for_subscript): Handle polynomial
>         TYPE_VECTOR_SUBPARTS.
>         (c_common_type_for_mode): Check valid_vector_subparts_p.
>
> gcc/c/
>         * c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial
>         TYPE_VECTOR_SUBPARTS.
>
> gcc/cp/
>         * call.c (build_conditional_expr_1): Handle polynomial
>         TYPE_VECTOR_SUBPARTS.
>         * constexpr.c (cxx_fold_indirect_ref): Likewise.
>         * decl.c (cp_finish_decomp): Likewise.
>         * mangle.c (write_type): Likewise.
>         * typeck.c (structural_comptypes): Likewise.
>         (cp_build_binary_op): Likewise.
>         * typeck2.c (process_init_constructor_array): Likewise.
>
> gcc/fortran/
>         * trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p.
>
> gcc/lto/
>         * lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p.
>         * lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS.
>
> gcc/go/
>         * go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p.
>
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  2017-10-23 17:22:35.831905077 +0100
> +++ gcc/tree.h  2017-10-23 17:25:51.773378674 +0100
> @@ -2041,15 +2041,6 @@ #define TREE_VISITED(NODE) ((NODE)->base
>     If set in a INTEGER_TYPE, indicates a character type.  */
>  #define TYPE_STRING_FLAG(NODE) (TYPE_CHECK (NODE)->type_common.string_flag)
>
> -/* For a VECTOR_TYPE, this is the number of sub-parts of the vector.  */
> -#define TYPE_VECTOR_SUBPARTS(VECTOR_TYPE) \
> -  (HOST_WIDE_INT_1U \
> -   << VECTOR_TYPE_CHECK (VECTOR_TYPE)->type_common.precision)
> -
> -/* Set precision to n when we have 2^n sub-parts of the vector.  */
> -#define SET_TYPE_VECTOR_SUBPARTS(VECTOR_TYPE, X) \
> -  (VECTOR_TYPE_CHECK (VECTOR_TYPE)->type_common.precision = exact_log2 (X))
> -
>  /* Nonzero in a VECTOR_TYPE if the frontends should not emit warnings
>     about missing conversions to other vector types of the same size.  */
>  #define TYPE_VECTOR_OPAQUE(NODE) \
> @@ -3671,6 +3662,64 @@ id_equal (const char *str, const_tree id
>    return !strcmp (str, IDENTIFIER_POINTER (id));
>  }
>
> +/* Return the number of elements in the VECTOR_TYPE given by NODE.  */
> +
> +inline poly_uint64
> +TYPE_VECTOR_SUBPARTS (const_tree node)
> +{
> +  STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 2);
> +  unsigned int precision = VECTOR_TYPE_CHECK (node)->type_common.precision;
> +  if (NUM_POLY_INT_COEFFS == 2)
> +    {
> +      poly_uint64 res = 0;
> +      res.coeffs[0] = 1 << (precision / 2);
> +      if (precision & 1)
> +       res.coeffs[1] = 1 << (precision / 2);
> +      return res;
> +    }
> +  else
> +    return 1 << precision;
> +}
> +
> +/* Set the number of elements in VECTOR_TYPE NODE to SUBPARTS, which must
> +   satisfy valid_vector_subparts_p.  */
> +
> +inline void
> +SET_TYPE_VECTOR_SUBPARTS (tree node, poly_uint64 subparts)
> +{
> +  STATIC_ASSERT (NUM_POLY_INT_COEFFS <= 2);
> +  unsigned HOST_WIDE_INT coeff0 = subparts.coeffs[0];
> +  int index = exact_log2 (coeff0);
> +  gcc_assert (index >= 0);
> +  if (NUM_POLY_INT_COEFFS == 2)
> +    {
> +      unsigned HOST_WIDE_INT coeff1 = subparts.coeffs[1];
> +      gcc_assert (coeff1 == 0 || coeff1 == coeff0);
> +      VECTOR_TYPE_CHECK (node)->type_common.precision
> +       = index * 2 + (coeff1 != 0);
> +    }
> +  else
> +    VECTOR_TYPE_CHECK (node)->type_common.precision = index;
> +}
> +
> +/* Return true if we can construct vector types with the given number
> +   of subparts.  */
> +
> +static inline bool
> +valid_vector_subparts_p (poly_uint64 subparts)
> +{
> +  unsigned HOST_WIDE_INT coeff0 = subparts.coeffs[0];
> +  if (!pow2p_hwi (coeff0))
> +    return false;
> +  if (NUM_POLY_INT_COEFFS == 2)
> +    {
> +      unsigned HOST_WIDE_INT coeff1 = subparts.coeffs[1];
> +      if (coeff1 != 0 && coeff1 != coeff0)
> +       return false;
> +    }
> +  return true;
> +}
> +
>  #define error_mark_node                        global_trees[TI_ERROR_MARK]
>
>  #define intQI_type_node                        global_trees[TI_INTQI_TYPE]
> @@ -4108,16 +4157,10 @@ extern tree build_pointer_type (tree);
>  extern tree build_reference_type_for_mode (tree, machine_mode, bool);
>  extern tree build_reference_type (tree);
>  extern tree build_vector_type_for_mode (tree, machine_mode);
> -extern tree build_vector_type (tree innertype, int nunits);
> -/* Temporary.  */
> -inline tree
> -build_vector_type (tree innertype, poly_uint64 nunits)
> -{
> -  return build_vector_type (innertype, (int) nunits.to_constant ());
> -}
> +extern tree build_vector_type (tree, poly_int64);
>  extern tree build_truth_vector_type (poly_uint64, poly_uint64);
>  extern tree build_same_sized_truth_vector_type (tree vectype);
> -extern tree build_opaque_vector_type (tree innertype, int nunits);
> +extern tree build_opaque_vector_type (tree, poly_int64);
>  extern tree build_index_type (tree);
>  extern tree build_array_type (tree, tree, bool = false);
>  extern tree build_nonshared_array_type (tree, tree);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2017-10-23 17:25:48.625491825 +0100
> +++ gcc/tree.c  2017-10-23 17:25:51.771378746 +0100
> @@ -1877,7 +1877,7 @@ make_vector (unsigned len MEM_STAT_DECL)
>  build_vector (tree type, vec<tree> vals MEM_STAT_DECL)
>  {
>    unsigned int nelts = vals.length ();
> -  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> +  gcc_assert (must_eq (nelts, TYPE_VECTOR_SUBPARTS (type)));
>    int over = 0;
>    unsigned cnt = 0;
>    tree v = make_vector (nelts);
> @@ -1907,10 +1907,11 @@ build_vector (tree type, vec<tree> vals
>  tree
>  build_vector_from_ctor (tree type, vec<constructor_elt, va_gc> *v)
>  {
> -  unsigned int nelts = TYPE_VECTOR_SUBPARTS (type);
> -  unsigned HOST_WIDE_INT idx;
> +  unsigned HOST_WIDE_INT idx, nelts;
>    tree value;
>
> +  /* We can't construct a VECTOR_CST for a variable number of elements.  */
> +  nelts = TYPE_VECTOR_SUBPARTS (type).to_constant ();
>    auto_vec<tree, 32> vec (nelts);
>    FOR_EACH_CONSTRUCTOR_VALUE (v, idx, value)
>      {
> @@ -1928,9 +1929,9 @@ build_vector_from_ctor (tree type, vec<c
>
>  /* Build a vector of type VECTYPE where all the elements are SCs.  */
>  tree
> -build_vector_from_val (tree vectype, tree sc)
> +build_vector_from_val (tree vectype, tree sc)
>  {
> -  int i, nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  unsigned HOST_WIDE_INT i, nunits;
>
>    if (sc == error_mark_node)
>      return sc;
> @@ -1944,6 +1945,13 @@ build_vector_from_val (tree vectype, tre
>    gcc_checking_assert (types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (sc)),
>                                            TREE_TYPE (vectype)));
>
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    {
> +      if (CONSTANT_CLASS_P (sc))
> +       return build_vec_duplicate_cst (vectype, sc);
> +      return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
> +    }
> +
>    if (CONSTANT_CLASS_P (sc))
>      {
>        auto_vec<tree, 32> v (nunits);
> @@ -6575,11 +6583,8 @@ type_hash_canon_hash (tree type)
>        }
>
>      case VECTOR_TYPE:
> -      {
> -       unsigned nunits = TYPE_VECTOR_SUBPARTS (type);
> -       hstate.add_object (nunits);
> -       break;
> -      }
> +      hstate.add_poly_int (TYPE_VECTOR_SUBPARTS (type));
> +      break;
>
>      default:
>        break;
> @@ -6623,7 +6628,8 @@ type_cache_hasher::equal (type_hash *a,
>        return 1;
>
>      case VECTOR_TYPE:
> -      return TYPE_VECTOR_SUBPARTS (a->type) == TYPE_VECTOR_SUBPARTS (b->type);
> +      return must_eq (TYPE_VECTOR_SUBPARTS (a->type),
> +                     TYPE_VECTOR_SUBPARTS (b->type));
>
>      case ENUMERAL_TYPE:
>        if (TYPE_VALUES (a->type) != TYPE_VALUES (b->type)
> @@ -9666,7 +9672,7 @@ make_vector_type (tree innertype, poly_i
>
>    t = make_node (VECTOR_TYPE);
>    TREE_TYPE (t) = mv_innertype;
> -  SET_TYPE_VECTOR_SUBPARTS (t, nunits.to_constant ()); /* Temporary */
> +  SET_TYPE_VECTOR_SUBPARTS (t, nunits);
>    SET_TYPE_MODE (t, mode);
>
>    if (TYPE_STRUCTURAL_EQUALITY_P (mv_innertype) || in_lto_p)
> @@ -10582,7 +10588,7 @@ build_vector_type_for_mode (tree innerty
>     a power of two.  */
>
>  tree
> -build_vector_type (tree innertype, int nunits)
> +build_vector_type (tree innertype, poly_int64 nunits)
>  {
>    return make_vector_type (innertype, nunits, VOIDmode);
>  }
> @@ -10627,7 +10633,7 @@ build_same_sized_truth_vector_type (tree
>  /* Similarly, but builds a variant type with TYPE_VECTOR_OPAQUE set.  */
>
>  tree
> -build_opaque_vector_type (tree innertype, int nunits)
> +build_opaque_vector_type (tree innertype, poly_int64 nunits)
>  {
>    tree t = make_vector_type (innertype, nunits, VOIDmode);
>    tree cand;
> @@ -10730,7 +10736,7 @@ initializer_zerop (const_tree init)
>  uniform_vector_p (const_tree vec)
>  {
>    tree first, t;
> -  unsigned i;
> +  unsigned HOST_WIDE_INT i, nelts;
>
>    if (vec == NULL_TREE)
>      return NULL_TREE;
> @@ -10753,7 +10759,8 @@ uniform_vector_p (const_tree vec)
>        return first;
>      }
>
> -  else if (TREE_CODE (vec) == CONSTRUCTOR)
> +  else if (TREE_CODE (vec) == CONSTRUCTOR
> +          && TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec)).is_constant (&nelts))
>      {
>        first = error_mark_node;
>
> @@ -10767,7 +10774,7 @@ uniform_vector_p (const_tree vec)
>           if (!operand_equal_p (first, t, 0))
>             return NULL_TREE;
>          }
> -      if (i != TYPE_VECTOR_SUBPARTS (TREE_TYPE (vec)))
> +      if (i != nelts)
>         return NULL_TREE;
>
>        return first;
> @@ -13011,8 +13018,8 @@ vector_type_mode (const_tree t)
>        /* For integers, try mapping it to a same-sized scalar mode.  */
>        if (is_int_mode (TREE_TYPE (t)->type_common.mode, &innermode))
>         {
> -         unsigned int size = (TYPE_VECTOR_SUBPARTS (t)
> -                              * GET_MODE_BITSIZE (innermode));
> +         poly_int64 size = (TYPE_VECTOR_SUBPARTS (t)
> +                            * GET_MODE_BITSIZE (innermode));
>           scalar_int_mode mode;
>           if (int_mode_for_size (size, 0).exists (&mode)
>               && have_regs_of_mode[mode])
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2017-10-23 17:19:04.559212322 +0100
> +++ gcc/cfgexpand.c     2017-10-23 17:25:51.727380328 +0100
> @@ -4961,10 +4961,13 @@ expand_debug_expr (tree exp)
>        else if (TREE_CODE (TREE_TYPE (exp)) == VECTOR_TYPE)
>         {
>           unsigned i;
> +         unsigned HOST_WIDE_INT nelts;
>           tree val;
>
> -         op0 = gen_rtx_CONCATN
> -           (mode, rtvec_alloc (TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp))));
> +         if (!TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp)).is_constant (&nelts))
> +           goto flag_unsupported;
> +
> +         op0 = gen_rtx_CONCATN (mode, rtvec_alloc (nelts));
>
>           FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (exp), i, val)
>             {
> @@ -4974,7 +4977,7 @@ expand_debug_expr (tree exp)
>               XVECEXP (op0, 0, i) = op1;
>             }
>
> -         if (i < TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp)))
> +         if (i < nelts)
>             {
>               op1 = expand_debug_expr
>                 (build_zero_cst (TREE_TYPE (TREE_TYPE (exp))));
> @@ -4982,7 +4985,7 @@ expand_debug_expr (tree exp)
>               if (!op1)
>                 return NULL;
>
> -             for (; i < TYPE_VECTOR_SUBPARTS (TREE_TYPE (exp)); i++)
> +             for (; i < nelts; i++)
>                 XVECEXP (op0, 0, i) = op1;
>             }
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-10-23 17:25:38.241865064 +0100
> +++ gcc/expr.c  2017-10-23 17:25:51.740379860 +0100
> @@ -5847,7 +5847,13 @@ count_type_elements (const_tree type, bo
>        return 2;
>
>      case VECTOR_TYPE:
> -      return TYPE_VECTOR_SUBPARTS (type);
> +      {
> +       unsigned HOST_WIDE_INT nelts;
> +       if (TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts))
> +         return nelts;
> +       else
> +         return -1;
> +      }
>
>      case INTEGER_TYPE:
>      case REAL_TYPE:
> @@ -6594,7 +6600,8 @@ store_constructor (tree exp, rtx target,
>         HOST_WIDE_INT bitsize;
>         HOST_WIDE_INT bitpos;
>         rtvec vector = NULL;
> -       unsigned n_elts;
> +       poly_uint64 n_elts;
> +       unsigned HOST_WIDE_INT const_n_elts;
>         alias_set_type alias;
>         bool vec_vec_init_p = false;
>         machine_mode mode = GET_MODE (target);
> @@ -6619,7 +6626,9 @@ store_constructor (tree exp, rtx target,
>           }
>
>         n_elts = TYPE_VECTOR_SUBPARTS (type);
> -       if (REG_P (target) && VECTOR_MODE_P (mode))
> +       if (REG_P (target)
> +           && VECTOR_MODE_P (mode)
> +           && n_elts.is_constant (&const_n_elts))
>           {
>             machine_mode emode = eltmode;
>
> @@ -6628,14 +6637,15 @@ store_constructor (tree exp, rtx target,
>                     == VECTOR_TYPE))
>               {
>                 tree etype = TREE_TYPE (CONSTRUCTOR_ELT (exp, 0)->value);
> -               gcc_assert (CONSTRUCTOR_NELTS (exp) * TYPE_VECTOR_SUBPARTS (etype)
> -                           == n_elts);
> +               gcc_assert (must_eq (CONSTRUCTOR_NELTS (exp)
> +                                    * TYPE_VECTOR_SUBPARTS (etype),
> +                                    n_elts));
>                 emode = TYPE_MODE (etype);
>               }
>             icode = convert_optab_handler (vec_init_optab, mode, emode);
>             if (icode != CODE_FOR_nothing)
>               {
> -               unsigned int i, n = n_elts;
> +               unsigned int i, n = const_n_elts;
>
>                 if (emode != eltmode)
>                   {
> @@ -6674,7 +6684,8 @@ store_constructor (tree exp, rtx target,
>
>             /* Clear the entire vector first if there are any missing elements,
>                or if the incidence of zero elements is >= 75%.  */
> -           need_to_clear = (count < n_elts || 4 * zero_count >= 3 * count);
> +           need_to_clear = (may_lt (count, n_elts)
> +                            || 4 * zero_count >= 3 * count);
>           }
>
>         if (need_to_clear && may_gt (size, 0) && !vector)
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-10-23 17:22:48.984540760 +0100
> +++ gcc/fold-const.c    2017-10-23 17:25:51.744379717 +0100
> @@ -1645,7 +1645,7 @@ const_binop (enum tree_code code, tree t
>         in_nelts = VECTOR_CST_NELTS (arg1);
>         out_nelts = in_nelts * 2;
>         gcc_assert (in_nelts == VECTOR_CST_NELTS (arg2)
> -                   && out_nelts == TYPE_VECTOR_SUBPARTS (type));
> +                   && must_eq (out_nelts, TYPE_VECTOR_SUBPARTS (type)));
>
>         auto_vec<tree, 32> elts (out_nelts);
>         for (i = 0; i < out_nelts; i++)
> @@ -1677,7 +1677,7 @@ const_binop (enum tree_code code, tree t
>         in_nelts = VECTOR_CST_NELTS (arg1);
>         out_nelts = in_nelts / 2;
>         gcc_assert (in_nelts == VECTOR_CST_NELTS (arg2)
> -                   && out_nelts == TYPE_VECTOR_SUBPARTS (type));
> +                   && must_eq (out_nelts, TYPE_VECTOR_SUBPARTS (type)));
>
>         if (code == VEC_WIDEN_MULT_LO_EXPR)
>           scale = 0, ofs = BYTES_BIG_ENDIAN ? out_nelts : 0;
> @@ -1841,7 +1841,7 @@ const_unop (enum tree_code code, tree ty
>
>         in_nelts = VECTOR_CST_NELTS (arg0);
>         out_nelts = in_nelts / 2;
> -       gcc_assert (out_nelts == TYPE_VECTOR_SUBPARTS (type));
> +       gcc_assert (must_eq (out_nelts, TYPE_VECTOR_SUBPARTS (type)));
>
>         unsigned int offset = 0;
>         if ((!BYTES_BIG_ENDIAN) ^ (code == VEC_UNPACK_LO_EXPR
> @@ -2329,7 +2329,7 @@ fold_convert_const (enum tree_code code,
>    else if (TREE_CODE (type) == VECTOR_TYPE)
>      {
>        if (TREE_CODE (arg1) == VECTOR_CST
> -         && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1))
> +         && must_eq (TYPE_VECTOR_SUBPARTS (type), VECTOR_CST_NELTS (arg1)))
>         {
>           int len = VECTOR_CST_NELTS (arg1);
>           tree elttype = TREE_TYPE (type);
> @@ -2345,8 +2345,8 @@ fold_convert_const (enum tree_code code,
>           return build_vector (type, v);
>         }
>        if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> -         && (TYPE_VECTOR_SUBPARTS (type)
> -             == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
> +         && must_eq (TYPE_VECTOR_SUBPARTS (type),
> +                     TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
>         {
>           tree sub = fold_convert_const (code, TREE_TYPE (type),
>                                          VEC_DUPLICATE_CST_ELT (arg1));
> @@ -3491,8 +3491,8 @@ #define OP_SAME_WITH_NULL(N)                              \
>              We only tested element precision and modes to match.
>              Vectors may be BLKmode and thus also check that the number of
>              parts match.  */
> -         if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))
> -             != TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)))
> +         if (may_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)),
> +                     TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
>             return 0;
>
>           vec<constructor_elt, va_gc> *v0 = CONSTRUCTOR_ELTS (arg0);
> @@ -7613,15 +7613,16 @@ native_interpret_complex (tree type, con
>     If the buffer cannot be interpreted, return NULL_TREE.  */
>
>  static tree
> -native_interpret_vector (tree type, const unsigned char *ptr, int len)
> +native_interpret_vector (tree type, const unsigned char *ptr, unsigned int len)
>  {
>    tree etype, elem;
> -  int i, size, count;
> +  unsigned int i, size;
> +  unsigned HOST_WIDE_INT count;
>
>    etype = TREE_TYPE (type);
>    size = GET_MODE_SIZE (SCALAR_TYPE_MODE (etype));
> -  count = TYPE_VECTOR_SUBPARTS (type);
> -  if (size * count > len)
> +  if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&count)
> +      || size * count > len)
>      return NULL_TREE;
>
>    auto_vec<tree, 32> elements (count);
> @@ -7707,7 +7708,8 @@ fold_view_convert_expr (tree type, tree
>    tree expr_type = TREE_TYPE (expr);
>    if (TREE_CODE (expr) == VEC_DUPLICATE_CST
>        && VECTOR_TYPE_P (type)
> -      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
> +      && must_eq (TYPE_VECTOR_SUBPARTS (type),
> +                 TYPE_VECTOR_SUBPARTS (expr_type))
>        && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
>      {
>        tree sub = fold_view_convert_expr (TREE_TYPE (type),
> @@ -9025,9 +9027,9 @@ fold_vec_perm (tree type, tree arg0, tre
>    bool need_ctor = false;
>
>    unsigned int nelts = sel.length ();
> -  gcc_assert (TYPE_VECTOR_SUBPARTS (type) == nelts
> -             && TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)) == nelts
> -             && TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)) == nelts);
> +  gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (type), nelts)
> +             && must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)), nelts)
> +             && must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1)), nelts));
>    if (TREE_TYPE (TREE_TYPE (arg0)) != TREE_TYPE (type)
>        || TREE_TYPE (TREE_TYPE (arg1)) != TREE_TYPE (type))
>      return NULL_TREE;
> @@ -11440,7 +11442,7 @@ fold_ternary_loc (location_t loc, enum t
>                   || TREE_CODE (arg2) == CONSTRUCTOR))
>             {
>               unsigned int nelts = VECTOR_CST_NELTS (arg0), i;
> -             gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> +             gcc_assert (must_eq (nelts, TYPE_VECTOR_SUBPARTS (type)));
>               auto_vec_perm_indices sel (nelts);
>               for (i = 0; i < nelts; i++)
>                 {
> @@ -11706,7 +11708,8 @@ fold_ternary_loc (location_t loc, enum t
>           if (n != 0
>               && (idx % width) == 0
>               && (n % width) == 0
> -             && ((idx + n) / width) <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))
> +             && must_le ((idx + n) / width,
> +                         TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0))))
>             {
>               idx = idx / width;
>               n = n / width;
> @@ -11783,7 +11786,7 @@ fold_ternary_loc (location_t loc, enum t
>
>           mask2 = 2 * nelts - 1;
>           mask = single_arg ? (nelts - 1) : mask2;
> -         gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> +         gcc_assert (must_eq (nelts, TYPE_VECTOR_SUBPARTS (type)));
>           auto_vec_perm_indices sel (nelts);
>           auto_vec_perm_indices sel2 (nelts);
>           for (i = 0; i < nelts; i++)
> @@ -14034,7 +14037,7 @@ fold_relational_const (enum tree_code co
>         }
>        unsigned count = VECTOR_CST_NELTS (op0);
>        gcc_assert (VECTOR_CST_NELTS (op1) == count
> -                 && TYPE_VECTOR_SUBPARTS (type) == count);
> +                 && must_eq (TYPE_VECTOR_SUBPARTS (type), count));
>
>        auto_vec<tree, 32> elts (count);
>        for (unsigned i = 0; i < count; i++)
> Index: gcc/gimple-fold.c
> ===================================================================
> --- gcc/gimple-fold.c   2017-10-23 17:22:18.228825053 +0100
> +++ gcc/gimple-fold.c   2017-10-23 17:25:51.747379609 +0100
> @@ -5909,13 +5909,13 @@ gimple_fold_stmt_to_constant_1 (gimple *
>                 }
>               else if (TREE_CODE (rhs) == CONSTRUCTOR
>                        && TREE_CODE (TREE_TYPE (rhs)) == VECTOR_TYPE
> -                      && (CONSTRUCTOR_NELTS (rhs)
> -                          == TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs))))
> +                      && must_eq (CONSTRUCTOR_NELTS (rhs),
> +                                  TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs))))
>                 {
>                   unsigned i, nelts;
>                   tree val;
>
> -                 nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (rhs));
> +                 nelts = CONSTRUCTOR_NELTS (rhs);
>                   auto_vec<tree, 32> vec (nelts);
>                   FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (rhs), i, val)
>                     {
> @@ -6761,8 +6761,8 @@ gimple_fold_indirect_ref (tree t)
>              = tree_to_shwi (part_width) / BITS_PER_UNIT;
>            unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
>            tree index = bitsize_int (indexi);
> -          if (offset / part_widthi
> -             < TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))
> +         if (must_lt (offset / part_widthi,
> +                      TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype))))
>              return fold_build3 (BIT_FIELD_REF, type, TREE_OPERAND (addr, 0),
>                                  part_width, index);
>         }
> @@ -7064,6 +7064,10 @@ gimple_convert_to_ptrofftype (gimple_seq
>  gimple_build_vector_from_val (gimple_seq *seq, location_t loc, tree type,
>                               tree op)
>  {
> +  if (!TYPE_VECTOR_SUBPARTS (type).is_constant ()
> +      && !CONSTANT_CLASS_P (op))
> +    return gimple_build (seq, loc, VEC_DUPLICATE_EXPR, type, op);
> +
>    tree res, vec = build_vector_from_val (type, op);
>    if (is_gimple_val (vec))
>      return vec;
> @@ -7086,7 +7090,7 @@ gimple_build_vector (gimple_seq *seq, lo
>                      vec<tree> elts)
>  {
>    unsigned int nelts = elts.length ();
> -  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
> +  gcc_assert (must_eq (nelts, TYPE_VECTOR_SUBPARTS (type)));
>    for (unsigned int i = 0; i < nelts; ++i)
>      if (!TREE_CONSTANT (elts[i]))
>        {
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2017-10-23 17:22:50.031432167 +0100
> +++ gcc/match.pd        2017-10-23 17:25:51.750379501 +0100
> @@ -83,7 +83,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (nop_convert @0)
>   (view_convert @0)
>   (if (VECTOR_TYPE_P (type) && VECTOR_TYPE_P (TREE_TYPE (@0))
> -      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0))
> +      && must_eq (TYPE_VECTOR_SUBPARTS (type),
> +                 TYPE_VECTOR_SUBPARTS (TREE_TYPE (@0)))
>        && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE (@0))))))
>  /* This one has to be last, or it shadows the others.  */
>  (match (nop_convert @0)
> @@ -2628,7 +2629,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (simplify
>   (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2)))
>   (if (VECTOR_TYPE_P (type)
> -      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))
> +      && must_eq (TYPE_VECTOR_SUBPARTS (type),
> +                 TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
>        && (TYPE_MODE (TREE_TYPE (type))
>            == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)))))
>    (minus @3 (view_convert (vec_cond @0 (negate @1) @2)))))
> @@ -2637,7 +2639,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (simplify
>   (minus @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2)))
>   (if (VECTOR_TYPE_P (type)
> -      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1))
> +      && must_eq (TYPE_VECTOR_SUBPARTS (type),
> +                 TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
>        && (TYPE_MODE (TREE_TYPE (type))
>            == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)))))
>    (plus @3 (view_convert (vec_cond @0 (negate @1) @2)))))
> @@ -4301,7 +4304,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>     (if (n != 0
>         && (idx % width) == 0
>         && (n % width) == 0
> -       && ((idx + n) / width) <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (ctor)))
> +       && must_le ((idx + n) / width,
> +                   TYPE_VECTOR_SUBPARTS (TREE_TYPE (ctor))))
>      (with
>       {
>         idx = idx / width;
> Index: gcc/omp-simd-clone.c
> ===================================================================
> --- gcc/omp-simd-clone.c        2017-10-23 17:22:47.947648317 +0100
> +++ gcc/omp-simd-clone.c        2017-10-23 17:25:51.751379465 +0100
> @@ -57,7 +57,7 @@ Software Foundation; either version 3, o
>  static unsigned HOST_WIDE_INT
>  simd_clone_subparts (tree vectype)
>  {
> -  return TYPE_VECTOR_SUBPARTS (vectype);
> +  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
>  }
>
>  /* Allocate a fresh `simd_clone' and return it.  NARGS is the number
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c    2017-10-23 17:11:40.246949037 +0100
> +++ gcc/print-tree.c    2017-10-23 17:25:51.751379465 +0100
> @@ -630,7 +630,10 @@ print_node (FILE *file, const char *pref
>        else if (code == ARRAY_TYPE)
>         print_node (file, "domain", TYPE_DOMAIN (node), indent + 4);
>        else if (code == VECTOR_TYPE)
> -       fprintf (file, " nunits:%d", (int) TYPE_VECTOR_SUBPARTS (node));
> +       {
> +         fprintf (file, " nunits:");
> +         print_dec (TYPE_VECTOR_SUBPARTS (node), file);
> +       }
>        else if (code == RECORD_TYPE
>                || code == UNION_TYPE
>                || code == QUAL_UNION_TYPE)
> Index: gcc/stor-layout.c
> ===================================================================
> --- gcc/stor-layout.c   2017-10-23 17:11:54.535862371 +0100
> +++ gcc/stor-layout.c   2017-10-23 17:25:51.753379393 +0100
> @@ -2267,11 +2267,9 @@ layout_type (tree type)
>
>      case VECTOR_TYPE:
>        {
> -       int nunits = TYPE_VECTOR_SUBPARTS (type);
> +       poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (type);
>         tree innertype = TREE_TYPE (type);
>
> -       gcc_assert (!(nunits & (nunits - 1)));
> -
>         /* Find an appropriate mode for the vector type.  */
>         if (TYPE_MODE (type) == VOIDmode)
>           SET_TYPE_MODE (type,
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c     2017-10-23 17:22:32.725227332 +0100
> +++ gcc/targhooks.c     2017-10-23 17:25:51.753379393 +0100
> @@ -683,7 +683,7 @@ default_builtin_vectorization_cost (enum
>          return 3;
>
>        case vec_construct:
> -       return TYPE_VECTOR_SUBPARTS (vectype) - 1;
> +       return estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype)) - 1;
>
>        default:
>          gcc_unreachable ();
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2017-10-23 17:20:50.883679845 +0100
> +++ gcc/tree-cfg.c      2017-10-23 17:25:51.756379285 +0100
> @@ -3640,7 +3640,8 @@ verify_gimple_comparison (tree type, tre
>            return true;
>          }
>
> -      if (TYPE_VECTOR_SUBPARTS (type) != TYPE_VECTOR_SUBPARTS (op0_type))
> +      if (may_ne (TYPE_VECTOR_SUBPARTS (type),
> +                 TYPE_VECTOR_SUBPARTS (op0_type)))
>          {
>            error ("invalid vector comparison resulting type");
>            debug_generic_expr (type);
> @@ -4070,8 +4071,8 @@ verify_gimple_assign_binary (gassign *st
>        if (VECTOR_BOOLEAN_TYPE_P (lhs_type)
>           && VECTOR_BOOLEAN_TYPE_P (rhs1_type)
>           && types_compatible_p (rhs1_type, rhs2_type)
> -         && (TYPE_VECTOR_SUBPARTS (lhs_type)
> -             == 2 * TYPE_VECTOR_SUBPARTS (rhs1_type)))
> +         && must_eq (TYPE_VECTOR_SUBPARTS (lhs_type),
> +                     2 * TYPE_VECTOR_SUBPARTS (rhs1_type)))
>         return false;
>
>        /* Fallthru.  */
> @@ -4221,8 +4222,8 @@ verify_gimple_assign_ternary (gassign *s
>
>      case VEC_COND_EXPR:
>        if (!VECTOR_BOOLEAN_TYPE_P (rhs1_type)
> -         || TYPE_VECTOR_SUBPARTS (rhs1_type)
> -            != TYPE_VECTOR_SUBPARTS (lhs_type))
> +         || may_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
> +                    TYPE_VECTOR_SUBPARTS (lhs_type)))
>         {
>           error ("the first argument of a VEC_COND_EXPR must be of a "
>                  "boolean vector type of the same number of elements "
> @@ -4268,11 +4269,12 @@ verify_gimple_assign_ternary (gassign *s
>           return true;
>         }
>
> -      if (TYPE_VECTOR_SUBPARTS (rhs1_type) != TYPE_VECTOR_SUBPARTS (rhs2_type)
> -         || TYPE_VECTOR_SUBPARTS (rhs2_type)
> -            != TYPE_VECTOR_SUBPARTS (rhs3_type)
> -         || TYPE_VECTOR_SUBPARTS (rhs3_type)
> -            != TYPE_VECTOR_SUBPARTS (lhs_type))
> +      if (may_ne (TYPE_VECTOR_SUBPARTS (rhs1_type),
> +                 TYPE_VECTOR_SUBPARTS (rhs2_type))
> +         || may_ne (TYPE_VECTOR_SUBPARTS (rhs2_type),
> +                    TYPE_VECTOR_SUBPARTS (rhs3_type))
> +         || may_ne (TYPE_VECTOR_SUBPARTS (rhs3_type),
> +                    TYPE_VECTOR_SUBPARTS (lhs_type)))
>         {
>           error ("vectors with different element number found "
>                  "in vector permute expression");
> @@ -4554,9 +4556,9 @@ verify_gimple_assign_single (gassign *st
>                           debug_generic_stmt (rhs1);
>                           return true;
>                         }
> -                     else if (CONSTRUCTOR_NELTS (rhs1)
> -                              * TYPE_VECTOR_SUBPARTS (elt_t)
> -                              != TYPE_VECTOR_SUBPARTS (rhs1_type))
> +                     else if (may_ne (CONSTRUCTOR_NELTS (rhs1)
> +                                      * TYPE_VECTOR_SUBPARTS (elt_t),
> +                                      TYPE_VECTOR_SUBPARTS (rhs1_type)))
>                         {
>                           error ("incorrect number of vector CONSTRUCTOR"
>                                  " elements");
> @@ -4571,8 +4573,8 @@ verify_gimple_assign_single (gassign *st
>                       debug_generic_stmt (rhs1);
>                       return true;
>                     }
> -                 else if (CONSTRUCTOR_NELTS (rhs1)
> -                          > TYPE_VECTOR_SUBPARTS (rhs1_type))
> +                 else if (may_gt (CONSTRUCTOR_NELTS (rhs1),
> +                                  TYPE_VECTOR_SUBPARTS (rhs1_type)))
>                     {
>                       error ("incorrect number of vector CONSTRUCTOR elements");
>                       debug_generic_stmt (rhs1);
> Index: gcc/tree-ssa-forwprop.c
> ===================================================================
> --- gcc/tree-ssa-forwprop.c     2017-10-23 17:20:50.883679845 +0100
> +++ gcc/tree-ssa-forwprop.c     2017-10-23 17:25:51.756379285 +0100
> @@ -1948,7 +1948,8 @@ simplify_vector_constructor (gimple_stmt
>    gimple *stmt = gsi_stmt (*gsi);
>    gimple *def_stmt;
>    tree op, op2, orig, type, elem_type;
> -  unsigned elem_size, nelts, i;
> +  unsigned elem_size, i;
> +  unsigned HOST_WIDE_INT nelts;
>    enum tree_code code, conv_code;
>    constructor_elt *elt;
>    bool maybe_ident;
> @@ -1959,7 +1960,8 @@ simplify_vector_constructor (gimple_stmt
>    type = TREE_TYPE (op);
>    gcc_checking_assert (TREE_CODE (type) == VECTOR_TYPE);
>
> -  nelts = TYPE_VECTOR_SUBPARTS (type);
> +  if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts))
> +    return false;
>    elem_type = TREE_TYPE (type);
>    elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>
> @@ -2031,8 +2033,8 @@ simplify_vector_constructor (gimple_stmt
>      return false;
>
>    if (! VECTOR_TYPE_P (TREE_TYPE (orig))
> -      || (TYPE_VECTOR_SUBPARTS (type)
> -         != TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig))))
> +      || may_ne (TYPE_VECTOR_SUBPARTS (type),
> +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig))))
>      return false;
>
>    tree tem;
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2017-10-23 17:25:50.361429427 +0100
> +++ gcc/tree-vect-data-refs.c   2017-10-23 17:25:51.758379213 +0100
> @@ -4743,7 +4743,7 @@ vect_permute_store_chain (vec<tree> dr_c
>    if (length == 3)
>      {
>        /* vect_grouped_store_supported ensures that this is constant.  */
> -      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
> +      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
>        unsigned int j0 = 0, j1 = 0, j2 = 0;
>
>        auto_vec_perm_indices sel (nelt);
> @@ -4807,7 +4807,7 @@ vect_permute_store_chain (vec<tree> dr_c
>        gcc_assert (pow2p_hwi (length));
>
>        /* vect_grouped_store_supported ensures that this is constant.  */
> -      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype);
> +      unsigned int nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
>        auto_vec_perm_indices sel (nelt);
>        sel.quick_grow (nelt);
>        for (i = 0, n = nelt / 2; i < n; i++)
> @@ -5140,7 +5140,7 @@ vect_grouped_load_supported (tree vectyp
>       that leaves unused vector loads around punt - we at least create
>       very sub-optimal code in that case (and blow up memory,
>       see PR65518).  */
> -  if (single_element_p && count > TYPE_VECTOR_SUBPARTS (vectype))
> +  if (single_element_p && may_gt (count, TYPE_VECTOR_SUBPARTS (vectype)))
>      {
>        if (dump_enabled_p ())
>         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -5333,7 +5333,7 @@ vect_permute_load_chain (vec<tree> dr_ch
>    if (length == 3)
>      {
>        /* vect_grouped_load_supported ensures that this is constant.  */
> -      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
> +      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
>        unsigned int k;
>
>        auto_vec_perm_indices sel (nelt);
> @@ -5384,7 +5384,7 @@ vect_permute_load_chain (vec<tree> dr_ch
>        gcc_assert (pow2p_hwi (length));
>
>        /* vect_grouped_load_supported ensures that this is constant.  */
> -      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
> +      unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
>        auto_vec_perm_indices sel (nelt);
>        sel.quick_grow (nelt);
>        for (i = 0; i < nelt; ++i)
> @@ -5525,12 +5525,12 @@ vect_shift_permute_load_chain (vec<tree>
>
>    tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (stmt));
>    unsigned int i;
> -  unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>
> -  unsigned HOST_WIDE_INT vf;
> -  if (!LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
> +  unsigned HOST_WIDE_INT nelt, vf;
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nelt)
> +      || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&vf))
>      /* Not supported for variable-length vectors.  */
>      return false;
>
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-10-23 17:25:48.623491897 +0100
> +++ gcc/tree-vect-generic.c     2017-10-23 17:25:51.759379177 +0100
> @@ -48,7 +48,7 @@ static void expand_vector_operations_1 (
>  static unsigned int
>  nunits_for_known_piecewise_op (const_tree type)
>  {
> -  return TYPE_VECTOR_SUBPARTS (type);
> +  return TYPE_VECTOR_SUBPARTS (type).to_constant ();
>  }
>
>  /* Return true if TYPE1 has more elements than TYPE2, where either
> @@ -916,9 +916,9 @@ expand_vector_condition (gimple_stmt_ite
>       Similarly for vbfld_10 instead of x_2 < y_3.  */
>    if (VECTOR_BOOLEAN_TYPE_P (type)
>        && SCALAR_INT_MODE_P (TYPE_MODE (type))
> -      && (GET_MODE_BITSIZE (TYPE_MODE (type))
> -         < (TYPE_VECTOR_SUBPARTS (type)
> -            * GET_MODE_BITSIZE (TYPE_MODE (TREE_TYPE (type)))))
> +      && must_lt (GET_MODE_BITSIZE (TYPE_MODE (type)),
> +                 TYPE_VECTOR_SUBPARTS (type)
> +                 * GET_MODE_BITSIZE (SCALAR_TYPE_MODE (TREE_TYPE (type))))
>        && (a_is_comparison
>           ? useless_type_conversion_p (type, TREE_TYPE (a))
>           : expand_vec_cmp_expr_p (TREE_TYPE (a1), type, TREE_CODE (a))))
> @@ -1083,14 +1083,17 @@ optimize_vector_constructor (gimple_stmt
>    tree lhs = gimple_assign_lhs (stmt);
>    tree rhs = gimple_assign_rhs1 (stmt);
>    tree type = TREE_TYPE (rhs);
> -  unsigned int i, j, nelts = TYPE_VECTOR_SUBPARTS (type);
> +  unsigned int i, j;
> +  unsigned HOST_WIDE_INT nelts;
>    bool all_same = true;
>    constructor_elt *elt;
>    gimple *g;
>    tree base = NULL_TREE;
>    optab op;
>
> -  if (nelts <= 2 || CONSTRUCTOR_NELTS (rhs) != nelts)
> +  if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts)
> +      || nelts <= 2
> +      || CONSTRUCTOR_NELTS (rhs) != nelts)
>      return;
>    op = optab_for_tree_code (PLUS_EXPR, type, optab_default);
>    if (op == unknown_optab
> @@ -1302,7 +1305,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>    tree mask_type = TREE_TYPE (mask);
>    tree vect_elt_type = TREE_TYPE (vect_type);
>    tree mask_elt_type = TREE_TYPE (mask_type);
> -  unsigned int elements = TYPE_VECTOR_SUBPARTS (vect_type);
> +  unsigned HOST_WIDE_INT elements;
>    vec<constructor_elt, va_gc> *v;
>    tree constr, t, si, i_val;
>    tree vec0tmp = NULL_TREE, vec1tmp = NULL_TREE, masktmp = NULL_TREE;
> @@ -1310,6 +1313,9 @@ lower_vec_perm (gimple_stmt_iterator *gs
>    location_t loc = gimple_location (gsi_stmt (*gsi));
>    unsigned i;
>
> +  if (!TYPE_VECTOR_SUBPARTS (vect_type).is_constant (&elements))
> +    return;
> +
>    if (TREE_CODE (mask) == SSA_NAME)
>      {
>        gimple *def_stmt = SSA_NAME_DEF_STMT (mask);
> @@ -1467,7 +1473,7 @@ get_compute_type (enum tree_code code, o
>         = type_for_widest_vector_mode (TREE_TYPE (type), op);
>        if (vector_compute_type != NULL_TREE
>           && subparts_gt (compute_type, vector_compute_type)
> -         && TYPE_VECTOR_SUBPARTS (vector_compute_type) > 1
> +         && may_ne (TYPE_VECTOR_SUBPARTS (vector_compute_type), 1U)
>           && (optab_handler (op, TYPE_MODE (vector_compute_type))
>               != CODE_FOR_nothing))
>         compute_type = vector_compute_type;
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c        2017-10-23 17:25:48.624491861 +0100
> +++ gcc/tree-vect-loop.c        2017-10-23 17:25:51.761379105 +0100
> @@ -255,9 +255,11 @@ vect_determine_vectorization_factor (loo
>                 }
>
>               if (dump_enabled_p ())
> -               dump_printf_loc (MSG_NOTE, vect_location,
> -                                "nunits = " HOST_WIDE_INT_PRINT_DEC "\n",
> -                                 TYPE_VECTOR_SUBPARTS (vectype));
> +               {
> +                 dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
> +                 dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (vectype));
> +                 dump_printf (MSG_NOTE, "\n");
> +               }
>
>               vect_update_max_nunits (&vectorization_factor, vectype);
>             }
> @@ -548,9 +550,11 @@ vect_determine_vectorization_factor (loo
>             }
>
>           if (dump_enabled_p ())
> -           dump_printf_loc (MSG_NOTE, vect_location,
> -                            "nunits = " HOST_WIDE_INT_PRINT_DEC "\n",
> -                            TYPE_VECTOR_SUBPARTS (vf_vectype));
> +           {
> +             dump_printf_loc (MSG_NOTE, vect_location, "nunits = ");
> +             dump_dec (MSG_NOTE, TYPE_VECTOR_SUBPARTS (vf_vectype));
> +             dump_printf (MSG_NOTE, "\n");
> +           }
>
>           vect_update_max_nunits (&vectorization_factor, vf_vectype);
>
> @@ -632,8 +636,8 @@ vect_determine_vectorization_factor (loo
>
>               if (!mask_type)
>                 mask_type = vectype;
> -             else if (TYPE_VECTOR_SUBPARTS (mask_type)
> -                      != TYPE_VECTOR_SUBPARTS (vectype))
> +             else if (may_ne (TYPE_VECTOR_SUBPARTS (mask_type),
> +                              TYPE_VECTOR_SUBPARTS (vectype)))
>                 {
>                   if (dump_enabled_p ())
>                     {
> @@ -4152,7 +4156,7 @@ get_initial_defs_for_reduction (slp_tree
>    scalar_type = TREE_TYPE (vector_type);
>    /* vectorizable_reduction has already rejected SLP reductions on
>       variable-length vectors.  */
> -  nunits = TYPE_VECTOR_SUBPARTS (vector_type);
> +  nunits = TYPE_VECTOR_SUBPARTS (vector_type).to_constant ();
>
>    gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def);
>
> @@ -7672,9 +7676,8 @@ vect_transform_loop (loop_vec_info loop_
>
>           if (STMT_VINFO_VECTYPE (stmt_info))
>             {
> -             unsigned int nunits
> -               = (unsigned int)
> -                 TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
> +             poly_uint64 nunits
> +               = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (stmt_info));
>               if (!STMT_SLP_TYPE (stmt_info)
>                   && may_ne (nunits, vf)
>                   && dump_enabled_p ())
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> --- gcc/tree-vect-patterns.c    2017-10-10 17:55:22.109175458 +0100
> +++ gcc/tree-vect-patterns.c    2017-10-23 17:25:51.763379034 +0100
> @@ -3714,8 +3714,9 @@ vect_recog_bool_pattern (vec<gimple *> *
>           vectorized matches the vector type of the result in
>          size and number of elements.  */
>        unsigned prec
> -       = wi::udiv_trunc (wi::to_wide (TYPE_SIZE (vectype)),
> -                         TYPE_VECTOR_SUBPARTS (vectype)).to_uhwi ();
> +       = vector_element_size (tree_to_poly_uint64 (TYPE_SIZE (vectype)),
> +                              TYPE_VECTOR_SUBPARTS (vectype));
> +
>        tree type
>         = build_nonstandard_integer_type (prec,
>                                           TYPE_UNSIGNED (TREE_TYPE (var)));
> @@ -3898,7 +3899,8 @@ vect_recog_mask_conversion_pattern (vec<
>        vectype2 = get_mask_type_for_scalar_type (rhs1_type);
>
>        if (!vectype1 || !vectype2
> -         || TYPE_VECTOR_SUBPARTS (vectype1) == TYPE_VECTOR_SUBPARTS (vectype2))
> +         || must_eq (TYPE_VECTOR_SUBPARTS (vectype1),
> +                     TYPE_VECTOR_SUBPARTS (vectype2)))
>         return NULL;
>
>        tmp = build_mask_conversion (rhs1, vectype1, stmt_vinfo, vinfo);
> @@ -3973,7 +3975,8 @@ vect_recog_mask_conversion_pattern (vec<
>        vectype2 = get_mask_type_for_scalar_type (rhs1_type);
>
>        if (!vectype1 || !vectype2
> -         || TYPE_VECTOR_SUBPARTS (vectype1) == TYPE_VECTOR_SUBPARTS (vectype2))
> +         || must_eq (TYPE_VECTOR_SUBPARTS (vectype1),
> +                     TYPE_VECTOR_SUBPARTS (vectype2)))
>         return NULL;
>
>        /* If rhs1 is a comparison we need to move it into a
> Index: gcc/tree-vect-slp.c
> ===================================================================
> --- gcc/tree-vect-slp.c 2017-10-23 17:22:43.865071801 +0100
> +++ gcc/tree-vect-slp.c 2017-10-23 17:25:51.764378998 +0100
> @@ -1621,15 +1621,16 @@ vect_supported_load_permutation_p (slp_i
>               stmt_vec_info group_info
>                 = vinfo_for_stmt (SLP_TREE_SCALAR_STMTS (node)[0]);
>               group_info = vinfo_for_stmt (GROUP_FIRST_ELEMENT (group_info));
> -             unsigned nunits
> -               = TYPE_VECTOR_SUBPARTS (STMT_VINFO_VECTYPE (group_info));
> +             unsigned HOST_WIDE_INT nunits;
>               unsigned k, maxk = 0;
>               FOR_EACH_VEC_ELT (SLP_TREE_LOAD_PERMUTATION (node), j, k)
>                 if (k > maxk)
>                   maxk = k;
>               /* In BB vectorization we may not actually use a loaded vector
>                  accessing elements in excess of GROUP_SIZE.  */
> -             if (maxk >= (GROUP_SIZE (group_info) & ~(nunits - 1)))
> +             tree vectype = STMT_VINFO_VECTYPE (group_info);
> +             if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits)
> +                 || maxk >= (GROUP_SIZE (group_info) & ~(nunits - 1)))
>                 {
>                   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>                                    "BB vectorization with gaps at the end of "
> @@ -3243,7 +3244,7 @@ vect_get_constant_vectors (tree op, slp_
>    else
>      vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
>    /* Enforced by vect_get_and_check_slp_defs.  */
> -  nunits = TYPE_VECTOR_SUBPARTS (vector_type);
> +  nunits = TYPE_VECTOR_SUBPARTS (vector_type).to_constant ();
>
>    if (STMT_VINFO_DATA_REF (stmt_vinfo))
>      {
> @@ -3600,12 +3601,12 @@ vect_transform_slp_perm_load (slp_tree n
>    gimple *stmt = SLP_TREE_SCALAR_STMTS (node)[0];
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>    tree mask_element_type = NULL_TREE, mask_type;
> -  int nunits, vec_index = 0;
> +  int vec_index = 0;
>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>    int group_size = SLP_INSTANCE_GROUP_SIZE (slp_node_instance);
> -  int mask_element;
> +  unsigned int mask_element;
>    machine_mode mode;
> -  unsigned HOST_WIDE_INT const_vf;
> +  unsigned HOST_WIDE_INT nunits, const_vf;
>
>    if (!STMT_VINFO_GROUPED_ACCESS (stmt_info))
>      return false;
> @@ -3615,8 +3616,10 @@ vect_transform_slp_perm_load (slp_tree n
>    mode = TYPE_MODE (vectype);
>
>    /* At the moment, all permutations are represented using per-element
> -     indices, so we can't cope with variable vectorization factors.  */
> -  if (!vf.is_constant (&const_vf))
> +     indices, so we can't cope with variable vector lengths or
> +     vectorization factors.  */
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits)
> +      || !vf.is_constant (&const_vf))
>      return false;
>
>    /* The generic VEC_PERM_EXPR code always uses an integral type of the
> @@ -3624,7 +3627,6 @@ vect_transform_slp_perm_load (slp_tree n
>    mask_element_type = lang_hooks.types.type_for_mode
>      (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
>    mask_type = get_vectype_for_scalar_type (mask_element_type);
> -  nunits = TYPE_VECTOR_SUBPARTS (vectype);
>    auto_vec_perm_indices mask (nunits);
>    mask.quick_grow (nunits);
>
> @@ -3654,7 +3656,7 @@ vect_transform_slp_perm_load (slp_tree n
>       {c2,a3,b3,c3}.  */
>
>    int vect_stmts_counter = 0;
> -  int index = 0;
> +  unsigned int index = 0;
>    int first_vec_index = -1;
>    int second_vec_index = -1;
>    bool noop_p = true;
> @@ -3664,8 +3666,8 @@ vect_transform_slp_perm_load (slp_tree n
>      {
>        for (int k = 0; k < group_size; k++)
>         {
> -         int i = (SLP_TREE_LOAD_PERMUTATION (node)[k]
> -                  + j * STMT_VINFO_GROUP_SIZE (stmt_info));
> +         unsigned int i = (SLP_TREE_LOAD_PERMUTATION (node)[k]
> +                           + j * STMT_VINFO_GROUP_SIZE (stmt_info));
>           vec_index = i / nunits;
>           mask_element = i % nunits;
>           if (vec_index == first_vec_index
> @@ -3693,8 +3695,7 @@ vect_transform_slp_perm_load (slp_tree n
>               return false;
>             }
>
> -         gcc_assert (mask_element >= 0
> -                     && mask_element < 2 * nunits);
> +         gcc_assert (mask_element < 2 * nunits);
>           if (mask_element != index)
>             noop_p = false;
>           mask[index++] = mask_element;
> @@ -3727,7 +3728,7 @@ vect_transform_slp_perm_load (slp_tree n
>                   if (! noop_p)
>                     {
>                       auto_vec<tree, 32> mask_elts (nunits);
> -                     for (int l = 0; l < nunits; ++l)
> +                     for (unsigned int l = 0; l < nunits; ++l)
>                         mask_elts.quick_push (build_int_cst (mask_element_type,
>                                                              mask[l]));
>                       mask_vec = build_vector (mask_type, mask_elts);
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> --- gcc/tree-vect-stmts.c       2017-10-23 17:22:41.879277786 +0100
> +++ gcc/tree-vect-stmts.c       2017-10-23 17:25:51.767378890 +0100
> @@ -1713,9 +1713,10 @@ compare_step_with_zero (gimple *stmt)
>  static tree
>  perm_mask_for_reverse (tree vectype)
>  {
> -  int i, nunits;
> +  unsigned HOST_WIDE_INT i, nunits;
>
> -  nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    return NULL_TREE;
>
>    auto_vec_perm_indices sel (nunits);
>    for (i = 0; i < nunits; ++i)
> @@ -1750,7 +1751,7 @@ get_group_load_store_type (gimple *stmt,
>    bool single_element_p = (stmt == first_stmt
>                            && !GROUP_NEXT_ELEMENT (stmt_info));
>    unsigned HOST_WIDE_INT gap = GROUP_GAP (vinfo_for_stmt (first_stmt));
> -  unsigned nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>
>    /* True if the vectorized statements would access beyond the last
>       statement in the group.  */
> @@ -1774,7 +1775,7 @@ get_group_load_store_type (gimple *stmt,
>           /* Try to use consecutive accesses of GROUP_SIZE elements,
>              separated by the stride, until we have a complete vector.
>              Fall back to scalar accesses if that isn't possible.  */
> -         if (nunits % group_size == 0)
> +         if (multiple_p (nunits, group_size))
>             *memory_access_type = VMAT_STRIDED_SLP;
>           else
>             *memory_access_type = VMAT_ELEMENTWISE;
> @@ -2102,7 +2103,8 @@ vectorizable_mask_load_store (gimple *st
>      mask_vectype = get_mask_type_for_scalar_type (TREE_TYPE (vectype));
>
>    if (!mask_vectype || !VECTOR_BOOLEAN_TYPE_P (mask_vectype)
> -      || TYPE_VECTOR_SUBPARTS (mask_vectype) != TYPE_VECTOR_SUBPARTS (vectype))
> +      || may_ne (TYPE_VECTOR_SUBPARTS (mask_vectype),
> +                TYPE_VECTOR_SUBPARTS (vectype)))
>      return false;
>
>    if (gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
> @@ -2255,8 +2257,8 @@ vectorizable_mask_load_store (gimple *st
>
>           if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
>             {
> -             gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
> -                         == TYPE_VECTOR_SUBPARTS (idxtype));
> +             gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
> +                                  TYPE_VECTOR_SUBPARTS (idxtype)));
>               var = vect_get_new_ssa_name (idxtype, vect_simple_var);
>               op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
>               new_stmt
> @@ -2281,8 +2283,9 @@ vectorizable_mask_load_store (gimple *st
>               mask_op = vec_mask;
>               if (!useless_type_conversion_p (masktype, TREE_TYPE (vec_mask)))
>                 {
> -                 gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op))
> -                             == TYPE_VECTOR_SUBPARTS (masktype));
> +                 gcc_assert
> +                   (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask_op)),
> +                             TYPE_VECTOR_SUBPARTS (masktype)));
>                   var = vect_get_new_ssa_name (masktype, vect_simple_var);
>                   mask_op = build1 (VIEW_CONVERT_EXPR, masktype, mask_op);
>                   new_stmt
> @@ -2298,8 +2301,8 @@ vectorizable_mask_load_store (gimple *st
>
>           if (!useless_type_conversion_p (vectype, rettype))
>             {
> -             gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
> -                         == TYPE_VECTOR_SUBPARTS (rettype));
> +             gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (vectype),
> +                                  TYPE_VECTOR_SUBPARTS (rettype)));
>               op = vect_get_new_ssa_name (rettype, vect_simple_var);
>               gimple_call_set_lhs (new_stmt, op);
>               vect_finish_stmt_generation (stmt, new_stmt, gsi);
> @@ -2493,11 +2496,14 @@ vectorizable_bswap (gimple *stmt, gimple
>    tree op, vectype;
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> -  unsigned ncopies, nunits;
> +  unsigned ncopies;
> +  unsigned HOST_WIDE_INT nunits, num_bytes;
>
>    op = gimple_call_arg (stmt, 0);
>    vectype = STMT_VINFO_VECTYPE (stmt_info);
> -  nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +
> +  if (!TYPE_VECTOR_SUBPARTS (vectype).is_constant (&nunits))
> +    return false;
>
>    /* Multiple types in SLP are handled by creating the appropriate number of
>       vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
> @@ -2513,7 +2519,9 @@ vectorizable_bswap (gimple *stmt, gimple
>    if (! char_vectype)
>      return false;
>
> -  unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
> +  if (!TYPE_VECTOR_SUBPARTS (char_vectype).is_constant (&num_bytes))
> +    return false;
> +
>    unsigned word_bytes = num_bytes / nunits;
>
>    auto_vec_perm_indices elts (num_bytes);
> @@ -3213,7 +3221,7 @@ vect_simd_lane_linear (tree op, struct l
>  static unsigned HOST_WIDE_INT
>  simd_clone_subparts (tree vectype)
>  {
> -  return TYPE_VECTOR_SUBPARTS (vectype);
> +  return TYPE_VECTOR_SUBPARTS (vectype).to_constant ();
>  }
>
>  /* Function vectorizable_simd_clone_call.
> @@ -4732,7 +4740,7 @@ vectorizable_assignment (gimple *stmt, g
>      op = TREE_OPERAND (op, 0);
>
>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> -  unsigned int nunits = TYPE_VECTOR_SUBPARTS (vectype);
> +  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
>
>    /* Multiple types in SLP are handled by creating the appropriate number of
>       vectorized stmts for each SLP node.  Hence, NCOPIES is always 1 in
> @@ -4757,7 +4765,7 @@ vectorizable_assignment (gimple *stmt, g
>    if ((CONVERT_EXPR_CODE_P (code)
>         || code == VIEW_CONVERT_EXPR)
>        && (!vectype_in
> -         || TYPE_VECTOR_SUBPARTS (vectype_in) != nunits
> +         || may_ne (TYPE_VECTOR_SUBPARTS (vectype_in), nunits)
>           || (GET_MODE_SIZE (TYPE_MODE (vectype))
>               != GET_MODE_SIZE (TYPE_MODE (vectype_in)))))
>      return false;
> @@ -4906,8 +4914,8 @@ vectorizable_shift (gimple *stmt, gimple
>    int ndts = 2;
>    gimple *new_stmt = NULL;
>    stmt_vec_info prev_stmt_info;
> -  int nunits_in;
> -  int nunits_out;
> +  poly_uint64 nunits_in;
> +  poly_uint64 nunits_out;
>    tree vectype_out;
>    tree op1_vectype;
>    int ncopies;
> @@ -4974,7 +4982,7 @@ vectorizable_shift (gimple *stmt, gimple
>
>    nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>    nunits_in = TYPE_VECTOR_SUBPARTS (vectype);
> -  if (nunits_out != nunits_in)
> +  if (may_ne (nunits_out, nunits_in))
>      return false;
>
>    op1 = gimple_assign_rhs2 (stmt);
> @@ -5274,8 +5282,8 @@ vectorizable_operation (gimple *stmt, gi
>    int ndts = 3;
>    gimple *new_stmt = NULL;
>    stmt_vec_info prev_stmt_info;
> -  int nunits_in;
> -  int nunits_out;
> +  poly_uint64 nunits_in;
> +  poly_uint64 nunits_out;
>    tree vectype_out;
>    int ncopies;
>    int j, i;
> @@ -5385,7 +5393,7 @@ vectorizable_operation (gimple *stmt, gi
>
>    nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
>    nunits_in = TYPE_VECTOR_SUBPARTS (vectype);
> -  if (nunits_out != nunits_in)
> +  if (may_ne (nunits_out, nunits_in))
>      return false;
>
>    if (op_type == binary_op || op_type == ternary_op)
> @@ -5937,8 +5945,8 @@ vectorizable_store (gimple *stmt, gimple
>
>           if (!useless_type_conversion_p (srctype, TREE_TYPE (src)))
>             {
> -             gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src))
> -                         == TYPE_VECTOR_SUBPARTS (srctype));
> +             gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (src)),
> +                                  TYPE_VECTOR_SUBPARTS (srctype)));
>               var = vect_get_new_ssa_name (srctype, vect_simple_var);
>               src = build1 (VIEW_CONVERT_EXPR, srctype, src);
>               new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, src);
> @@ -5948,8 +5956,8 @@ vectorizable_store (gimple *stmt, gimple
>
>           if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
>             {
> -             gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
> -                         == TYPE_VECTOR_SUBPARTS (idxtype));
> +             gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
> +                                  TYPE_VECTOR_SUBPARTS (idxtype)));
>               var = vect_get_new_ssa_name (idxtype, vect_simple_var);
>               op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
>               new_stmt = gimple_build_assign (var, VIEW_CONVERT_EXPR, op);
> @@ -6554,7 +6562,7 @@ vect_gen_perm_mask_any (tree vectype, ve
>    tree mask_elt_type, mask_type, mask_vec;
>
>    unsigned int nunits = sel.length ();
> -  gcc_checking_assert (nunits == TYPE_VECTOR_SUBPARTS (vectype));
> +  gcc_checking_assert (must_eq (nunits, TYPE_VECTOR_SUBPARTS (vectype)));
>
>    mask_elt_type = lang_hooks.types.type_for_mode
>      (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
> @@ -6993,8 +7001,8 @@ vectorizable_load (gimple *stmt, gimple_
>
>           if (!useless_type_conversion_p (idxtype, TREE_TYPE (op)))
>             {
> -             gcc_assert (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op))
> -                         == TYPE_VECTOR_SUBPARTS (idxtype));
> +             gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (TREE_TYPE (op)),
> +                                  TYPE_VECTOR_SUBPARTS (idxtype)));
>               var = vect_get_new_ssa_name (idxtype, vect_simple_var);
>               op = build1 (VIEW_CONVERT_EXPR, idxtype, op);
>               new_stmt
> @@ -7008,8 +7016,8 @@ vectorizable_load (gimple *stmt, gimple_
>
>           if (!useless_type_conversion_p (vectype, rettype))
>             {
> -             gcc_assert (TYPE_VECTOR_SUBPARTS (vectype)
> -                         == TYPE_VECTOR_SUBPARTS (rettype));
> +             gcc_assert (must_eq (TYPE_VECTOR_SUBPARTS (vectype),
> +                                  TYPE_VECTOR_SUBPARTS (rettype)));
>               op = vect_get_new_ssa_name (rettype, vect_simple_var);
>               gimple_call_set_lhs (new_stmt, op);
>               vect_finish_stmt_generation (stmt, new_stmt, gsi);
> @@ -7905,7 +7913,8 @@ vect_is_simple_cond (tree cond, vec_info
>      return false;
>
>    if (vectype1 && vectype2
> -      && TYPE_VECTOR_SUBPARTS (vectype1) != TYPE_VECTOR_SUBPARTS (vectype2))
> +      && may_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> +                TYPE_VECTOR_SUBPARTS (vectype2)))
>      return false;
>
>    *comp_vectype = vectype1 ? vectype1 : vectype2;
> @@ -8308,7 +8317,7 @@ vectorizable_comparison (gimple *stmt, g
>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    enum vect_def_type dts[2] = {vect_unknown_def_type, vect_unknown_def_type};
>    int ndts = 2;
> -  unsigned nunits;
> +  poly_uint64 nunits;
>    int ncopies;
>    enum tree_code code, bitop1 = NOP_EXPR, bitop2 = NOP_EXPR;
>    stmt_vec_info prev_stmt_info = NULL;
> @@ -8368,7 +8377,8 @@ vectorizable_comparison (gimple *stmt, g
>      return false;
>
>    if (vectype1 && vectype2
> -      && TYPE_VECTOR_SUBPARTS (vectype1) != TYPE_VECTOR_SUBPARTS (vectype2))
> +      && may_ne (TYPE_VECTOR_SUBPARTS (vectype1),
> +                TYPE_VECTOR_SUBPARTS (vectype2)))
>      return false;
>
>    vectype = vectype1 ? vectype1 : vectype2;
> @@ -8377,10 +8387,10 @@ vectorizable_comparison (gimple *stmt, g
>    if (!vectype)
>      {
>        vectype = get_vectype_for_scalar_type (TREE_TYPE (rhs1));
> -      if (TYPE_VECTOR_SUBPARTS (vectype) != nunits)
> +      if (may_ne (TYPE_VECTOR_SUBPARTS (vectype), nunits))
>         return false;
>      }
> -  else if (nunits != TYPE_VECTOR_SUBPARTS (vectype))
> +  else if (may_ne (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
>      return false;
>
>    /* Can't compare mask and non-mask types.  */
> @@ -9611,8 +9621,8 @@ supportable_widening_operation (enum tre
>          vector types having the same QImode.  Thus we
>          add additional check for elements number.  */
>      return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -           || (TYPE_VECTOR_SUBPARTS (vectype) / 2
> -               == TYPE_VECTOR_SUBPARTS (wide_vectype)));
> +           || must_eq (TYPE_VECTOR_SUBPARTS (vectype),
> +                       TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
>
>    /* Check if it's a multi-step conversion that can be done using intermediate
>       types.  */
> @@ -9633,8 +9643,10 @@ supportable_widening_operation (enum tre
>        intermediate_mode = insn_data[icode1].operand[0].mode;
>        if (VECTOR_BOOLEAN_TYPE_P (prev_type))
>         {
> +         poly_uint64 intermediate_nelts
> +           = exact_div (TYPE_VECTOR_SUBPARTS (prev_type), 2);
>           intermediate_type
> -           = build_truth_vector_type (TYPE_VECTOR_SUBPARTS (prev_type) / 2,
> +           = build_truth_vector_type (intermediate_nelts,
>                                        current_vector_size);
>           if (intermediate_mode != TYPE_MODE (intermediate_type))
>             return false;
> @@ -9664,8 +9676,8 @@ supportable_widening_operation (enum tre
>        if (insn_data[icode1].operand[0].mode == TYPE_MODE (wide_vectype)
>           && insn_data[icode2].operand[0].mode == TYPE_MODE (wide_vectype))
>         return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -               || (TYPE_VECTOR_SUBPARTS (intermediate_type) / 2
> -                   == TYPE_VECTOR_SUBPARTS (wide_vectype)));
> +               || must_eq (TYPE_VECTOR_SUBPARTS (intermediate_type),
> +                           TYPE_VECTOR_SUBPARTS (wide_vectype) * 2));
>
>        prev_type = intermediate_type;
>        prev_mode = intermediate_mode;
> @@ -9753,8 +9765,8 @@ supportable_narrowing_operation (enum tr
>         vector types having the same QImode.  Thus we
>         add additional check for elements number.  */
>      return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -           || (TYPE_VECTOR_SUBPARTS (vectype) * 2
> -               == TYPE_VECTOR_SUBPARTS (narrow_vectype)));
> +           || must_eq (TYPE_VECTOR_SUBPARTS (vectype) * 2,
> +                       TYPE_VECTOR_SUBPARTS (narrow_vectype)));
>
>    /* Check if it's a multi-step conversion that can be done using intermediate
>       types.  */
> @@ -9820,8 +9832,8 @@ supportable_narrowing_operation (enum tr
>
>        if (insn_data[icode1].operand[0].mode == TYPE_MODE (narrow_vectype))
>         return (!VECTOR_BOOLEAN_TYPE_P (vectype)
> -               || (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2
> -                   == TYPE_VECTOR_SUBPARTS (narrow_vectype)));
> +               || must_eq (TYPE_VECTOR_SUBPARTS (intermediate_type) * 2,
> +                           TYPE_VECTOR_SUBPARTS (narrow_vectype)));
>
>        prev_mode = intermediate_mode;
>        prev_type = intermediate_type;
> Index: gcc/ada/gcc-interface/utils.c
> ===================================================================
> --- gcc/ada/gcc-interface/utils.c       2017-10-23 11:41:24.988650286 +0100
> +++ gcc/ada/gcc-interface/utils.c       2017-10-23 17:25:51.723380471 +0100
> @@ -3528,7 +3528,7 @@ gnat_types_compatible_p (tree t1, tree t
>    /* Vector types are also compatible if they have the same number of subparts
>       and the same form of (scalar) element type.  */
>    if (code == VECTOR_TYPE
> -      && TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2)
> +      && must_eq (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2))
>        && TREE_CODE (TREE_TYPE (t1)) == TREE_CODE (TREE_TYPE (t2))
>        && TYPE_PRECISION (TREE_TYPE (t1)) == TYPE_PRECISION (TREE_TYPE (t2)))
>      return 1;
> Index: gcc/brig/brigfrontend/brig-to-generic.cc
> ===================================================================
> --- gcc/brig/brigfrontend/brig-to-generic.cc    2017-10-10 16:57:41.296192291 +0100
> +++ gcc/brig/brigfrontend/brig-to-generic.cc    2017-10-23 17:25:51.724380435 +0100
> @@ -869,7 +869,7 @@ get_unsigned_int_type (tree original_typ
>      {
>        size_t esize
>         = int_size_in_bytes (TREE_TYPE (original_type)) * BITS_PER_UNIT;
> -      size_t ecount = TYPE_VECTOR_SUBPARTS (original_type);
> +      poly_uint64 ecount = TYPE_VECTOR_SUBPARTS (original_type);
>        return build_vector_type (build_nonstandard_integer_type (esize, true),
>                                 ecount);
>      }
> Index: gcc/brig/brigfrontend/brig-util.h
> ===================================================================
> --- gcc/brig/brigfrontend/brig-util.h   2017-10-23 17:22:46.882758777 +0100
> +++ gcc/brig/brigfrontend/brig-util.h   2017-10-23 17:25:51.724380435 +0100
> @@ -81,7 +81,7 @@ bool hsa_type_packed_p (BrigType16_t typ
>  inline unsigned HOST_WIDE_INT
>  gccbrig_type_vector_subparts (const_tree type)
>  {
> -  return TYPE_VECTOR_SUBPARTS (type);
> +  return TYPE_VECTOR_SUBPARTS (type).to_constant ();
>  }
>
>  #endif
> Index: gcc/c-family/c-common.c
> ===================================================================
> --- gcc/c-family/c-common.c     2017-10-23 11:41:23.219573771 +0100
> +++ gcc/c-family/c-common.c     2017-10-23 17:25:51.725380399 +0100
> @@ -942,15 +942,16 @@ vector_types_convertible_p (const_tree t
>
>    convertible_lax =
>      (tree_int_cst_equal (TYPE_SIZE (t1), TYPE_SIZE (t2))
> -     && (TREE_CODE (TREE_TYPE (t1)) != REAL_TYPE ||
> -        TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2))
> +     && (TREE_CODE (TREE_TYPE (t1)) != REAL_TYPE
> +        || must_eq (TYPE_VECTOR_SUBPARTS (t1),
> +                    TYPE_VECTOR_SUBPARTS (t2)))
>       && (INTEGRAL_TYPE_P (TREE_TYPE (t1))
>          == INTEGRAL_TYPE_P (TREE_TYPE (t2))));
>
>    if (!convertible_lax || flag_lax_vector_conversions)
>      return convertible_lax;
>
> -  if (TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2)
> +  if (must_eq (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2))
>        && lang_hooks.types_compatible_p (TREE_TYPE (t1), TREE_TYPE (t2)))
>      return true;
>
> @@ -1018,10 +1019,10 @@ c_build_vec_perm_expr (location_t loc, t
>        return error_mark_node;
>      }
>
> -  if (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0))
> -      != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))
> -      && TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1))
> -        != TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
> +  if (may_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v0)),
> +             TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)))
> +      && may_ne (TYPE_VECTOR_SUBPARTS (TREE_TYPE (v1)),
> +                TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask))))
>      {
>        if (complain)
>         error_at (loc, "__builtin_shuffle number of elements of the "
> @@ -2280,7 +2281,8 @@ c_common_type_for_mode (machine_mode mod
>        if (inner_type != NULL_TREE)
>         return build_complex_type (inner_type);
>      }
> -  else if (VECTOR_MODE_P (mode))
> +  else if (VECTOR_MODE_P (mode)
> +          && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>      {
>        machine_mode inner_mode = GET_MODE_INNER (mode);
>        tree inner_type = c_common_type_for_mode (inner_mode, unsignedp);
> @@ -7591,7 +7593,7 @@ convert_vector_to_array_for_subscript (l
>
>        if (TREE_CODE (index) == INTEGER_CST)
>          if (!tree_fits_uhwi_p (index)
> -            || tree_to_uhwi (index) >= TYPE_VECTOR_SUBPARTS (type))
> +           || may_ge (tree_to_uhwi (index), TYPE_VECTOR_SUBPARTS (type)))
>            warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
>
>        /* We are building an ARRAY_REF so mark the vector as addressable
> Index: gcc/c/c-typeck.c
> ===================================================================
> --- gcc/c/c-typeck.c    2017-10-10 17:55:22.067175462 +0100
> +++ gcc/c/c-typeck.c    2017-10-23 17:25:51.726380364 +0100
> @@ -1238,7 +1238,7 @@ comptypes_internal (const_tree type1, co
>        break;
>
>      case VECTOR_TYPE:
> -      val = (TYPE_VECTOR_SUBPARTS (t1) == TYPE_VECTOR_SUBPARTS (t2)
> +      val = (must_eq (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2))
>              && comptypes_internal (TREE_TYPE (t1), TREE_TYPE (t2),
>                                     enum_and_int_p, different_types_p));
>        break;
> @@ -11343,7 +11343,8 @@ build_binary_op (location_t location, en
>        if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
>           && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
>           && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
> -         && TYPE_VECTOR_SUBPARTS (type0) == TYPE_VECTOR_SUBPARTS (type1))
> +         && must_eq (TYPE_VECTOR_SUBPARTS (type0),
> +                     TYPE_VECTOR_SUBPARTS (type1)))
>         {
>           result_type = type0;
>           converted = 1;
> @@ -11400,7 +11401,8 @@ build_binary_op (location_t location, en
>        if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
>           && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
>           && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
> -         && TYPE_VECTOR_SUBPARTS (type0) == TYPE_VECTOR_SUBPARTS (type1))
> +         && must_eq (TYPE_VECTOR_SUBPARTS (type0),
> +                     TYPE_VECTOR_SUBPARTS (type1)))
>         {
>           result_type = type0;
>           converted = 1;
> @@ -11474,7 +11476,8 @@ build_binary_op (location_t location, en
>                return error_mark_node;
>              }
>
> -          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
> +         if (may_ne (TYPE_VECTOR_SUBPARTS (type0),
> +                     TYPE_VECTOR_SUBPARTS (type1)))
>              {
>                error_at (location, "comparing vectors with different "
>                                    "number of elements");
> @@ -11634,7 +11637,8 @@ build_binary_op (location_t location, en
>                return error_mark_node;
>              }
>
> -          if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
> +         if (may_ne (TYPE_VECTOR_SUBPARTS (type0),
> +                     TYPE_VECTOR_SUBPARTS (type1)))
>              {
>                error_at (location, "comparing vectors with different "
>                                    "number of elements");
> Index: gcc/cp/call.c
> ===================================================================
> --- gcc/cp/call.c       2017-10-23 11:41:24.251615675 +0100
> +++ gcc/cp/call.c       2017-10-23 17:25:51.728380292 +0100
> @@ -4928,8 +4928,8 @@ build_conditional_expr_1 (location_t loc
>         }
>
>        if (!same_type_p (arg2_type, arg3_type)
> -         || TYPE_VECTOR_SUBPARTS (arg1_type)
> -            != TYPE_VECTOR_SUBPARTS (arg2_type)
> +         || may_ne (TYPE_VECTOR_SUBPARTS (arg1_type),
> +                    TYPE_VECTOR_SUBPARTS (arg2_type))
>           || TYPE_SIZE (arg1_type) != TYPE_SIZE (arg2_type))
>         {
>           if (complain & tf_error)
> Index: gcc/cp/constexpr.c
> ===================================================================
> --- gcc/cp/constexpr.c  2017-10-23 17:18:47.657057799 +0100
> +++ gcc/cp/constexpr.c  2017-10-23 17:25:51.728380292 +0100
> @@ -3059,7 +3059,8 @@ cxx_fold_indirect_ref (location_t loc, t
>               unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
>               tree index = bitsize_int (indexi);
>
> -             if (offset / part_widthi < TYPE_VECTOR_SUBPARTS (op00type))
> +             if (must_lt (offset / part_widthi,
> +                          TYPE_VECTOR_SUBPARTS (op00type)))
>                 return fold_build3_loc (loc,
>                                         BIT_FIELD_REF, type, op00,
>                                         part_width, index);
> Index: gcc/cp/decl.c
> ===================================================================
> --- gcc/cp/decl.c       2017-10-23 11:41:24.223565801 +0100
> +++ gcc/cp/decl.c       2017-10-23 17:25:51.732380148 +0100
> @@ -7454,7 +7454,11 @@ cp_finish_decomp (tree decl, tree first,
>      }
>    else if (TREE_CODE (type) == VECTOR_TYPE)
>      {
> -      eltscnt = TYPE_VECTOR_SUBPARTS (type);
> +      if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&eltscnt))
> +       {
> +         error_at (loc, "cannot decompose variable length vector %qT", type);
> +         goto error_out;
> +       }
>        if (count != eltscnt)
>         goto cnt_mismatch;
>        eltype = cp_build_qualified_type (TREE_TYPE (type), TYPE_QUALS (type));
> Index: gcc/cp/mangle.c
> ===================================================================
> --- gcc/cp/mangle.c     2017-10-10 17:55:22.087175461 +0100
> +++ gcc/cp/mangle.c     2017-10-23 17:25:51.733380112 +0100
> @@ -2260,7 +2260,8 @@ write_type (tree type)
>                   write_string ("Dv");
>                   /* Non-constant vector size would be encoded with
>                      _ expression, but we don't support that yet.  */
> -                 write_unsigned_number (TYPE_VECTOR_SUBPARTS (type));
> +                 write_unsigned_number (TYPE_VECTOR_SUBPARTS (type)
> +                                        .to_constant ());
>                   write_char ('_');
>                 }
>               else
> Index: gcc/cp/typeck.c
> ===================================================================
> --- gcc/cp/typeck.c     2017-10-23 11:41:24.212926194 +0100
> +++ gcc/cp/typeck.c     2017-10-23 17:25:51.735380040 +0100
> @@ -1359,7 +1359,7 @@ structural_comptypes (tree t1, tree t2,
>        break;
>
>      case VECTOR_TYPE:
> -      if (TYPE_VECTOR_SUBPARTS (t1) != TYPE_VECTOR_SUBPARTS (t2)
> +      if (may_ne (TYPE_VECTOR_SUBPARTS (t1), TYPE_VECTOR_SUBPARTS (t2))
>           || !same_type_p (TREE_TYPE (t1), TREE_TYPE (t2)))
>         return false;
>        break;
> @@ -4513,9 +4513,10 @@ cp_build_binary_op (location_t location,
>            converted = 1;
>          }
>        else if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
> -         && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
> -         && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
> -         && TYPE_VECTOR_SUBPARTS (type0) == TYPE_VECTOR_SUBPARTS (type1))
> +              && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
> +              && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
> +              && must_eq (TYPE_VECTOR_SUBPARTS (type0),
> +                          TYPE_VECTOR_SUBPARTS (type1)))
>         {
>           result_type = type0;
>           converted = 1;
> @@ -4558,9 +4559,10 @@ cp_build_binary_op (location_t location,
>            converted = 1;
>          }
>        else if (code0 == VECTOR_TYPE && code1 == VECTOR_TYPE
> -         && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
> -         && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
> -         && TYPE_VECTOR_SUBPARTS (type0) == TYPE_VECTOR_SUBPARTS (type1))
> +              && TREE_CODE (TREE_TYPE (type0)) == INTEGER_TYPE
> +              && TREE_CODE (TREE_TYPE (type1)) == INTEGER_TYPE
> +              && must_eq (TYPE_VECTOR_SUBPARTS (type0),
> +                          TYPE_VECTOR_SUBPARTS (type1)))
>         {
>           result_type = type0;
>           converted = 1;
> @@ -4925,7 +4927,8 @@ cp_build_binary_op (location_t location,
>               return error_mark_node;
>             }
>
> -         if (TYPE_VECTOR_SUBPARTS (type0) != TYPE_VECTOR_SUBPARTS (type1))
> +         if (may_ne (TYPE_VECTOR_SUBPARTS (type0),
> +                     TYPE_VECTOR_SUBPARTS (type1)))
>             {
>               if (complain & tf_error)
>                 {
> Index: gcc/cp/typeck2.c
> ===================================================================
> --- gcc/cp/typeck2.c    2017-10-09 11:50:52.214211104 +0100
> +++ gcc/cp/typeck2.c    2017-10-23 17:25:51.736380004 +0100
> @@ -1276,7 +1276,7 @@ process_init_constructor_array (tree typ
>      }
>    else
>      /* Vectors are like simple fixed-size arrays.  */
> -    len = TYPE_VECTOR_SUBPARTS (type);
> +    unbounded = !TYPE_VECTOR_SUBPARTS (type).is_constant (&len);
>
>    /* There must not be more initializers than needed.  */
>    if (!unbounded && vec_safe_length (v) > len)
> Index: gcc/fortran/trans-types.c
> ===================================================================
> --- gcc/fortran/trans-types.c   2017-09-25 13:57:12.591118003 +0100
> +++ gcc/fortran/trans-types.c   2017-10-23 17:25:51.745379681 +0100
> @@ -3159,7 +3159,8 @@ gfc_type_for_mode (machine_mode mode, in
>        tree type = gfc_type_for_size (GET_MODE_PRECISION (int_mode), unsignedp);
>        return type != NULL_TREE && mode == TYPE_MODE (type) ? type : NULL_TREE;
>      }
> -  else if (VECTOR_MODE_P (mode))
> +  else if (VECTOR_MODE_P (mode)
> +          && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>      {
>        machine_mode inner_mode = GET_MODE_INNER (mode);
>        tree inner_type = gfc_type_for_mode (inner_mode, unsignedp);
> Index: gcc/lto/lto-lang.c
> ===================================================================
> --- gcc/lto/lto-lang.c  2017-10-23 11:41:25.563189078 +0100
> +++ gcc/lto/lto-lang.c  2017-10-23 17:25:51.748379573 +0100
> @@ -971,7 +971,8 @@ lto_type_for_mode (machine_mode mode, in
>        if (inner_type != NULL_TREE)
>         return build_complex_type (inner_type);
>      }
> -  else if (VECTOR_MODE_P (mode))
> +  else if (VECTOR_MODE_P (mode)
> +          && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>      {
>        machine_mode inner_mode = GET_MODE_INNER (mode);
>        tree inner_type = lto_type_for_mode (inner_mode, unsigned_p);
> Index: gcc/lto/lto.c
> ===================================================================
> --- gcc/lto/lto.c       2017-10-13 10:23:39.776947828 +0100
> +++ gcc/lto/lto.c       2017-10-23 17:25:51.749379537 +0100
> @@ -316,7 +316,7 @@ hash_canonical_type (tree type)
>
>    if (VECTOR_TYPE_P (type))
>      {
> -      hstate.add_int (TYPE_VECTOR_SUBPARTS (type));
> +      hstate.add_poly_int (TYPE_VECTOR_SUBPARTS (type));
>        hstate.add_int (TYPE_UNSIGNED (type));
>      }
>
> Index: gcc/go/go-lang.c
> ===================================================================
> --- gcc/go/go-lang.c    2017-08-30 12:20:57.010045759 +0100
> +++ gcc/go/go-lang.c    2017-10-23 17:25:51.747379609 +0100
> @@ -372,7 +372,8 @@ go_langhook_type_for_mode (machine_mode
>       make sense for the middle-end to ask the frontend for a type
>       which the frontend does not support.  However, at least for now
>       it is required.  See PR 46805.  */
> -  if (VECTOR_MODE_P (mode))
> +  if (VECTOR_MODE_P (mode)
> +      && valid_vector_subparts_p (GET_MODE_NUNITS (mode)))
>      {
>        tree inner;
>


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
                   ` (106 preceding siblings ...)
  2017-10-23 17:48 ` [107/nnn] poly_int: GET_MODE_SIZE Richard Sandiford
@ 2017-10-24  9:25 ` Eric Botcazou
  2017-10-24  9:58   ` Richard Sandiford
  107 siblings, 1 reply; 302+ messages in thread
From: Eric Botcazou @ 2017-10-24  9:25 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

> The patch that adds poly_int has detailed documentation, but the main
> points are:
> 
> * there's no total ordering between poly_ints, so the best we can do
>   when comparing them is to ask whether two values *might* or *must*
>   be related in a particular way.  E.g. if mode A has size 2 + 2X
>   and mode B has size 4, the condition:
> 
>     GET_MODE_SIZE (A) <= GET_MODE_SIZE (B)
> 
>   is true for X<=1 and false for X>=2.  This translates to:
> 
>     may_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == true
>     must_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == false
> 
>   Of course, the may/must distinction already exists in things like
>   alias analysis.

I presume that you considered using traditional operators instead of awkward 
names, despite the lack of total ordering, and rejected it?  Because:

-      && (bitpos == 0 || MEM_P (target)))
+      && (known_zero (bitpos) || MEM_P (target)))

-             && bitsize == TYPE_PRECISION (type))
+             && must_eq (bitsize, TYPE_PRECISION (type)))

-       if (need_to_clear && size > 0)
+       if (need_to_clear && may_gt (size, 0))

is really ugly...

-- 
Eric Botcazou


* Re: [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
  2017-10-24  9:06   ` Richard Biener
@ 2017-10-24  9:40     ` Richard Sandiford
  2017-10-24 10:01       ` Richard Biener
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-24  9:40 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
>> encoded in the 10-bit precision field and was previously always stored
>> as a simple log2 value.  The challenge was to use these 10 bits to
>> encode the number of elements in variable-length vectors, so that
>> we didn't need to increase the size of the tree.
>>
>> In practice the number of vector elements should always have the form
>> N + N * X (where X is the runtime value), and as for constant-length
>> vectors, N must be a power of 2 (even though X itself might not be).
>> The patch therefore uses the low bit to select between constant-length
>> and variable-length and uses the upper 9 bits to encode log2(N).
>> Targets without variable-length vectors continue to use the old scheme.
>>
>> A new valid_vector_subparts_p function tests whether a given number
>> of elements can be encoded.  This is false for the vector modes that
>> represent an LD3 or ST3 vector triple (which we want to treat as arrays
>> of vectors rather than single vectors).
>>
>> Most of the patch is mechanical; previous patches handled the changes
>> that weren't entirely straightforward.
>
> One comment, w/o actually reviewing may/must stuff (will comment on that
> elsewhere).
>
> You split 10 bits into 9 and 1, wouldn't it be more efficient to use the
> lower 8 bits for the log2 value of N and either of the two remaining bits
> for the flag?  That way the 8 bits for the shift amount can be eventually
> accessed in a more efficient way.
>
> Guess you'd need to compare code-generation of the TYPE_VECTOR_SUBPARTS
> accessor on aarch64 / x86_64.

Ah, yeah.  I'll give that a go.

> Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
> have variable length vector modes?

Right.  1 is the default and only AArch64 defines it to anything else (2).

Thanks,
Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-24  9:25 ` [000/nnn] poly_int: representation of runtime offsets and sizes Eric Botcazou
@ 2017-10-24  9:58   ` Richard Sandiford
  2017-10-24 10:53     ` Eric Botcazou
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-24  9:58 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

Eric Botcazou <ebotcazou@adacore.com> writes:
>> The patch that adds poly_int has detailed documentation, but the main
>> points are:
>> 
>> * there's no total ordering between poly_ints, so the best we can do
>>   when comparing them is to ask whether two values *might* or *must*
>>   be related in a particular way.  E.g. if mode A has size 2 + 2X
>>   and mode B has size 4, the condition:
>> 
>>     GET_MODE_SIZE (A) <= GET_MODE_SIZE (B)
>> 
>>   is true for X<=1 and false for X>=2.  This translates to:
>> 
>>     may_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == true
>>     must_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == false
>> 
>>   Of course, the may/must distinction already exists in things like
>>   alias analysis.
>
> I presume that you considered using traditional operators instead of awkward 
> names, despite the lack of total ordering, and rejected it?

Yeah.  E.g. for ==, the two options would be:

a) must_eq (a, b)   -> a == b
   must_ne (a, b)   -> a != b

   which has the weird property that (a == b) != (!(a != b))

b) must_eq (a, b)   -> a == b
   may_ne (a, b)    -> a != b

   which has the weird property that a can be equal to b when a != b

may/must matters in a similar way as it does for alias analysis:
"may" usually selects conservatively-correct, just-in-case behaviour
while "must" selects something that would be wrong if the condition
didn't hold.
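
To make that concrete, a typical (made-up) use would look something
like:

  /* Only fold if the access is known to be in range for all X.  */
  if (must_le (offset + size, GET_MODE_SIZE (mode)))
    ...do the fold...

  /* Be conservative: record a dependence if the ranges could overlap
     for some X.  */
  if (may_gt (end_a, start_b) && may_gt (end_b, start_a))
    ...assume a dependence...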

> Because:
>
> -      && (bitpos == 0 || MEM_P (target)))
> +      && (known_zero (bitpos) || MEM_P (target)))
>
> -             && bitsize == TYPE_PRECISION (type))
> +             && must_eq (bitsize, TYPE_PRECISION (type)))
>
> -       if (need_to_clear && size > 0)
> +       if (need_to_clear && may_gt (size, 0))
>
> is really ugly...

Sorry about that.  It's the best I could come up with without losing
the may/must distinction.

Thanks,
Richard


* Re: [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
  2017-10-24  9:40     ` Richard Sandiford
@ 2017-10-24 10:01       ` Richard Biener
  2017-10-24 11:20         ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Biener @ 2017-10-24 10:01 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Tue, Oct 24, 2017 at 11:40 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
>>> encoded in the 10-bit precision field and was previously always stored
>>> as a simple log2 value.  The challenge was to use these 10 bits to
>>> encode the number of elements in variable-length vectors, so that
>>> we didn't need to increase the size of the tree.
>>>
>>> In practice the number of vector elements should always have the form
>>> N + N * X (where X is the runtime value), and as for constant-length
>>> vectors, N must be a power of 2 (even though X itself might not be).
>>> The patch therefore uses the low bit to select between constant-length
>>> and variable-length and uses the upper 9 bits to encode log2(N).
>>> Targets without variable-length vectors continue to use the old scheme.
>>>
>>> A new valid_vector_subparts_p function tests whether a given number
>>> of elements can be encoded.  This is false for the vector modes that
>>> represent an LD3 or ST3 vector triple (which we want to treat as arrays
>>> of vectors rather than single vectors).
>>>
>>> Most of the patch is mechanical; previous patches handled the changes
>>> that weren't entirely straightforward.
>>
>> One comment, w/o actually reviewing may/must stuff (will comment on that
>> elsewhere).
>>
>> You split 10 bits into 9 and 1, wouldn't it be more efficient to use the
>> lower 8 bits for the log2 value of N and either of the two remaining bits
>> for the flag?  That way the 8 bits for the shift amount can be eventually
>> accessed in a more efficient way.
>>
>> Guess you'd need to compare code-generation of the TYPE_VECTOR_SUBPARTS
>> accessor on aarch64 / x86_64.
>
> Ah, yeah.  I'll give that a go.
>
>> Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
>> have variable length vector modes?
>
> Right.  1 is the default and only AArch64 defines it to anything else (2).

Going to be interesting (bitrot) times then?  I wonder if it makes sense
to initially define it to 2 globally and only change it to 1 later?

Do you have any numbers on the effect of poly-int on compile-times?
Esp. for example on stage2 build times when stage1 is -O0 -g "optimized"?

Thanks,
Richard.

> Thanks,
> Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-24  9:58   ` Richard Sandiford
@ 2017-10-24 10:53     ` Eric Botcazou
  2017-10-24 11:25       ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Eric Botcazou @ 2017-10-24 10:53 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

> Yeah.  E.g. for ==, the two options would be:
> 
> a) must_eq (a, b)   -> a == b
>    must_ne (a, b)   -> a != b
> 
>    which has the weird property that (a == b) != (!(a != b))
> 
> b) must_eq (a, b)   -> a == b
>    may_ne (a, b)    -> a != b
> 
>    which has the weird property that a can be equal to b when a != b

Yes, a) was the one I had in mind, i.e. the traditional operators are the must 
variants and you use an outer ! in order to express the may.  Of course this 
would require a bit of discipline but, on the other hand, if most of the cases 
fall in the must category, that could be less ugly.
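
I.e. (just to spell out what I mean):

  if (GET_MODE_SIZE (A) == GET_MODE_SIZE (B))      /* the "must" form */
    ...
  if (!(GET_MODE_SIZE (A) != GET_MODE_SIZE (B)))   /* "may", via the outer ! */
    ...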

> Sorry about that.  It's the best I could come up with without losing
> the may/must distinction.

Which variant is known_zero though?  Must or may?

-- 
Eric Botcazou


* Re: [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
  2017-10-24 10:01       ` Richard Biener
@ 2017-10-24 11:20         ` Richard Sandiford
  2017-10-24 11:30           ` Richard Biener
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-24 11:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 24, 2017 at 11:40 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
>>>> encoded in the 10-bit precision field and was previously always stored
>>>> as a simple log2 value.  The challenge was to use these 10 bits to
>>>> encode the number of elements in variable-length vectors, so that
>>>> we didn't need to increase the size of the tree.
>>>>
>>>> In practice the number of vector elements should always have the form
>>>> N + N * X (where X is the runtime value), and as for constant-length
>>>> vectors, N must be a power of 2 (even though X itself might not be).
>>>> The patch therefore uses the low bit to select between constant-length
>>>> and variable-length and uses the upper 9 bits to encode log2(N).
>>>> Targets without variable-length vectors continue to use the old scheme.
>>>>
>>>> A new valid_vector_subparts_p function tests whether a given number
>>>> of elements can be encoded.  This is false for the vector modes that
>>>> represent an LD3 or ST3 vector triple (which we want to treat as arrays
>>>> of vectors rather than single vectors).
>>>>
>>>> Most of the patch is mechanical; previous patches handled the changes
>>>> that weren't entirely straightforward.
>>>
>>> One comment, w/o actually reviewing may/must stuff (will comment on that
>>> elsewhere).
>>>
>>> You split 10 bits into 9 and 1, wouldn't it be more efficient to use the
>>> lower 8 bits for the log2 value of N and either of the two remaining bits
>>> for the flag?  That way the 8 bits for the shift amount can be eventually
>>> accessed in a more efficient way.
>>>
>>> Guess you'd need to compare code-generation of the TYPE_VECTOR_SUBPARTS
>>> accessor on aarch64 / x86_64.
>>
>> Ah, yeah.  I'll give that a go.
>>
>>> Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
>>> have variable length vector modes?
>>
>> Right.  1 is the default and only AArch64 defines it to anything else (2).
>
> Going to be interesting (bitrot) times then?  I wonder if it makes sense
> to initially define it to 2 globally and only change it to 1 later?

Well, the target-independent code doesn't have the implicit conversion
from poly_int<1, C> to C, so it can't e.g. do:

  poly_int64 x = ...;
  HOST_WIDE_INT y = x;

even when NUM_POLY_INT_COEFFS==1.  Only target-specific code (identified
by IN_TARGET_CODE) can do that.

So to target-independent code it doesn't really matter what
NUM_POLY_INT_COEFFS is.  Even if we bumped it to 2, the extra coefficient
would always be zero.
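
For comparison, target-independent code has to check explicitly, along
the lines of (a sketch, assuming the is_constant accessor described in
the poly_int documentation):

  poly_int64 x = ...;
  HOST_WIDE_INT y;
  if (x.is_constant (&y))
    ...use the constant y...
  else
    ...handle the genuinely variable case...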

FWIW, the poly_int tests in [001/nnn] cover N == 1, 2 and (as far as
supported) 3 for all targets, so that part isn't sensitive to
NUM_POLY_INT_COEFFS.

> Do you have any numbers on the effect of poly-int on compile-times?
> Esp. for example on stage2 build times when stage1 is -O0 -g "optimized"?

I've just tried that for an x86_64 -j24 build and got:

real: +7%
user: +8.6%

I don't know how noisy the results are though.

It's compile-time neutral in terms of running a gcc built with
--enable-checking=release, within a margin of about [-0.1%, 0.1%].

Thanks,
Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-24 10:53     ` Eric Botcazou
@ 2017-10-24 11:25       ` Richard Sandiford
  2017-10-24 12:24         ` Richard Biener
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-24 11:25 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

Eric Botcazou <ebotcazou@adacore.com> writes:
>> Yeah.  E.g. for ==, the two options would be:
>> 
>> a) must_eq (a, b)   -> a == b
>>    must_ne (a, b)   -> a != b
>> 
>>    which has the weird property that (a == b) != (!(a != b))
>> 
>> b) must_eq (a, b)   -> a == b
>>    may_ne (a, b)    -> a != b
>> 
>>    which has the weird property that a can be equal to b when a != b
>
> Yes, a) was the one I had in mind, i.e. the traditional operators are the must 
> variants and you use an outer ! in order to express the may.  Of course this 
> would require a bit of discipline but, on the other hand, if most of the cases 
> fall in the must category, that could be less ugly.

I just think that discipline is going to be hard to maintain in practice,
since it's so natural to assume (a == b || a != b) == true.  With the
may/must approach, static type checking forces the issue.
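
E.g. with plain operators it would be very easy to write:

  if (size_a != size_b)
    return false;
  ...code that assumes the sizes are equal...

and not notice that under that scheme the "!=" only fires when the sizes
are known to be different, so the fall-through code would silently also
see sizes that merely might differ.  Having to write
may_ne (size_a, size_b) makes that choice explicit.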

>> Sorry about that.  It's the best I could come up with without losing
>> the may/must distinction.
>
> Which variant is known_zero though?  Must or may?

must.  maybe_nonzero is the may version.

Thanks,
Richard


* Re: [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
  2017-10-24 11:20         ` Richard Sandiford
@ 2017-10-24 11:30           ` Richard Biener
  2017-10-24 16:24             ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Biener @ 2017-10-24 11:30 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Tue, Oct 24, 2017 at 1:18 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Oct 24, 2017 at 11:40 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Mon, Oct 23, 2017 at 7:41 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
>>>>> encoded in the 10-bit precision field and was previously always stored
>>>>> as a simple log2 value.  The challenge was to use these 10 bits to
>>>>> encode the number of elements in variable-length vectors, so that
>>>>> we didn't need to increase the size of the tree.
>>>>>
>>>>> In practice the number of vector elements should always have the form
>>>>> N + N * X (where X is the runtime value), and as for constant-length
>>>>> vectors, N must be a power of 2 (even though X itself might not be).
>>>>> The patch therefore uses the low bit to select between constant-length
>>>>> and variable-length and uses the upper 9 bits to encode log2(N).
>>>>> Targets without variable-length vectors continue to use the old scheme.
>>>>>
>>>>> A new valid_vector_subparts_p function tests whether a given number
>>>>> of elements can be encoded.  This is false for the vector modes that
>>>>> represent an LD3 or ST3 vector triple (which we want to treat as arrays
>>>>> of vectors rather than single vectors).
>>>>>
>>>>> Most of the patch is mechanical; previous patches handled the changes
>>>>> that weren't entirely straightforward.
>>>>
>>>> One comment, w/o actually reviewing may/must stuff (will comment on that
>>>> elsewhere).
>>>>
>>>> You split 10 bits into 9 and 1, wouldn't it be more efficient to use the
>>>> lower 8 bits for the log2 value of N and either of the two remaining bits
>>>> for the flag?  That way the 8 bits for the shift amount can be eventually
>>>> accessed in a more efficient way.
>>>>
>>>> Guess you'd need to compare code-generation of the TYPE_VECTOR_SUBPARTS
>>>> accessor on aarch64 / x86_64.
>>>
>>> Ah, yeah.  I'll give that a go.
>>>
>>>> Am I correct that NUM_POLY_INT_COEFFS is 1 for targets that do not
>>>> have variable length vector modes?
>>>
>>> Right.  1 is the default and only AArch64 defines it to anything else (2).
>>
>> Going to be interesting (bitrot) times then?  I wonder if it makes sense
>> to initially define it to 2 globally and only change it to 1 later?
>
> Well, the target-independent code doesn't have the implicit conversion
> from poly_int<1, C> to C, so it can't e.g. do:
>
>   poly_int64 x = ...;
>   HOST_WIDE_INT y = x;
>
> even when NUM_POLY_INT_COEFFS==1.  Only target-specific code (identified
> by IN_TARGET_CODE) can do that.
>
> So to target-independent code it doesn't really matter what
> NUM_POLY_INT_COEFFS is.  Even if we bumped it to 2, the extra coefficient
> would always be zero.
>
> FWIW, the poly_int tests in [001/nnn] cover N == 1, 2 and (as far as
> supported) 3 for all targets, so that part isn't sensitive to
> NUM_POLY_INT_COEFFS.
>
>> Do you have any numbers on the effect of poly-int on compile-times?
>> Esp. for example on stage2 build times when stage1 is -O0 -g "optimized"?
>
> I've just tried that for an x86_64 -j24 build and got:
>
> real: +7%
> user: +8.6%
>
> I don't know how noisy the results are though.

What are the same numbers on AARCH64, where NUM_POLY_INT_COEFFS is 2?

> It's compile-time neutral in terms of running a gcc built with
> --enable-checking=release, within a margin of about [-0.1%, 0.1%].

I would have expected that (on x86_64).  Well, hoped (you basically
stated that in 000/nnn).  The question is what the effect is on AARCH64.
As you know we build openSUSE for AARCH64 and build power is limited ;)

Richard.

> Thanks,
> Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-24 11:25       ` Richard Sandiford
@ 2017-10-24 12:24         ` Richard Biener
  2017-10-24 13:07           ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Biener @ 2017-10-24 12:24 UTC (permalink / raw)
  To: Eric Botcazou, GCC Patches, Richard Sandiford

On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Eric Botcazou <ebotcazou@adacore.com> writes:
>>> Yeah.  E.g. for ==, the two options would be:
>>>
>>> a) must_eq (a, b)   -> a == b
>>>    must_ne (a, b)   -> a != b
>>>
>>>    which has the weird property that (a == b) != (!(a != b))
>>>
>>> b) must_eq (a, b)   -> a == b
>>>    may_ne (a, b)    -> a != b
>>>
>>>    which has the weird property that a can be equal to b when a != b
>>
>> Yes, a) was the one I had in mind, i.e. the traditional operators are the must
>> variants and you use an outer ! in order to express the may.  Of course this
>> would require a bit of discipline but, on the other hand, if most of the cases
>> fall in the must category, that could be less ugly.
>
> I just think that discipline is going to be hard to maintain in practice,
> since it's so natural to assume (a == b || a != b) == true.  With the
> may/must approach, static type checking forces the issue.
>
>>> Sorry about that.  It's the best I could come up with without losing
>>> the may/must distinction.
>>
>> Which variant is known_zero though?  Must or may?
>
> must.  maybe_nonzero is the may version.

Can you rename known_zero to must_be_zero then?  What's wrong with
must_eq (X, 0) / may_eq (X, 0) btw?

Richard.

> Thanks,
> Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-24 12:24         ` Richard Biener
@ 2017-10-24 13:07           ` Richard Sandiford
  2017-10-24 13:18             ` Richard Biener
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-24 13:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>> Yeah.  E.g. for ==, the two options would be:
>>>>
>>>> a) must_eq (a, b)   -> a == b
>>>>    must_ne (a, b)   -> a != b
>>>>
>>>>    which has the weird property that (a == b) != (!(a != b))
>>>>
>>>> b) must_eq (a, b)   -> a == b
>>>>    may_ne (a, b)    -> a != b
>>>>
>>>>    which has the weird property that a can be equal to b when a != b
>>>
>>> Yes, a) was the one I had in mind, i.e. the traditional operators are
>>> the must
>>> variants and you use an outer ! in order to express the may.  Of course this
>>> would require a bit of discipline but, on the other hand, if most of
>>> the cases
>>> fall in the must category, that could be less ugly.
>>
>> I just think that discipline is going to be hard to maintain in practice,
>> since it's so natural to assume (a == b || a != b) == true.  With the
>> may/must approach, static type checking forces the issue.
>>
>>>> Sorry about that.  It's the best I could come up with without losing
>>>> the may/must distinction.
>>>
>>> Which variant is known_zero though?  Must or may?
>>
>> must.  maybe_nonzero is the may version.
>
> Can you rename known_zero to must_be_zero then?

That'd be OK with me.

Another alternative I wondered about was must_eq_0 / may_ne_0.

> What's wrong with must_eq (X, 0) / may_eq (X, 0) btw?

must_eq (X, 0) generated a warning if X is unsigned, so sometimes you'd
need must_eq (X, 0) and sometimes must_eq (X, 0U).  Having a specific
function seemed cleaner, especially in routines that were polymorphic
in X.
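
The helper itself is trivial -- something along these lines
(paraphrasing rather than quoting the patch):

  /* Return true if A is zero for all values of the indeterminates,
     i.e. if every coefficient is zero.  */
  template<unsigned int N, typename C>
  inline bool
  known_zero (const poly_int_pod<N, C> &a)
  {
    for (unsigned int i = 0; i < N; ++i)
      if (a.coeffs[i] != 0)
        return false;
    return true;
  }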

Or we could suppress warnings by forcibly converting the input.
Sometimes the warnings are useful though.

Thanks,
Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-24 13:07           ` Richard Sandiford
@ 2017-10-24 13:18             ` Richard Biener
  2017-10-24 13:30               ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Biener @ 2017-10-24 13:18 UTC (permalink / raw)
  To: Richard Biener, Eric Botcazou, GCC Patches, Richard Sandiford

On Tue, Oct 24, 2017 at 2:48 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>>> Yeah.  E.g. for ==, the two options would be:
>>>>>
>>>>> a) must_eq (a, b)   -> a == b
>>>>>    must_ne (a, b)   -> a != b
>>>>>
>>>>>    which has the weird property that (a == b) != (!(a != b))
>>>>>
>>>>> b) must_eq (a, b)   -> a == b
>>>>>    may_ne (a, b)    -> a != b
>>>>>
>>>>>    which has the weird property that a can be equal to b when a != b
>>>>
>>>> Yes, a) was the one I had in mind, i.e. the traditional operators are
>>>> the must
>>>> variants and you use an outer ! in order to express the may.  Of course this
>>>> would require a bit of discipline but, on the other hand, if most of
>>>> the cases
>>>> fall in the must category, that could be less ugly.
>>>
>>> I just think that discipline is going to be hard to maintain in practice,
>>> since it's so natural to assume (a == b || a != b) == true.  With the
>>> may/must approach, static type checking forces the issue.
>>>
>>>>> Sorry about that.  It's the best I could come up with without losing
>>>>> the may/must distinction.
>>>>
>>>> Which variant is known_zero though?  Must or may?
>>>
>>> must.  maybe_nonzero is the may version.
>>
>> Can you rename known_zero to must_be_zero then?
>
> That'd be OK with me.
>
> Another alternative I wondered about was must_eq_0 / may_ne_0.
>
>> What's wrong with must_eq (X, 0) / may_eq (X, 0) btw?
>
> must_eq (X, 0) generated a warning if X is unsigned, so sometimes you'd
> need must_eq (X, 0) and sometimes must_eq (X, 0U).

Is that because they are templates?  Maybe providing a partial specialization
would help?

I'd be fine with must_eq_p and may_eq_0.

Richard.

>  Having a specific
> function seemed cleaner, especially in routines that were polymorphic
> in X.
>
> Or we could suppress warnings by forcibly converting the input.
> Sometimes the warnings are useful though.
>
> Thanks,
> Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-24 13:18             ` Richard Biener
@ 2017-10-24 13:30               ` Richard Sandiford
  2017-10-25 10:27                 ` Richard Biener
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-24 13:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 24, 2017 at 2:48 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>>>> Yeah.  E.g. for ==, the two options would be:
>>>>>>
>>>>>> a) must_eq (a, b)   -> a == b
>>>>>>    must_ne (a, b)   -> a != b
>>>>>>
>>>>>>    which has the weird property that (a == b) != (!(a != b))
>>>>>>
>>>>>> b) must_eq (a, b)   -> a == b
>>>>>>    may_ne (a, b)    -> a != b
>>>>>>
>>>>>>    which has the weird property that a can be equal to b when a != b
>>>>>
>>>>> Yes, a) was the one I had in mind, i.e. the traditional operators are
>>>>> the must
>>>>> variants and you use an outer ! in order to express the may.  Of
>>>>> course this
>>>>> would require a bit of discipline but, on the other hand, if most of
>>>>> the cases
>>>>> fall in the must category, that could be less ugly.
>>>>
>>>> I just think that discipline is going to be hard to maintain in practice,
>>>> since it's so natural to assume (a == b || a != b) == true.  With the
>>>> may/must approach, static type checking forces the issue.
>>>>
>>>>>> Sorry about that.  It's the best I could come up with without losing
>>>>>> the may/must distinction.
>>>>>
>>>>> Which variant is known_zero though?  Must or may?
>>>>
>>>> must.  maybe_nonzero is the may version.
>>>
>>> Can you rename known_zero to must_be_zero then?
>>
>> That'd be OK with me.
>>
>> Another alternative I wondered about was must_eq_0 / may_ne_0.
>>
>>> What's wrong with must_eq (X, 0) / may_eq (X, 0) btw?
>>
>> must_eq (X, 0) generated a warning if X is unsigned, so sometimes you'd
>> need must_eq (X, 0) and sometimes must_eq (X, 0U).
>
> Is that because they are templates?  Maybe providing a partial specialization
> would help?

I don't think it's templates specifically.  We end up with something like:

  int f (unsigned int x, const int y)
  {
    return x != y;
  }

  int g (unsigned int x) { return f (x, 0); }

which generates a warning too.

> I'd be fine with must_eq_p and may_eq_0.

OK, I'll switch to that if there are no objections.

Thanks,
Richard


* Re: [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
  2017-10-24 11:30           ` Richard Biener
@ 2017-10-24 16:24             ` Richard Sandiford
  0 siblings, 0 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-10-24 16:24 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 24, 2017 at 1:18 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> Do you have any numbers on the effect of poly-int on compile-times?
>>> Esp. for example on stage2 build times when stage1 is -O0 -g "optimized"?
>>
>> I've just tried that for an x86_64 -j24 build and got:
>>
>> real: +7%
>> user: +8.6%
>>
>> I don't know how noisy the results are though.
>
> What are the same numbers on AARCH64, where NUM_POLY_INT_COEFFS is 2?
>
>> It's compile-time neutral in terms of running a gcc built with
>> --enable-checking=release, within a margin of about [-0.1%, 0.1%].
>
> I would have expected that (on x86_64).  Well, hoped (you basically
> stated that in 000/nnn).

Sorry, wasn't sure how much of the series you'd had a chance to read.

> The question is what is the effect on AARCH64.
> As you know we build openSUSE for AARCH64 and build power is limited ;)

The timings for an AArch64 stage2-bubble with an -O0 -g stage1, for
NUM_POLY_INT_COEFFS==2 is:

real: +17%
user: +20%

Running a gcc built with --enable-checking=release is ~1% slower when
using -g and ~2% slower with -O2 -g.

Thanks,
Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-24 13:30               ` Richard Sandiford
@ 2017-10-25 10:27                 ` Richard Biener
  2017-10-25 10:45                   ` Jakub Jelinek
  2017-10-25 11:39                   ` Richard Sandiford
  0 siblings, 2 replies; 302+ messages in thread
From: Richard Biener @ 2017-10-25 10:27 UTC (permalink / raw)
  To: Richard Biener, Eric Botcazou, GCC Patches, Richard Sandiford

On Tue, Oct 24, 2017 at 3:24 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Oct 24, 2017 at 2:48 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>>>>> Yeah.  E.g. for ==, the two options would be:
>>>>>>>
>>>>>>> a) must_eq (a, b)   -> a == b
>>>>>>>    must_ne (a, b)   -> a != b
>>>>>>>
>>>>>>>    which has the weird property that (a == b) != (!(a != b))
>>>>>>>
>>>>>>> b) must_eq (a, b)   -> a == b
>>>>>>>    may_ne (a, b)    -> a != b
>>>>>>>
>>>>>>>    which has the weird property that a can be equal to b when a != b
>>>>>>
>>>>>> Yes, a) was the one I had in mind, i.e. the traditional operators are
>>>>>> the must
>>>>>> variants and you use an outer ! in order to express the may.  Of
>>>>>> course this
>>>>>> would require a bit of discipline but, on the other hand, if most of
>>>>>> the cases
>>>>>> fall in the must category, that could be less ugly.
>>>>>
>>>>> I just think that discipline is going to be hard to maintain in practice,
>>>>> since it's so natural to assume (a == b || a != b) == true.  With the
>>>>> may/must approach, static type checking forces the issue.
>>>>>
>>>>>>> Sorry about that.  It's the best I could come up with without losing
>>>>>>> the may/must distinction.
>>>>>>
>>>>>> Which variant is known_zero though?  Must or may?
>>>>>
>>>>> must.  maybe_nonzero is the may version.
>>>>
>>>> Can you rename known_zero to must_be_zero then?
>>>
>>> That'd be OK with me.
>>>
>>> Another alternative I wondered about was must_eq_0 / may_ne_0.
>>>
>>>> What's wrong with must_eq (X, 0) / may_eq (X, 0) btw?
>>>
>>> must_eq (X, 0) generated a warning if X is unsigned, so sometimes you'd
>>> need must_eq (X, 0) and sometimes must_eq (X, 0U).
>>
>> Is that because they are templates?  Maybe providing a partial specialization
>> would help?
>
> I don't think it's templates specifically.  We end up with something like:
>
>   int f (unsigned int x, const int y)
>   {
>     return x != y;
>   }
>
>   int g (unsigned int x) { return f (x, 0); }
>
> which generates a warning too.
>
>> I'd be fine with must_eq_p and may_eq_0.
>
> OK, I'll switch to that if there are no objections.

Hum.  But then we still warn for must_eq_p (x, 1), no?

So why does

  int f (unsigned int x)
  {
     return x != 0;
  }

not warn?  Probably because of promotion of the arg.

Shouldn't we then simply never have a may/must_*_p (T1, T2)
with T1 and T2 being not compatible?  That is, force promotion
rules on them with template magic?

Richard.


> Thanks,
> Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-25 10:27                 ` Richard Biener
@ 2017-10-25 10:45                   ` Jakub Jelinek
  2017-10-25 11:39                   ` Richard Sandiford
  1 sibling, 0 replies; 302+ messages in thread
From: Jakub Jelinek @ 2017-10-25 10:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, GCC Patches, Richard Sandiford

On Wed, Oct 25, 2017 at 12:19:37PM +0200, Richard Biener wrote:
> Hum.  But then we still warn for must_eq_p (x, 1), no?
> 
> So why does
> 
>   int f (unsigned int x)
>   {
>      return x != 0;
>   }
> 
> not warn?  Probably because of promotion of the arg.

Because then one comparison operand is a positive constant smaller
than the signed maximum.
We warn when both comparison operands are variable and one is signed and the
other is unsigned.
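
I.e. (quick illustration):

  int f (unsigned int x, int y) { return x != y; }  /* warns */
  int g (unsigned int x) { return x != 0; }         /* doesn't warn */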

	Jakub


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-25 10:27                 ` Richard Biener
  2017-10-25 10:45                   ` Jakub Jelinek
@ 2017-10-25 11:39                   ` Richard Sandiford
  2017-10-25 13:09                     ` Richard Biener
  1 sibling, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-25 11:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Oct 24, 2017 at 3:24 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Oct 24, 2017 at 2:48 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>>>>>> Yeah.  E.g. for ==, the two options would be:
>>>>>>>>
>>>>>>>> a) must_eq (a, b)   -> a == b
>>>>>>>>    must_ne (a, b)   -> a != b
>>>>>>>>
>>>>>>>>    which has the weird property that (a == b) != (!(a != b))
>>>>>>>>
>>>>>>>> b) must_eq (a, b)   -> a == b
>>>>>>>>    may_ne (a, b)    -> a != b
>>>>>>>>
>>>>>>>>    which has the weird property that a can be equal to b when a != b
>>>>>>>
>>>>>>> Yes, a) was the one I had in mind, i.e. the traditional operators are
>>>>>>> the must
>>>>>>> variants and you use an outer ! in order to express the may.  Of
>>>>>>> course this
>>>>>>> would require a bit of discipline but, on the other hand, if most of
>>>>>>> the cases
>>>>>>> fall in the must category, that could be less ugly.
>>>>>>
>>>>>> I just think that discipline is going to be hard to maintain in practice,
>>>>>> since it's so natural to assume (a == b || a != b) == true.  With the
>>>>>> may/must approach, static type checking forces the issue.
>>>>>>
>>>>>>>> Sorry about that.  It's the best I could come up with without losing
>>>>>>>> the may/must distinction.
>>>>>>>
>>>>>>> Which variant is known_zero though?  Must or may?
>>>>>>
>>>>>> must.  maybe_nonzero is the may version.
>>>>>
>>>>> Can you rename known_zero to must_be_zero then?
>>>>
>>>> That'd be OK with me.
>>>>
>>>> Another alternative I wondered about was must_eq_0 / may_ne_0.
>>>>
>>>>> What's wrong with must_eq (X, 0) / may_eq (X, 0) btw?
>>>>
>>>> must_eq (X, 0) generated a warning if X is unsigned, so sometimes you'd
>>>> need must_eq (X, 0) and sometimes must_eq (X, 0U).
>>>
>>> Is that because they are templates?  Maybe providing a partial specialization
>>> would help?
>>
>> I don't think it's templates specifically.  We end up with something like:
>>
>>   int f (unsigned int x, const int y)
>>   {
>>     return x != y;
>>   }
>>
>>   int g (unsigned int x) { return f (x, 0); }
>>
>> which generates a warning too.
>>
>>> I'd be fine with must_eq_p and may_eq_0.
>>
>> OK, I'll switch to that if there are no objections.
>
> Hum.  But then we still warn for must_eq_p (x, 1), no?

Yeah.  The patch also had a known_one and known_all_ones for
those two (fairly) common cases.  For other values the patches
just add "U" where necessary.

If you think it would be better to use U consistently and not
have the helpers, then I'm happy to do that instead.

> So why does
>
>   int f (unsigned int x)
>   {
>      return x != 0;
>   }
>
> not warn?  Probably because of promotion of the arg.

[Jakub's already answered this part.]

> Shouldn't we then simply never have a may/must_*_p (T1, T2)
> with T1 and T2 being not compatible?  That is, force promotion
> rules on them with template magic?

This was what I meant by:

  Or we could suppress warnings by forcibly converting the input.
  Sometimes the warnings are useful though.

We already do this kind of conversion for arithmetic, to ensure
that poly_uint16 + poly_uint16 -> poly_int64 promotes before the
addition rather than after it.  But it defeats the point of the
comparison warning, which is that you're potentially redefining
the sign bit.

I think the warning's just as valuable for may/must comparison of
non-literals as it is for normal comparison operators.  It's just
unfortunate that we no longer get the special handling of literals.

Thanks,
Richard


* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-25 11:39                   ` Richard Sandiford
@ 2017-10-25 13:09                     ` Richard Biener
  2017-11-08  9:51                       ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Biener @ 2017-10-25 13:09 UTC (permalink / raw)
  To: Richard Biener, Eric Botcazou, GCC Patches, Richard Sandiford

On Wed, Oct 25, 2017 at 1:26 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Tue, Oct 24, 2017 at 3:24 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Tue, Oct 24, 2017 at 2:48 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>> On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>>>>>>> Yeah.  E.g. for ==, the two options would be:
>>>>>>>>>
>>>>>>>>> a) must_eq (a, b)   -> a == b
>>>>>>>>>    must_ne (a, b)   -> a != b
>>>>>>>>>
>>>>>>>>>    which has the weird property that (a == b) != (!(a != b))
>>>>>>>>>
>>>>>>>>> b) must_eq (a, b)   -> a == b
>>>>>>>>>    may_ne (a, b)    -> a != b
>>>>>>>>>
>>>>>>>>>    which has the weird property that a can be equal to b when a != b
>>>>>>>>
>>>>>>>> Yes, a) was the one I had in mind, i.e. the traditional operators are
>>>>>>>> the must
>>>>>>>> variants and you use an outer ! in order to express the may.  Of
>>>>>>>> course this
>>>>>>>> would require a bit of discipline but, on the other hand, if most of
>>>>>>>> the cases
>>>>>>>> fall in the must category, that could be less ugly.
>>>>>>>
>>>>>>> I just think that discipline is going to be hard to maintain in practice,
>>>>>>> since it's so natural to assume (a == b || a != b) == true.  With the
>>>>>>> may/must approach, static type checking forces the issue.
>>>>>>>
>>>>>>>>> Sorry about that.  It's the best I could come up with without losing
>>>>>>>>> the may/must distinction.
>>>>>>>>
>>>>>>>> Which variant is known_zero though?  Must or may?
>>>>>>>
>>>>>>> must.  maybe_nonzero is the may version.
>>>>>>
>>>>>> Can you rename known_zero to must_be_zero then?
>>>>>
>>>>> That'd be OK with me.
>>>>>
>>>>> Another alternative I wondered about was must_eq_0 / may_ne_0.
>>>>>
>>>>>> What's wrong with must_eq (X, 0) / may_eq (X, 0) btw?
>>>>>
>>>>> must_eq (X, 0) generated a warning if X is unsigned, so sometimes you'd
>>>>> need must_eq (X, 0) and sometimes must_eq (X, 0U).
>>>>
>>>> Is that because they are templates?  Maybe providing a partial specialization
>>>> would help?
>>>
>>> I don't think it's templates specifically.  We end up with something like:
>>>
>>>   int f (unsigned int x, const int y)
>>>   {
>>>     return x != y;
>>>   }
>>>
>>>   int g (unsigned int x) { return f (x, 0); }
>>>
>>> which generates a warning too.
>>>
>>>> I'd be fine with must_eq_p and may_eq_0.
>>>
>>> OK, I'll switch to that if there are no objections.
>>
>> Hum.  But then we still warn for must_eq_p (x, 1), no?
>
> Yeah.  The patch also had a known_one and known_all_ones for
> those two (fairly) common cases.  For other values the patches
> just add "U" where necessary.
>
> If you think it would be better to use U consistently and not
> have the helpers, then I'm happy to do that instead.
>
>> So why does
>>
>>   int f (unsigned int x)
>>   {
>>      return x != 0;
>>   }
>>
>> not warn?  Probably because of promotion of the arg.
>
> [Jakub's already answered this part.]
>
>> Shouldn't we then simply never have a may/must_*_p (T1, T2)
>> with T1 and T2 being not compatible?  That is, force promotion
>> rules on them with template magic?
>
> This was what I meant by:
>
>   Or we could suppress warnings by forcibly converting the input.
>   Sometimes the warnings are useful though.
>
> We already do this kind of conversion for arithmetic, to ensure
> that poly_uint16 + poly_uint16 -> poly_int64 promotes before the
> addition rather than after it.  But it defeats the point of the
> comparison warning, which is that you're potentially redefining
> the sign bit.
>
> I think the warning's just as valuable for may/must comparison of
> non-literals as it is for normal comparison operators.  It's just
> unfortunate that we no longer get the special handling of literals.

Ok, I see.

I think I have a slight preference for using 0U consistently but I haven't
looked at too many patches yet to see how common/ugly that would be.

Richard.

> Thanks,
> Richard


* Re: [001/nnn] poly_int: add poly-int.h
  2017-10-23 16:58 ` [001/nnn] poly_int: add poly-int.h Richard Sandiford
@ 2017-10-25 16:17   ` Martin Sebor
  2017-11-08  9:44     ` Richard Sandiford
  2017-11-08 10:03   ` Richard Sandiford
  1 sibling, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-10-25 16:17 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 10:57 AM, Richard Sandiford wrote:
> This patch adds a new "poly_int" class to represent polynomial integers
> of the form:
>
>   C0 + C1*X1 + C2*X2 ... + Cn*Xn
>
> It also adds poly_int-based typedefs for offsets and sizes of various
> precisions.  In these typedefs, the Ci coefficients are compile-time
> constants and the Xi indeterminates are run-time invariants.  The number
> of coefficients is controlled by the target and is initially 1 for all
> ports.
>
> Most routines can handle general coefficient counts, but for now a few
> are specific to one or two coefficients.  Support for other coefficient
> counts can be added when needed.
>
> The patch also adds a new macro, IN_TARGET_CODE, that can be
> set to indicate that a TU contains target-specific rather than
> target-independent code.  When this macro is set and the number of
> coefficients is 1, the poly-int.h classes define a conversion operator
> to a constant.  This allows most existing target code to work without
> modification.  The main exceptions are:
>
> - values passed through ..., which need an explicit conversion to a
>   constant
>
> - ?: expression in which one arm ends up being a polynomial and the
>   other remains a constant.  In these cases it would be valid to convert
>   the constant to a polynomial and the polynomial to a constant, so a
>   cast is needed to break the ambiguity.
>
> The patch also adds a new target hook to return the estimated
> value of a polynomial for costing purposes.
>
> The patch also adds operator<< on wide_ints (it was already defined
> for offset_int and widest_int).  I think this was originally excluded
> because >> is ambiguous for wide_int, but << is useful for converting
> bytes to bits, etc., so is worth defining on its own.  The patch also
> adds operator% and operator/ for offset_int and widest_int, since those
> types are always signed.  These changes allow the poly_int interface to
> be more predictable.
>
> I'd originally tried adding the tests as selftests, but that ended up
> bloating cc1 by at least a third.  It also took a while to build them
> at -O2.  The patch therefore uses plugin tests instead, where we can
> force the tests to be built at -O0.  They still run in negligible time
> when built that way.
>
>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
>
> gcc/
> 	* poly-int.h: New file.
> 	* poly-int-types.h: Likewise.
> 	* coretypes.h: Include them.
> 	(POLY_INT_CONVERSION): Define.
> 	* target.def (estimated_poly_value): New hook.
> 	* doc/tm.texi.in (TARGET_ESTIMATED_POLY_VALUE): New hook.
> 	* doc/tm.texi: Regenerate.
> 	* doc/poly-int.texi: New file.
> 	* doc/gccint.texi: Include it.
> 	* doc/rtl.texi: Describe restrictions on subreg modes.
> 	* Makefile.in (TEXI_GCCINT_FILES): Add poly-int.texi.
> 	* genmodes.c (NUM_POLY_INT_COEFFS): Provide a default definition.
> 	(emit_insn_modes_h): Emit a definition of NUM_POLY_INT_COEFFS.
> 	* targhooks.h (default_estimated_poly_value): Declare.
> 	* targhooks.c (default_estimated_poly_value): New function.
> 	* target.h (estimated_poly_value): Likewise.
> 	* wide-int.h (WI_UNARY_RESULT): Use wi::binary_traits.
> 	(wi::unary_traits): Delete.
> 	(wi::binary_traits::signed_shift_result_type): Define for
> 	offset_int << HOST_WIDE_INT, etc.
> 	(generic_wide_int::operator <<=): Define for all types and use
> 	wi::lshift instead of <<.
> 	(wi::hwi_with_prec): Add a default constructor.
> 	(wi::ints_for): New class.
> 	(operator <<): Define for all wide-int types.
> 	(operator /): New function.
> 	(operator %): Likewise.
> 	* selftest.h (ASSERT_MUST_EQ, ASSERT_MUST_EQ_AT, ASSERT_MAY_NE)
> 	(ASSERT_MAY_NE_AT): New macros.
>
> gcc/testsuite/
> 	* gcc.dg/plugin/poly-int-tests.h,
> 	gcc.dg/plugin/poly-int-test-1.c,
> 	gcc.dg/plugin/poly-int-01_plugin.c,
> 	gcc.dg/plugin/poly-int-02_plugin.c,
> 	gcc.dg/plugin/poly-int-03_plugin.c,
> 	gcc.dg/plugin/poly-int-04_plugin.c,
> 	gcc.dg/plugin/poly-int-05_plugin.c,
> 	gcc.dg/plugin/poly-int-06_plugin.c,
> 	gcc.dg/plugin/poly-int-07_plugin.c: New tests.
> 	* gcc.dg/plugin/plugin.exp: Run them.

I haven't done anywhere near a thorough review, but the dtor followed by
the placement new in the POLY_SET_COEFF() macro caught my eye so
I thought I'd ask sooner rather than later.  Given the macro
definition:

+   The dummy comparison against a null C * is just a way of checking
+   that C gives the right type.  */
+#define POLY_SET_COEFF(C, RES, I, VALUE) \
+  ((void) (&(RES).coeffs[0] == (C *) 0), \
+   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
+   ? (void) ((RES).coeffs[I] = VALUE) \
+   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))

is the following use well-defined?

+template<unsigned int N, typename C>
+inline poly_int_pod<N, C>&
+poly_int_pod<N, C>::operator <<= (unsigned int a)
+{
+  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);

It looks to me as though the VALUE argument in the ctor invoked
by the placement new expression is evaluated after the dtor has
destroyed the very array element the VALUE argument expands to.

Or am I misreading the code?

Whether or not it is, in fact, a problem, it seems to me that using
a function template rather than a macro would be a clearer and
safer way to do the same thing.  (Safer in that the macro also
evaluates its arguments multiple times, which is often a source
of subtle bugs.)
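
Something like this is the shape I have in mind (just a sketch for the
non-FLEXIBLE_PRECISION case; that case could be handled by an overload
or a compile-time branch):

  template<unsigned int N, typename C, typename T>
  inline void
  poly_set_coeff (poly_int_pod<N, C> &res, unsigned int i, const T &value)
  {
    C tmp (value);            /* evaluate VALUE before destroying the slot */
    res.coeffs[i].~C ();
    new (&res.coeffs[i]) C (tmp);
  }

It would also make the evaluation order explicit.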

Other than that, I would suggest changing 't' to something a bit
less terse, like perhaps 'type' in traits like the following:

+struct if_lossless;
+template<typename T1, typename T2, typename T3>
+struct if_lossless<T1, T2, T3, true>
+{
+  typedef T3 t;
+};

Lastly (for now), I note that the default poly_int ctor, like
that of the other xxx_int types, is a no-op.  That makes using
all these types error prone, e.g., as arrays in ctor-initializer
lists:

   struct Pair {
     poly_int<...> poly[2];

     Pair (): poly () { }   // poly[] (unexpectedly) uninitialized
   }

Martin

PS My initial interest in this class was to see if it is
less prone to error than wide_int and offset_int.  Specifically,
if it's easier to convert the various flavors of xxx_int among
one another and between basic integers and the xxx_ints.  But
after reading the documentation I have the impression it might
help with some of the range work I've been doing recently, so
I'll try to do a more thorough review in the (hopefully) near
future.


* Re: [006/nnn] poly_int: tree constants
  2017-10-23 17:02 ` [006/nnn] poly_int: tree constants Richard Sandiford
@ 2017-10-25 17:14   ` Martin Sebor
  2017-10-25 21:35     ` Richard Sandiford
  2017-11-17  4:51   ` Jeff Law
  1 sibling, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-10-25 17:14 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:00 AM, Richard Sandiford wrote:
> +#if NUM_POLY_INT_COEFFS == 1
> +extern inline __attribute__ ((__gnu_inline__)) poly_int64
> +tree_to_poly_int64 (const_tree t)

I'm curious about the extern inline and __gnu_inline__ here and
not in poly_int_tree_p below.  Am I correct in assuming that
the combination is a holdover from the days when GCC was compiled
using a C compiler, and that the way to write the same definition
in C++ 98 is simply:

   inline poly_int64
   tree_to_poly_int64 (const_tree t)

> +{
> +  gcc_assert (tree_fits_poly_int64_p (t));
> +  return TREE_INT_CST_LOW (t);
> +}

If yes, I would suggest to use the C++ form (and at some point,
changing the existing uses of the GCC/C idiom to the C++ form
as well).

Otherwise, if something requires the use of the C form I would
suggest to add a brief comment explaining it.

...
> +
> +inline bool
> +poly_int_tree_p (const_tree t, poly_int64_pod *value)
> +{
...

>  /* The tree and const_tree overload templates.   */
>  namespace wi
>  {
> +  class unextended_tree
> +  {
> +  private:
> +    const_tree m_t;
> +
> +  public:
> +    unextended_tree () {}

Defining no-op ctors is quite dangerous and error-prone.  I suggest
to instead default initialize the member(s):

   unextended_tree (): m_t () {}

Ditto everywhere else, such as in:

...
>    template <int N>
>    class extended_tree
>    {
> @@ -5139,11 +5225,13 @@ extern bool anon_aggrname_p (const_tree)
>      const_tree m_t;
>
>    public:
> +    extended_tree () {}
>      extended_tree (const_tree);
...
> Index: gcc/tree.c
> ===================================================================
...
> +
> +/* Return true if T holds a polynomial pointer difference, storing it in
> +   *VALUE if so.  A true return means that T's precision is no greater
> +   than 64 bits, which is the largest address space we support, so *VALUE
> +   never loses precision.  However, the signedness of the result is
> +   somewhat arbitrary, since if B lives near the end of a 64-bit address
> +   range and A lives near the beginning, B - A is a large positive value
> +   outside the range of int64_t.  A - B is likewise a large negative value
> +   outside the range of int64_t.  All the pointer difference really
> +   gives is a raw pointer-sized bitstring that can be added to the first
> +   pointer value to get the second.  */

I'm not sure I understand the comment about the sign correctly, but
if I do, I don't think it's correct.

Because their difference wouldn't be representable in any basic integer
type (i.e., in ptrdiff_t) the pointers described above could never
point to the same object (or array), and so their difference is not
meaningful.  C/C++ only define the semantics of a difference between
pointers to the same object.  That restricts the size of the largest
possible object typically to SIZE_MAX / 2, or at most SIZE_MAX on
the handful of targets where ptrdiff_t has greater precision than
size_t.  But even on those targets, the difference between any two
pointers to the same object must be representable in ptrdiff_t,
including the sign.

> +bool
> +ptrdiff_tree_p (const_tree t, poly_int64_pod *value)
> +{

Martin


* Re: [006/nnn] poly_int: tree constants
  2017-10-25 17:14   ` Martin Sebor
@ 2017-10-25 21:35     ` Richard Sandiford
  2017-10-26  5:52       ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-25 21:35 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

Martin Sebor <msebor@gmail.com> writes:
> On 10/23/2017 11:00 AM, Richard Sandiford wrote:
>> +#if NUM_POLY_INT_COEFFS == 1
>> +extern inline __attribute__ ((__gnu_inline__)) poly_int64
>> +tree_to_poly_int64 (const_tree t)
>
> I'm curious about the extern inline and __gnu_inline__ here and
> not in poly_int_tree_p below.  Am I correct in assuming that
> the combination is a holdover from the days when GCC was compiled
> using a C compiler, and that the way to write the same definition
> in C++ 98 is simply:
>
>    inline poly_int64
>    tree_to_poly_int64 (const_tree t)
>
>> +{
>> +  gcc_assert (tree_fits_poly_int64_p (t));
>> +  return TREE_INT_CST_LOW (t);
>> +}
>
> If yes, I would suggest to use the C++ form (and at some point,
> changing the existing uses of the GCC/C idiom to the C++ form
> as well).
>
> Otherwise, if something requires the use of the C form I would
> suggest to add a brief comment explaining it.

You probably saw that this is based on tree_to_[su]hwi.  AIUI the
differences from plain C++ inline are that:

a) with __gnu_inline__, an out-of-line definition must still exist.
   That fits this use case well, because the inline is conditional on
   the #ifdef and tree.c has an out-of-line definition either way.
   If we used normal inline, we'd need to add extra #ifs to tree.c
   as well, to avoid multiple definitions.

b) __gnu_inline__ has the strength of __always_inline__, but without the
   correctness implications if inlining is impossible for any reason.
   I did try normal inline first, but it wasn't strong enough.  The
   compiler ended up measurably faster if I copied the tree_to_[su]hwi
   approach.
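
In other words the shape is roughly (eliding the bodies):

  /* tree.h */
  extern poly_int64 tree_to_poly_int64 (const_tree);
  #if NUM_POLY_INT_COEFFS == 1
  extern inline __attribute__ ((__gnu_inline__)) poly_int64
  tree_to_poly_int64 (const_tree t)
  {
    ...
  }
  #endif

  /* tree.c -- the out-of-line definition is there whatever
     NUM_POLY_INT_COEFFS is, so no extra #ifs are needed here.  */
  poly_int64
  tree_to_poly_int64 (const_tree t)
  {
    ...
  }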

> ...
>> +
>> +inline bool
>> +poly_int_tree_p (const_tree t, poly_int64_pod *value)
>> +{
> ...

[This one is unconditionally inline.]

>>  /* The tree and const_tree overload templates.   */
>>  namespace wi
>>  {
>> +  class unextended_tree
>> +  {
>> +  private:
>> +    const_tree m_t;
>> +
>> +  public:
>> +    unextended_tree () {}
>
> Defining no-op ctors is quite dangerous and error-prone.  I suggest
> to instead default initialize the member(s):
>
>    unextended_tree (): m_t () {}
>
> Ditto everywhere else, such as in:

This is really performance-sensitive code though, so I don't think
we want to add any unnecessary initialisation.  Primitive types are
uninitialised by default too, and the point of this class is to
provide an integer-like interface.

In your other message you used the example of explicit default
initialisation, such as:

class foo
{
  foo () : x () {}
  unextended_tree x;
};

But I think we should strongly discourage that kind of thing.
If someone wants to initialise x to a particular value, like
integer_zero_node, then it would be better to do it explicitly.
If they don't care what the initial value is, then for these
integer-mimicking classes, uninitialised is as good as anything
else. :-)

Note that for this class NULL_TREE is not a safe default value.
The same goes for the wide-int.h classes, where a length or precision
of 0 is undefined and isn't necessarily going to be handled gracefully
or predictably.

>>    template <int N>
>>    class extended_tree
>>    {
>> @@ -5139,11 +5225,13 @@ extern bool anon_aggrname_p (const_tree)
>>      const_tree m_t;
>>
>>    public:
>> +    extended_tree () {}
>>      extended_tree (const_tree);
> ...
>> Index: gcc/tree.c
>> ===================================================================
> ...
>> +
>> +/* Return true if T holds a polynomial pointer difference, storing it in
>> +   *VALUE if so.  A true return means that T's precision is no greater
>> +   than 64 bits, which is the largest address space we support, so *VALUE
>> +   never loses precision.  However, the signedness of the result is
>> +   somewhat arbitrary, since if B lives near the end of a 64-bit address
>> +   range and A lives near the beginning, B - A is a large positive value
>> +   outside the range of int64_t.  A - B is likewise a large negative value
>> +   outside the range of int64_t.  All the pointer difference really
>> +   gives is a raw pointer-sized bitstring that can be added to the first
>> +   pointer value to get the second.  */
>
> I'm not sure I understand the comment about the sign correctly, but
> if I do, I don't think it's correct.
>
> Because their difference wouldn't be representable in any basic integer
> type (i.e., in ptrdiff_t) the pointers described above could never
> point to the same object (or array), and so their difference is not
> meaningful.  C/C++ only define the semantics of a difference between
> pointers to the same object.  That restricts the size of the largest
> possible object typically to SIZE_MAX / 2, or at most SIZE_MAX on
> the handful of targets where ptrdiff_t has greater precision than
> size_t.  But even on those targets, the difference between any two
> pointers to the same object must be representable in ptrdiff_t,
> including the sign.

But does that apply even when no pointer difference of that size
occurs in the original source?  I.e., is:

  char *x = malloc (0x80000001)

undefined in itself on 32-bit targets?  Does it become undefined after:

  for (unsigned int i = 0; i < 0x80000001; ++i)
    x[i] = 0;

where no large pointer difference is calculated?  But I realise
gcc's support for this kind of thing is limited, and that we do
try to emit a diagnostic for obvious instances...

In the (two) places that need this -- both conversions from
cst_and_fits_in_hwi -- the immediate problem is that the sign
of the type doesn't necessarily match the logical sign of the
difference.  E.g. a negative offset can be represented as a large
unsigned value of sizetype.
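
For instance (an illustrative fragment, not code from the patch): on a
64-bit target an offset of -4 stored in an unsigned type such as sizetype
has the bit pattern 0xfffffffffffffffc, and reading the same bits back as
a signed value recovers -4:

  #include <stdint.h>
  uint64_t enc = (uint64_t) -4;   /* encodes as 0xfffffffffffffffc */
  int64_t off = (int64_t) enc;    /* the same bitstring read as signed: -4 */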

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-25 21:35     ` Richard Sandiford
@ 2017-10-26  5:52       ` Martin Sebor
  2017-10-26  8:40         ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-10-26  5:52 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/25/2017 03:31 PM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> On 10/23/2017 11:00 AM, Richard Sandiford wrote:
>>> +#if NUM_POLY_INT_COEFFS == 1
>>> +extern inline __attribute__ ((__gnu_inline__)) poly_int64
>>> +tree_to_poly_int64 (const_tree t)
>>
>> I'm curious about the extern inline and __gnu_inline__ here and
>> not in poly_int_tree_p below.  Am I correct in assuming that
>> the combination is a holdover from the days when GCC was compiled
>> using a C compiler, and that the way to write the same definition
>> in C++ 98 is simply:
>>
>>    inline poly_int64
>>    tree_to_poly_int64 (const_tree t)
>>
>>> +{
>>> +  gcc_assert (tree_fits_poly_int64_p (t));
>>> +  return TREE_INT_CST_LOW (t);
>>> +}
>>
>> If yes, I would suggest to use the C++ form (and at some point,
>> changing the existing uses of the GCC/C idiom to the C++ form
>> as well).
>>
>> Otherwise, if something requires the use of the C form I would
>> suggest to add a brief comment explaining it.
>
> You probably saw that this is based on tree_to_[su]hwi.  AIUI the
> differences from plain C++ inline are that:
>
> a) with __gnu_inline__, an out-of-line definition must still exist.
>    That fits this use case well, because the inline is conditional on
>    the #ifdef and tree.c has an out-of-line definition either way.
>    If we used normal inline, we'd need to add extra #ifs to tree.c
>    as well, to avoid multiple definitions.
>
> b) __gnu_inline__ has the strength of __always_inline__, but without the
>    correctness implications if inlining is impossible for any reason.
>    I did try normal inline first, but it wasn't strong enough.  The
>    compiler ended up measurably faster if I copied the tree_to_[su]hwi
>    approach.

Thanks for the clarification.  I'm not sure I fully understand
it but I'm happy to take your word for it that it's necessary.  I
would just recommend adding a brief comment to this effect since
it isn't obvious.

>>> +
>>> +inline bool
>>> +poly_int_tree_p (const_tree t, poly_int64_pod *value)
>>> +{
>> ...
>
> [This one is unconditionally inline.]
>
>>>  /* The tree and const_tree overload templates.   */
>>>  namespace wi
>>>  {
>>> +  class unextended_tree
>>> +  {
>>> +  private:
>>> +    const_tree m_t;
>>> +
>>> +  public:
>>> +    unextended_tree () {}
>>
>> Defining no-op ctors is quite dangerous and error-prone.  I suggest
>> to instead default initialize the member(s):
>>
>>    unextended_tree (): m_t () {}
>>
>> Ditto everywhere else, such as in:
>
> This is really performance-sensitive code though, so I don't think
> we want to add any unnecessary initialisation.  Primitive types are
> uninitialised by default too, and the point of this class is to
> provide an integer-like interface.

I understand the performance concern (more on that below), but
to clarify the usability issues,  I don't think the analogy with
primitive types is quite fitting here: int() evaluates to zero,
as do the values of i and a[0] and a[1] after an object of type
S is constructed using its default ctor, i.e., S ():

   struct S {
     int i;
     int a[2];

     S (): i (), a () { }
   };

With the new (and some existing) classes that's not so, and it
makes them harder and more error-prone to use (I just recently
learned this the hard way about offset_int and the debugging
experience is still fresh in my memory).

When the ctor is inline and the initialization unnecessary then
GCC will in most instances eliminate it, so I also don't think
the suggested change would have a significant impact on
the efficiency of optimized code, but...

...if it is thought essential to provide a no-op ctor, I would
suggest to consider making its property explicit, e.g., like so:

   struct unextended_tree {

     struct Uninit { };

     // ...
     unextended_tree (Uninit) { /* no initialization */ }
     // ...
   };

This way the programmer has to explicitly opt in to using the
unsafe ctor.  (This ctor is suitable for single objects, not
arrays of such things, but presumably that would be sufficient.
If not, there are tricks to make that work too.)

> In your other message you used the example of explicit default
> initialisation, such as:
>
> class foo
> {
>   foo () : x () {}
>   unextended_tree x;
> };
>
> But I think we should strongly discourage that kind of thing.
> If someone wants to initialise x to a particular value, like
> integer_zero_node, then it would be better to do it explicitly.
> If they don't care what the initial value is, then for these
> integer-mimicking classes, uninitialised is as good as anything
> else. :-)

Efficiency is certainly important, but it doesn't have to come
at the expense of usability or correctness.  I think it's possible
(and important) to design interfaces that are usable safely and
intuitively, and difficult to misuse, while also accommodating
advanced efficient use cases.

> Note that for this class NULL_TREE is not a safe default value.
> The same goes for the wide-int.h classes, where a length or precision
> of 0 is undefined and isn't necessarily going to be handled gracefully
> or predictably.

For offset_int both precision and length are known so I think
it would make sense to have the default ctor value-initialize
the object.  For wide_int, it seems to me that choosing some
default precision and length in the default ctor would still
be preferable to leaving the members indeterminate.  (That
functionality could still be provided by some other ctor as
I suggested above).

>>>    template <int N>
>>>    class extended_tree
>>>    {
>>> @@ -5139,11 +5225,13 @@ extern bool anon_aggrname_p (const_tree)
>>>      const_tree m_t;
>>>
>>>    public:
>>> +    extended_tree () {}
>>>      extended_tree (const_tree);
>> ...
>>> Index: gcc/tree.c
>>> ===================================================================
>> ...
>>> +
>>> +/* Return true if T holds a polynomial pointer difference, storing it in
>>> +   *VALUE if so.  A true return means that T's precision is no greater
>>> +   than 64 bits, which is the largest address space we support, so *VALUE
>>> +   never loses precision.  However, the signedness of the result is
>>> +   somewhat arbitrary, since if B lives near the end of a 64-bit address
>>> +   range and A lives near the beginning, B - A is a large positive value
>>> +   outside the range of int64_t.  A - B is likewise a large negative value
>>> +   outside the range of int64_t.  All the pointer difference really
>>> +   gives is a raw pointer-sized bitstring that can be added to the first
>>> +   pointer value to get the second.  */
>>
>> I'm not sure I understand the comment about the sign correctly, but
>> if I do, I don't think it's correct.
>>
>> Because their difference wouldn't be representable in any basic integer
>> type (i.e., in ptrdiff_t) the pointers described above could never
>> point to the same object (or array), and so their difference is not
>> meaningful.  C/C++ only define the semantics of a difference between
>> pointers to the same object.  That restricts the size of the largest
>> possible object typically to SIZE_MAX / 2, or at most SIZE_MAX on
>> the handful of targets where ptrdiff_t has greater precision than
>> size_t.  But even on those targets, the difference between any two
>> pointers to the same object must be representable in ptrdiff_t,
>> including the sign.
>
> But does that apply even when no pointer difference of that size
> occurs in the original source?  I.e., is:
>
>   char *x = malloc (0x80000001)
>
> undefined in itself on 32-bit targets?

No, the call itself isn't undefined, but it shouldn't succeed
on a conforming implementation where ptrdiff_t is a 32-bit type
(which is why GCC diagnoses it).  If the call were to succeed
then  pointers to the allocated object would fail to meet the
C requirements on additive operators.

> Does it become undefined after:
>
>   for (unsigned int i = 0; i < 0x80000001; ++i)
>     x[i] = 0;
>
> where no large pointer difference is calculated?  But I realise
> gcc's support for this kind of thing is limited, and that we do
> try to emit a diagnostic for obvious instances...

Yes, this is undefined, both in C (unless ptrdiff_t is wider
than 32 bits) and in GCC, because x[0x80000000] doesn't refer
to the 2147483648-th element of x.

> In the (two) places that need this -- both conversions from
> cst_and_fits_in_hwi -- the immediate problem is that the sign
> of the type doesn't necessarily match the logical sign of the
> difference.  E.g. a negative offset can be represented as a large
> unsigned value of sizetype.

I only meant to suggest that the comment be reworded so as
not to imply that such pointers (that are farther apart than
PTRDIFF_MAX) can point to the same object and be subtracted.

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26  5:52       ` Martin Sebor
@ 2017-10-26  8:40         ` Richard Sandiford
  2017-10-26 16:45           ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-26  8:40 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

Martin Sebor <msebor@gmail.com> writes:
> On 10/25/2017 03:31 PM, Richard Sandiford wrote:
>> Martin Sebor <msebor@gmail.com> writes:
>>> On 10/23/2017 11:00 AM, Richard Sandiford wrote:
>>>> +#if NUM_POLY_INT_COEFFS == 1
>>>> +extern inline __attribute__ ((__gnu_inline__)) poly_int64
>>>> +tree_to_poly_int64 (const_tree t)
>>>
>>> I'm curious about the extern inline and __gnu_inline__ here and
>>> not in poly_int_tree_p below.  Am I correct in assuming that
>>> the combination is a holdover from the days when GCC was compiled
>>> using a C compiler, and that the way to write the same definition
>>> in C++ 98 is simply:
>>>
>>>    inline poly_int64
>>>    tree_to_poly_int64 (const_tree t)
>>>
>>>> +{
>>>> +  gcc_assert (tree_fits_poly_int64_p (t));
>>>> +  return TREE_INT_CST_LOW (t);
>>>> +}
>>>
>>> If yes, I would suggest to use the C++ form (and at some point,
>>> changing the existing uses of the GCC/C idiom to the C++ form
>>> as well).
>>>
>>> Otherwise, if something requires the use of the C form I would
>>> suggest to add a brief comment explaining it.
>>
>> You probably saw that this is based on tree_to_[su]hwi.  AIUI the
>> differences from plain C++ inline are that:
>>
>> a) with __gnu_inline__, an out-of-line definition must still exist.
>>    That fits this use case well, because the inline is conditional on
>>    the #ifdef and tree.c has an out-of-line definition either way.
>>    If we used normal inline, we'd need to add extra #ifs to tree.c
>>    as well, to avoid multiple definitions.
>>
>> b) __gnu_inline__ has the strength of __always_inline__, but without the
>>    correctness implications if inlining is impossible for any reason.
>>    I did try normal inline first, but it wasn't strong enough.  The
>>    compiler ended up measurably faster if I copied the tree_to_[su]hwi
>>    approach.
>
> Thanks for the clarification.  I'm not sure I fully understand
> it but I'm happy to take your word for it that it's necessary.  I
> would just recommend adding a brief comment to this effect since
> it isn't obvious.
>
>>>> +
>>>> +inline bool
>>>> +poly_int_tree_p (const_tree t, poly_int64_pod *value)
>>>> +{
>>> ...
>>
>> [This one is unconditionally inline.]
>>
>>>>  /* The tree and const_tree overload templates.   */
>>>>  namespace wi
>>>>  {
>>>> +  class unextended_tree
>>>> +  {
>>>> +  private:
>>>> +    const_tree m_t;
>>>> +
>>>> +  public:
>>>> +    unextended_tree () {}
>>>
>>> Defining no-op ctors is quite dangerous and error-prone.  I suggest
>>> to instead default initialize the member(s):
>>>
>>>    unextended_tree (): m_t () {}
>>>
>>> Ditto everywhere else, such as in:
>>
>> This is really performance-sensitive code though, so I don't think
>> we want to add any unnecessary initialisation.  Primitive types are
>> uninitialised by default too, and the point of this class is to
>> provide an integer-like interface.
>
> I understand the performance concern (more on that below), but
> to clarify the usability issues,  I don't think the analogy with
> primitive types is quite fitting here: int() evaluates to zero,
> as do the values of i and a[0] and a[1] after an object of type
> S is constructed using its default ctor, i.e., S ():
>
>    struct S {
>      int i;
>      int a[2];
>
>      S (): i (), a () { }
>    };

Sure, I realise that.  I meant that:

  int x;

doesn't initialise x to zero.  So it's a question of which case is the
most motivating one: using "x ()" to initialise x to 0 in a constructor
or "int x;" to declare a variable of type x, uninitialised.  I think the
latter use case is much more common (at least in GCC).  Rearranging
things, I said later:

>> In your other message you used the example of explicit default
>> initialisation, such as:
>>
>> class foo
>> {
>>   foo () : x () {}
>>   unextended_tree x;
>> };
>>
>> But I think we should strongly discourage that kind of thing.
>> If someone wants to initialise x to a particular value, like
>> integer_zero_node, then it would be better to do it explicitly.
>> If they don't care what the initial value is, then for these
>> integer-mimicking classes, uninitialised is as good as anything
>> else. :-)

What I meant was: if you want to initialise "i" to 1 in your example,
you'd have to write "i (1)".  Being able to write "i ()" instead of
"i (0)" saves one character but I don't think it adds much clarity.
Explicitly initialising something only seems worthwhile if you say
what you're initialising it to.

> With the new (and some existing) classes that's not so, and it
> makes them harder and more error-prone to use (I just recently
> learned this the hard way about offset_int and the debugging
> experience is still fresh in my memory).

Sorry about the bad experience.  But that kind of thing cuts
both ways.  If I write:

poly_int64
foo (void)
{
  poly_int64 x;
  x += 2;
  return x;
}

then I get a warning about x being used uninitialised, without
having had to run anything.  If we add default initialisation
then this becomes something that has to be debugged against
a particular test case, i.e. we've stopped the compiler from
giving us useful static analysis.

> When the ctor is inline and the initialization unnecessary then
> GCC will in most instances eliminate it, so I also don't think
> the suggested change would have a significant impact on
> the efficiency of optimized code, but...
>
> ...if it is thought essential to provide a no-op ctor, I would
> suggest to consider making its property explicit, e.g., like so:
>
>    struct unextended_tree {
>
>      struct Uninit { };
>
>      // ...
>      unextended_tree (Uninit) { /* no initialization */ }
>      // ...
>    };
>
> This way the programmer has to explicitly opt in to using the
> unsafe ctor.  (This ctor is suitable for single objects, not
> arrays of such things, but presumably that would be sufficient.
> If not, there are tricks to make that work too.)

The default constructors for unextended_tree and extended_tree
are only there for the array case (in poly-int.h).

Part of the problem here is that we still have to live by C++03
POD rules.  If we moved to C++11, the need for the poly_int_pod/
poly_int split would go away and things would probably be much
simpler. :-)

[...]

>> Note that for this class NULL_TREE is not a safe default value.
>> The same goes for the wide-int.h classes, where a length or precision
>> of 0 is undefined and isn't necessarily going to be handled gracefully
>> or predictably.
>
> For offset_int both precision and length are known so I think
> it would make sense to have the default ctor value-initialize
> the object.  For wide_int, it seems to me that choosing some
> default precision and length in the default ctor would still
> be preferable to leaving the members indeterminate.  (That
> functionality could still be provided by some other ctor as
> I suggested above).

But which precision though?  If we pick a commonly-used one
then we make a missing initialisation bug very data-dependent.
Even if we pick a rarely-used one, we create a bug in which
the wide_int has the wrong precision even though all assignments
to it "obviously" have the right precision.

>>>>    template <int N>
>>>>    class extended_tree
>>>>    {
>>>> @@ -5139,11 +5225,13 @@ extern bool anon_aggrname_p (const_tree)
>>>>      const_tree m_t;
>>>>
>>>>    public:
>>>> +    extended_tree () {}
>>>>      extended_tree (const_tree);
>>> ...
>>>> Index: gcc/tree.c
>>>> ===================================================================
>>> ...
>>>> +
>>>> +/* Return true if T holds a polynomial pointer difference, storing it in
>>>> +   *VALUE if so.  A true return means that T's precision is no greater
>>>> +   than 64 bits, which is the largest address space we support, so *VALUE
>>>> +   never loses precision.  However, the signedness of the result is
>>>> +   somewhat arbitrary, since if B lives near the end of a 64-bit address
>>>> +   range and A lives near the beginning, B - A is a large positive value
>>>> +   outside the range of int64_t.  A - B is likewise a large negative value
>>>> +   outside the range of int64_t.  All the pointer difference really
>>>> +   gives is a raw pointer-sized bitstring that can be added to the first
>>>> +   pointer value to get the second.  */
>>>
>>> I'm not sure I understand the comment about the sign correctly, but
>>> if I do, I don't think it's correct.
>>>
>>> Because their difference wouldn't be representable in any basic integer
>>> type (i.e., in ptrdiff_t) the pointers described above could never
>>> point to the same object (or array), and so their difference is not
>>> meaningful.  C/C++ only define the semantics of a difference between
>>> pointers to the same object.  That restricts the size of the largest
>>> possible object typically to SIZE_MAX / 2, or at most SIZE_MAX on
>>> the handful of targets where ptrdiff_t has greater precision than
>>> size_t.  But even on those targets, the difference between any two
>>> pointers to the same object must be representable in ptrdiff_t,
>>> including the sign.
>>
>> But does that apply even when no pointer difference of that size
>> occurs in the original source?  I.e., is:
>>
>>   char *x = malloc (0x80000001)
>>
>> undefined in itself on 32-bit targets?
>
> No, the call itself isn't undefined, but it shouldn't succeed
> on a conforming implementation where ptrdiff_t is a 32-bit type
> (which is why GCC diagnoses it).  If the call were to succeed
> then  pointers to the allocated object would fail to meet the
> C requirements on additive operators.
>
>> Does it become undefined after:
>>
>>   for (unsigned int i = 0; i < 0x80000001; ++i)
>>     x[i] = 0;
>>
>> where no large pointer difference is calculated?  But I realise
>> gcc's support for this kind of thing is limited, and that we do
>> try to emit a diagnostic for obvious instances...
>
> Yes, this is undefined, both in C (unless ptrdiff_t is wider
> than 32 bits) and in GCC, because x[0x80000000] doesn't refer
> to the 2147483648-th element of x.
>
>> In the (two) places that need this -- both conversions from
>> cst_and_fits_in_hwi -- the immediate problem is that the sign
>> of the type doesn't necessarily match the logical sign of the
>> difference.  E.g. a negative offset can be represented as a large
>> unsigned value of sizetype.
>
> I only meant to suggest that the comment be reworded so as
> not to imply that such pointers (that are farther apart than
> PTRDIFF_MAX) can point to the same object and be subtracted.

OK, how about:

/* Return true if T holds a polynomial pointer difference, storing it in
   *VALUE if so.  A true return means that T's precision is no greater
   than 64 bits, which is the largest address space we support, so *VALUE
   never loses precision.  However, the signedness of the result does
   not necessarily match the signedness of T: sometimes an unsigned type
   like sizetype is used to encode a value that is actually negative.  */

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26  8:40         ` Richard Sandiford
@ 2017-10-26 16:45           ` Martin Sebor
  2017-10-26 18:05             ` Richard Sandiford
  2017-10-26 18:11             ` Pedro Alves
  0 siblings, 2 replies; 302+ messages in thread
From: Martin Sebor @ 2017-10-26 16:45 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

>>>>>  /* The tree and const_tree overload templates.   */
>>>>>  namespace wi
>>>>>  {
>>>>> +  class unextended_tree
>>>>> +  {
>>>>> +  private:
>>>>> +    const_tree m_t;
>>>>> +
>>>>> +  public:
>>>>> +    unextended_tree () {}
>>>>
>>>> Defining no-op ctors is quite dangerous and error-prone.  I suggest
>>>> to instead default initialize the member(s):
>>>>
>>>>    unextended_tree (): m_t () {}
>>>>
>>>> Ditto everywhere else, such as in:
>>>
>>> This is really performance-sensitive code though, so I don't think
>>> we want to add any unnecessary initialisation.  Primitive types are
>>> uninitialised by default too, and the point of this class is to
>>> provide an integer-like interface.
>>
>> I understand the performance concern (more on that below), but
>> to clarify the usability issues,  I don't think the analogy with
>> primitive types is quite fitting here: int() evaluates to zero,
>> as do the values of i and a[0] and a[1] after an object of type
>> S is constructed using its default ctor, i.e., S ():
>>
>>    struct S {
>>      int i;
>>      int a[2];
>>
>>      S (): i (), a () { }
>>    };
>
> Sure, I realise that.  I meant that:
>
>   int x;
>
> doesn't initialise x to zero.  So it's a question of which case is the
> most motivating one: using "x ()" to initialise x to 0 in a constructor
> or "int x;" to declare a variable of type x, uninitialised.  I think the
> latter use case is much more common (at least in GCC).  Rearranging
> things, I said later:

I agree that the latter use case is more common in GCC, but I don't
see it as a good thing.  GCC was written in C and most code still
uses now outdated C practices such as declaring variables at the top
of a (often long) function, and usually without initializing them.
It's been established that it's far better to declare variables with
the smallest scope, and to initialize them on declaration.  Compilers
are smart enough these days to eliminate redundant initialization or
assignments.

>>> In your other message you used the example of explicit default
>>> initialisation, such as:
>>>
>>> class foo
>>> {
>>>   foo () : x () {}
>>>   unextended_tree x;
>>> };
>>>
>>> But I think we should strongly discourage that kind of thing.
>>> If someone wants to initialise x to a particular value, like
>>> integer_zero_node, then it would be better to do it explicitly.
>>> If they don't care what the initial value is, then for these
>>> integer-mimicking classes, uninitialised is as good as anything
>>> else. :-)
>
> What I meant was: if you want to initialise "i" to 1 in your example,
> you'd have to write "i (1)".  Being able to write "i ()" instead of
> "i (0)" saves one character but I don't think it adds much clarity.
> Explicitly initialising something only seems worthwhile if you say
> what you're initialising it to.

My comment is not motivated by convenience.  What I'm concerned
about is that defining a default ctor to be a no-op defeats the
zero-initialization semantics most users expect of T().

This is particularly concerning for a class designed to behave
like an [improved] basic integer type.  Such a class should act
as closely as possible to the type it emulates and in the least
surprising ways.  Any sort of a deviation that replaces well-
defined behavior with undefined is a gotcha and a bug waiting
to happen.

It's also a concern in generic (template) contexts where T() is
expected to zero-initialize.  A template designed to work with
a fundamental integer type should also work with a user-defined
type designed to behave like an integer.

>> With the new (and some existing) classes that's not so, and it
>> makes them harder and more error-prone to use (I just recently
>> learned this the hard way about offset_int and the debugging
>> experience is still fresh in my memory).
>
> Sorry about the bad experience.  But that kind of thing cuts
> both ways.  If I write:
>
> poly_int64
> foo (void)
> {
>   poly_int64 x;
>   x += 2;
>   return x;
> }
>
> then I get a warning about x being used uninitialised, without
> having had to run anything.  If we add default initialisation
> then this becomes something that has to be debugged against
> a particular test case, i.e. we've stopped the compiler from
> giving us useful static analysis.

With default initialization the code above becomes valid and has
the expected effect of adding 2 to zero.  It's just more robust
than the same code that uses a basic type instead.  This
seems no more unexpected and no less desirable than the well-
defined semantics of something like:

   std::string x;
   x += "2";
   return x;

or using any other C++ standard library type in a similar way.

(Incidentally, although I haven't tried with poly_int, I get no
warnings for the code above with offset_int or wide_int.)

>> When the ctor is inline and the initialization unnecessary then
>> GCC will in most instances eliminate it, so I also don't think
>> the suggested change would have a significant impact on
>> the efficiency of optimized code, but...
>>
>> ...if it is thought essential to provide a no-op ctor, I would
>> suggest to consider making its property explicit, e.g., like so:
>>
>>    struct unextended_tree {
>>
>>      struct Uninit { };
>>
>>      // ...
>>      unextended_tree (Uninit) { /* no initialization */ }
>>      // ...
>>    };
>>
>> This way the programmer has to explicitly opt in to using the
>> unsafe ctor.  (This ctor is suitable for single objects, not
>> arrays of such things, but presumably that would be sufficient.
>> If not, there are tricks to make that work too.)
>
> The default constructors for unextended_tree and extended_tree
> are only there for the array case (in poly-int.h).

My main concern is with the new poly_int classes (and the existing
wide int classes) because I think those are or will be widely used,
far more so than the unextended_tree class (I confess this review
is the first time I've ever noticed it).

> Part of the problem here is that we still have to live by C++03
> POD rules.  If we moved to C++11, the need for the poly_int_pod/
> poly_int split would go away and things would probably be much
> simpler. :-)

Understood.  With the heavy use of templates, template templates,
and partial specialization, the poly_int classes will push older
C++ 98 compilers to their limits.  It seems that for stability's
sake it would make sense to require a more modern compiler.

>>> Note that for this class NULL_TREE is not a safe default value.
>>> The same goes for the wide-int.h classes, where a length or precision
>>> of 0 is undefined and isn't necessarily going to be handled gracefully
>>> or predictably.
>>
>> For offset_int both precision and length are known so I think
>> it would make sense to have the default ctor value-initialize
>> the object.  For wide_int, it seems to me that choosing some
>> default precision and length in the default ctor would still
>> be preferable to leaving the members indeterminate.  (That
>> functionality could still be provided by some other ctor as
>> I suggested above).
>
> But which precision though?  If we pick a commonly-used one
> then we make a missing initialisation bug very data-dependent.
> Even if we pick a rarely-used one, we create a bug in which
> the wide_int has the wrong precision even though all assignments
> to it "obviously" have the right precision.

For offset_int the default precision is 128-bits.  Making that
the default also for wide_int should be unsurprising.

>> I only meant to suggest that the comment be reworded so as
>> not to imply that such pointers (that are farther apart than
>> PTRDIFF_MAX) can point to the same object and be subtracted.
>
> OK, how about:
>
> /* Return true if T holds a polynomial pointer difference, storing it in
>    *VALUE if so.  A true return means that T's precision is no greater
>    than 64 bits, which is the largest address space we support, so *VALUE
>    never loses precision.  However, the signedness of the result does
>    not necessarily match the signedness of T: sometimes an unsigned type
>    like sizetype is used to encode a value that is actually negative.  */

That looks good to me.

Thanks
Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26 16:45           ` Martin Sebor
@ 2017-10-26 18:05             ` Richard Sandiford
  2017-10-26 23:53               ` Martin Sebor
  2017-10-26 18:11             ` Pedro Alves
  1 sibling, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-26 18:05 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

Martin Sebor <msebor@gmail.com> writes:
>>>>>>  /* The tree and const_tree overload templates.   */
>>>>>>  namespace wi
>>>>>>  {
>>>>>> +  class unextended_tree
>>>>>> +  {
>>>>>> +  private:
>>>>>> +    const_tree m_t;
>>>>>> +
>>>>>> +  public:
>>>>>> +    unextended_tree () {}
>>>>>
>>>>> Defining no-op ctors is quite dangerous and error-prone.  I suggest
>>>>> to instead default initialize the member(s):
>>>>>
>>>>>    unextended_tree (): m_t () {}
>>>>>
>>>>> Ditto everywhere else, such as in:
>>>>
>>>> This is really performance-sensitive code though, so I don't think
>>>> we want to add any unnecessary initialisation.  Primitive types are
>>>> uninitialised by default too, and the point of this class is to
>>>> provide an integer-like interface.
>>>
>>> I understand the performance concern (more on that below), but
>>> to clarify the usability issues,  I don't think the analogy with
>>> primitive types is quite fitting here: int() evaluates to zero,
>>> as do the values of i and a[0] and a[1] after an object of type
>>> S is constructed using its default ctor, i.e., S ():
>>>
>>>    struct S {
>>>      int i;
>>>      int a[2];
>>>
>>>      S (): i (), a () { }
>>>    };
>>
>> Sure, I realise that.  I meant that:
>>
>>   int x;
>>
>> doesn't initialise x to zero.  So it's a question of which case is the
>> most motivating one: using "x ()" to initialise x to 0 in a constructor
>> or "int x;" to declare a variable of type x, uninitialised.  I think the
>> latter use case is much more common (at least in GCC).  Rearranging
>> things, I said later:
>
> I agree that the latter use case is more common in GCC, but I don't
> see it as a good thing.  GCC was written in C and most code still
> uses now outdated C practices such as declaring variables at the top
> of a (often long) function, and usually without initializing them.
> It's been established that it's far better to declare variables with
> the smallest scope, and to initialize them on declaration.  Compilers
> are smart enough these days to eliminate redundant initialization or
> assignments.
>
>>>> In your other message you used the example of explicit default
>>>> initialisation, such as:
>>>>
>>>> class foo
>>>> {
>>>>   foo () : x () {}
>>>>   unextended_tree x;
>>>> };
>>>>
>>>> But I think we should strongly discourage that kind of thing.
>>>> If someone wants to initialise x to a particular value, like
>>>> integer_zero_node, then it would be better to do it explicitly.
>>>> If they don't care what the initial value is, then for these
>>>> integer-mimicking classes, uninitialised is as good as anything
>>>> else. :-)
>>
>> What I meant was: if you want to initialise "i" to 1 in your example,
>> you'd have to write "i (1)".  Being able to write "i ()" instead of
>> "i (0)" saves one character but I don't think it adds much clarity.
>> Explicitly initialising something only seems worthwhile if you say
>> what you're initialising it to.
>
> My comment is not motivated by convenience.  What I'm concerned
> about is that defining a default ctor to be a no-op defeats the
> zero-initialization semantics most users expect of T().
>
> This is particularly concerning for a class designed to behave
> like an [improved] basic integer type.  Such a class should act
> as closely as possible to the type it emulates and in the least
> surprising ways.  Any sort of a deviation that replaces well-
> defined behavior with undefined is a gotcha and a bug waiting
> to happen.
>
> It's also a concern in generic (template) contexts where T() is
> expected to zero-initialize.  A template designed to work with
> a fundamental integer type should also work with a user-defined
> type designed to behave like an integer.

But that kind of situation is one where using "T (0)" over "T ()"
is useful.  It means that template substitution will succeed for
T that are sufficiently integer-like to have a single well-defined
zero but not for T that aren't (such as wide_int).
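
Something like this, say (a made-up illustration rather than actual
GCC code):

  template <typename T>
  T
  sum (const T *elts, unsigned int n)
  {
    T total (0);        /* needs a single well-defined zero for T */
    for (unsigned int i = 0; i < n; ++i)
      total += elts[i];
    return total;
  }

"T total (0);" is fine for int or for a fixed-precision type like
offset_int, but is rejected for wide_int, where a plain integer doesn't
determine the precision.  Writing "T total;" or "T total = T ();"
instead would compile for wide_int too, just without a meaningful value.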

>>> With the new (and some existing) classes that's not so, and it
>>> makes them harder and more error-prone to use (I just recently
>>> learned this the hard way about offset_int and the debugging
>>> experience is still fresh in my memory).
>>
>> Sorry about the bad experience.  But that kind of thing cuts
>> both ways.  If I write:
>>
>> poly_int64
>> foo (void)
>> {
>>   poly_int64 x;
>>   x += 2;
>>   return x;
>> }
>>
>> then I get a warning about x being used uninitialised, without
>> having had to run anything.  If we add default initialisation
>> then this becomes something that has to be debugged against
>> a particular test case, i.e. we've stopped the compiler from
>> giving us useful static analysis.
>
> With default initialization the code above becomes valid and has
> the expected effect of adding 2 to zero.  It's just more robust
> than the same code that uses a basic type instead.  This
> seems no more unexpected and no less desirable than the well-
> defined semantics of something like:
>
>    std::string x;
>    x += "2";
>    return x;
>
> or using any other C++ standard library type in a similar way.
>
> (Incidentally, although I haven't tried with poly_int, I get no
> warnings for the code above with offset_int or wide_int.)
>
>>> When the ctor is inline and the initialization unnecessary then
>>> GCC will in most instances eliminate it, so I also don't think
>>> the suggested change would have a significant impact on
>>> the efficiency of optimized code, but...
>>>
>>> ...if it is thought essential to provide a no-op ctor, I would
>>> suggest to consider making its property explicit, e.g., like so:
>>>
>>>    struct unextended_tree {
>>>
>>>      struct Uninit { };
>>>
>>>      // ...
>>>      unextended_tree (Uninit) { /* no initialization */ }
>>>      // ...
>>>    };
>>>
>>> This way the programmer has to explicitly opt in to using the
>>> unsafe ctor.  (This ctor is suitable for single objects, not
>>> arrays of such things, but presumably that would be sufficient.
>>> If not, there are tricks to make that work too.)
>>
>> The default constructors for unextended_tree and extended_tree
>> are only there for the array case (in poly-int.h).
>
> My main concern is with the new poly_int classes (and the existing
> wide int classes) because I think those are or will be widely used,
> far more so than the unextended_tree class (I confess this review
> is the first time I've ever noticed it).
>
>> Part of the problem here is that we still have to live by C++03
>> POD rules.  If we moved to C++11, the need for the poly_int_pod/
>> poly_int split would go away and things would probably be much
>> simpler. :-)
>
> Understood.  With the heavy use of templates, template templates,
> and partial specialization, the poly_int classes will push older
> C++ 98 compilers to their limits.  It seems that for stability's
> sake it would make sense to require a more modern compiler.
>
>>>> Note that for this class NULL_TREE is not a safe default value.
>>>> The same goes for the wide-int.h classes, where a length or precision
>>>> of 0 is undefined and isn't necessarily going to be handled gracefully
>>>> or predictably.
>>>
>>> For offset_int both precision and length are known so I think
>>> it would make sense to have the default ctor value-initialize
>>> the object.  For wide_int, it seems to me that choosing some
>>> default precision and length in the default ctor would still
>>> be preferable to leaving the members indeterminate.  (That
>>> functionality could still be provided by some other ctor as
>>> I suggested above).
>>
>> But which precision though?  If we pick a commonly-used one
>> then we make a missing initialisation bug very data-dependent.
>> Even if we pick a rarely-used one, we create a bug in which
>> the wide_int has the wrong precision even though all assignments
>> to it "obviously" have the right precision.
>
> For offset_int the default precision is 128-bits.  Making that
> the default also for wide_int should be unsurprising.

I think it'd be surprising.  offset_int should always be used in
preference to wide_int if the precision is known to be 128 bits
in advance, and there doesn't seem any reason to prefer the
precision of offset_int over widest_int, HOST_WIDE_INT or int.

We would end up with:

  wide_int
  f (const wide_int &y)
  {
    wide_int x;
    x += y;
    return x;
  }

being valid if y happens to have 128 bits as well, and a runtime error
otherwise.

Also, I think it'd be inconsistent to allow the specific case of 0
to be assigned by default construction, but not also allow:

  wide_int x (0);

  wide_int x;
  x = 0;

  wide_int x;
  x = 1;

etc.  And wide_int wasn't intended for that use case.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26 16:45           ` Martin Sebor
  2017-10-26 18:05             ` Richard Sandiford
@ 2017-10-26 18:11             ` Pedro Alves
  2017-10-26 19:12               ` Martin Sebor
  1 sibling, 1 reply; 302+ messages in thread
From: Pedro Alves @ 2017-10-26 18:11 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches, richard.sandiford

On 10/26/2017 05:37 PM, Martin Sebor wrote:

> I agree that the latter use case is more common in GCC, but I don't
> see it as a good thing.  GCC was written in C and most code still
> uses now outdated C practices such as declaring variables at the top
> of a (often long) function, and usually without initializing them.
> It's been established that it's far better to declare variables with
> the smallest scope, and to initialize them on declaration.  Compilers
> are smart enough these days to eliminate redundant initialization or
> assignments.

I don't agree that that's established.  FWIW, I'm in the
"prefer the -Wuninitialized" warnings camp.  I've been looking
forward to all the VRP and threader improvements, hoping that that
warning (and -Wmaybe-uninitialized...) will improve along with them.

> My comment is not motivated by convenience.  What I'm concerned
> about is that defining a default ctor to be a no-op defeats the
> zero-initialization semantics most users expect of T().

This sounds like it's a problem because GCC is written in C++98.

You can get the semantics you want in C++11 by defining
the constructor with "= default;" :

 struct T
 {
   T(int); // some other constructor forcing me to 
           // add a default constructor.

   T() = default; // give me default construction using
                  // default initialization.
   int i;
 };

And now 'T t;' leaves T::i default initialized, i.e.,
uninitialized, while T() value-initializes T::i, i.e.,
initializes it to zero.
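
That is (just spelling out the two cases):

 T t;          // T::i is left indeterminate (default initialization)
 T u = T ();   // T::i is zero (value initialization)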

So if that's a concern, maybe you could use "= default" 
conditionally depending on #if __cplusplus >= C++11, so that
you'd get it for stages after stage1.

Or just start requiring C++11 already. :-)

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26 18:11             ` Pedro Alves
@ 2017-10-26 19:12               ` Martin Sebor
  2017-10-26 19:19                 ` Pedro Alves
  0 siblings, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-10-26 19:12 UTC (permalink / raw)
  To: Pedro Alves, gcc-patches, richard.sandiford

On 10/26/2017 12:05 PM, Pedro Alves wrote:
> On 10/26/2017 05:37 PM, Martin Sebor wrote:
>
>> I agree that the latter use case is more common in GCC, but I don't
>> see it as a good thing.  GCC was written in C and most code still
>> uses now outdated C practices such as declaring variables at the top
>> of a (often long) function, and usually without initializing them.
>> It's been established that it's far better to declare variables with
>> the smallest scope, and to initialize them on declaration.  Compilers
>> are smart enough these days to eliminate redundant initialization or
>> assignments.
>
> I don't agree that that's established.  FWIW, I'm in the
> "prefer the -Wuninitialized" warnings camp.  I've been looking
> forward to all the VRP and threader improvements, hoping that that
> warning (and -Wmaybe-uninitialized...) will improve along with them.

You're far from alone, but it's a shrinking camp as
the majority have come to appreciate the benefits of the practice.
You can see it reflected in most popular coding standards, including
the CERT C++ Secure Coding Standard, Google C++ Style Guide, Sutter
and Alexandrescu's C++ Coding Standards, Scott Meyers' books, and
so on.  Just like with every rule, there are exceptions, but there
should be no doubt that in the general case, it is preferable to
declare each variable at the point where its initial value is known
(or can be computed) and initialize it on its declaration.

>> My comment is not motivated by convenience.  What I'm concerned
>> about is that defining a default ctor to be a no-op defeats the
>> zero-initialization semantics most users expect of T().
>
> This sounds like it's a problem because GCC is written in C++98.
>
> You can get the semantics you want in C++11 by defining
> the constructor with "= default;" :
>
>  struct T
>  {
>    T(int); // some other constructor forcing me to
>            // add a default constructor.
>
>    T() = default; // give me default construction using
>                   // default initialization.
>    int i;
>  };
>
> And now 'T t;' leaves T::i default initialized, i.e.,
> uninitialized, while T() value-initializes T::i, i.e.,
> initializes it to zero.
>
> So if that's a concern, maybe you could use "= default"
> conditionally depending on #if __cplusplus >= C++11, so that
> you'd get it for stages after stage1.
>
> Or just start requiring C++11 already. :-)

That would make sense to me and, from the sound of some of his
comments, would also be Richard's preference.

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26 19:12               ` Martin Sebor
@ 2017-10-26 19:19                 ` Pedro Alves
  2017-10-26 23:41                   ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Pedro Alves @ 2017-10-26 19:19 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches, richard.sandiford

On 10/26/2017 07:54 PM, Martin Sebor wrote:

> (...) in the general case, it is preferable to
> declare each variable at the point where its initial value is known
> (or can be computed) and initialize it on its declaration.

With that I fully agree, except it's not always possible or
natural.  The issue at hand usually turns up with
conditional initialization, like:

void foo ()
{
  int t;
  if (something)
    t = 1;
  else if (something_else)
    t = 2;
  if (t == 1)
    bar (); 
}

That's a simple example of course, but more complicated
conditionals aren't so easy to grok and spot the bug.

In the case above, I'd much prefer if the compiler tells me
I missed initializing 't' than initializing it to 0 "just
in case".

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26 19:19                 ` Pedro Alves
@ 2017-10-26 23:41                   ` Martin Sebor
  2017-10-30 10:26                     ` Pedro Alves
  0 siblings, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-10-26 23:41 UTC (permalink / raw)
  To: Pedro Alves, gcc-patches, richard.sandiford

On 10/26/2017 01:17 PM, Pedro Alves wrote:
> On 10/26/2017 07:54 PM, Martin Sebor wrote:
>
>> (...) in the general case, it is preferable to
>> declare each variable at the point where its initial value is known
>> (or can be computed) and initialize it on its declaration.
>
> With that I fully agree, except it's not always possible or
> natural.  The issue at hand usually turns up with
> conditional initialization, like:
>
> void foo ()
> {
>   int t;
>   if (something)
>     t = 1;
>   else if (something_else)
>     t = 2;
>   if (t == 1)
>     bar ();
> }
>
> That's a simple example of course, but more complicated
> conditionals aren't so easy to grok and spot the bug.
>
> In the case above, I'd much prefer if the compiler tells me
> I missed initializing 't' than initializing it to 0 "just
> in case".

Sure.  A similar observation could be made about std::string
or std::vector vs a plain C-style array, or for that matter,
about almost any other class.  But it would be absurd to use
such examples as arguments that it's better to define classes
with a no-op default constructor.  It's safer to initialize
objects to some value (whatever that might be) than not to
initialize them at all.  That's true for fundamental types
and even more so for user-defined classes with constructors.

IMO, a good rule of thumb to follow in class design is to have
every class with any user-defined ctor either define a default
ctor that puts the object into a determinate state, or make
the default ctor inaccessible (or deleted in new C++ versions).
If there is a use case for leaving newly constructed objects
of a class in an uninitialized state that's an exception to
the rule that can be accommodated by providing a special API
(or in C++ 11, a defaulted ctor).
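
In code, that rule of thumb might look like this (an illustrative
made-up class, not a proposal for any particular GCC type):

   class value
   {
   public:
     explicit value (int v): m_v (v) { }

     // Either give default construction a determinate state:
     value (): m_v (0) { }

     // ...or make it inaccessible (in C++ 11, "value () = delete;")
     // and provide an explicit opt-out for callers that really want
     // an uninitialized object:
     struct uninit { };
     value (uninit) { /* no initialization */ }

   private:
     int m_v;
   };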

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26 18:05             ` Richard Sandiford
@ 2017-10-26 23:53               ` Martin Sebor
  2017-10-27  8:33                 ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-10-26 23:53 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/26/2017 11:52 AM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>>>>>>>  /* The tree and const_tree overload templates.   */
>>>>>>>  namespace wi
>>>>>>>  {
>>>>>>> +  class unextended_tree
>>>>>>> +  {
>>>>>>> +  private:
>>>>>>> +    const_tree m_t;
>>>>>>> +
>>>>>>> +  public:
>>>>>>> +    unextended_tree () {}
>>>>>>
>>>>>> Defining no-op ctors is quite dangerous and error-prone.  I suggest
>>>>>> to instead default initialize the member(s):
>>>>>>
>>>>>>    unextended_tree (): m_t () {}
>>>>>>
>>>>>> Ditto everywhere else, such as in:
>>>>>
>>>>> This is really performance-sensitive code though, so I don't think
>>>>> we want to add any unnecessary initialisation.  Primitive types are
>>>>> uninitialised by default too, and the point of this class is to
>>>>> provide an integer-like interface.
>>>>
>>>> I understand the performance concern (more on that below), but
>>>> to clarify the usability issues,  I don't think the analogy with
>>>> primitive types is quite fitting here: int() evaluates to zero,
>>>> as do the values of i and a[0] and a[1] after an object of type
>>>> S is constructed using its default ctor, i.e., S ():
>>>>
>>>>    struct S {
>>>>      int i;
>>>>      int a[2];
>>>>
>>>>      S (): i (), a () { }
>>>>    };
>>>
>>> Sure, I realise that.  I meant that:
>>>
>>>   int x;
>>>
>>> doesn't initialise x to zero.  So it's a question of which case is the
>>> most motivating one: using "x ()" to initialise x to 0 in a constructor
>>> or "int x;" to declare a variable of type x, uninitialised.  I think the
>>> latter use case is much more common (at least in GCC).  Rearranging
>>> things, I said later:
>>
>> I agree that the latter use case is more common in GCC, but I don't
>> see it as a good thing.  GCC was written in C and most code still
>> uses now outdated C practices such as declaring variables at the top
>> of a (often long) function, and usually without initializing them.
>> It's been established that it's far better to declare variables with
>> the smallest scope, and to initialize them on declaration.  Compilers
>> are smart enough these days to eliminate redundant initialization or
>> assignments.
>>
>>>>> In your other message you used the example of explicit default
>>>>> initialisation, such as:
>>>>>
>>>>> class foo
>>>>> {
>>>>>   foo () : x () {}
>>>>>   unextended_tree x;
>>>>> };
>>>>>
>>>>> But I think we should strongly discourage that kind of thing.
>>>>> If someone wants to initialise x to a particular value, like
>>>>> integer_zero_node, then it would be better to do it explicitly.
>>>>> If they don't care what the initial value is, then for these
>>>>> integer-mimicking classes, uninitialised is as good as anything
>>>>> else. :-)
>>>
>>> What I meant was: if you want to initialise "i" to 1 in your example,
>>> you'd have to write "i (1)".  Being able to write "i ()" instead of
>>> "i (0)" saves one character but I don't think it adds much clarity.
>>> Explicitly initialising something only seems worthwhile if you say
>>> what you're initialising it to.
>>
>> My comment is not motivated by convenience.  What I'm concerned
>> about is that defining a default ctor to be a no-op defeats the
>> zero-initialization semantics most users expect of T().
>>
>> This is particularly concerning for a class designed to behave
>> like an [improved] basic integer type.  Such a class should act
>> as closely as possible to the type it emulates and in the least
>> surprising ways.  Any sort of a deviation that replaces well-
>> defined behavior with undefined is a gotcha and a bug waiting
>> to happen.
>>
>> It's also a concern in generic (template) contexts where T() is
>> expected to zero-initialize.  A template designed to work with
>> a fundamental integer type should also work with a user-defined
>> type designed to behave like an integer.
>
> But that kind of situation is one where using "T (0)" over "T ()"
> is useful.  It means that template substitution will succeed for
> T that are sufficiently integer-like to have a single well-defined
> zero but not for T that aren't (such as wide_int).

That strikes me as a little too subtle.  But it also doesn't
sound like wide_int is as close to an integer as its name
suggests.  After all, it doesn't support relational operators
either, or even assignment from other integer types.  It's
really a different beast.  But that still doesn't in my mind
justify the no-op initialization semantics.

>> For offset_int the default precision is 128-bits.  Making that
>> the default also for wide_int should be unsurprising.
>
> I think it'd be surprising.  offset_int should always be used in
> preference to wide_int if the precision is known to be 128 bits
> in advance, and there doesn't seem any reason to prefer the
> precision of offset_int over widest_int, HOST_WIDE_INT or int.
>
> We would end up with:
>
>   wide_int
>   f (const wide_int &y)
>   {
>     wide_int x;
>     x += y;
>     return x;
>   }
>
> being valid if y happens to have 128 bits as well, and a runtime error
> otherwise.

Surely that would be far better than the undefined behavior we
have today.

>
> Also, I think it'd be inconsistent to allow the specific case of 0
> to be assigned by default construction, but not also allow:
>
>   wide_int x (0);
>
>   wide_int x;
>   x = 0;
>
>   wide_int x;
>   x = 1;
>
> etc.  And wide_int wasn't intended for that use case.

Then perhaps I don't fully understand wide_int.  I would expect
the above assignments to also "just work" and I can't imagine
why we would not want them to.  In what way is rejecting
the above helpful when the following is accepted but undefined?

   wide_int f ()
   {
     wide_int x;
     x += 0;
     return x;
   }

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26 23:53               ` Martin Sebor
@ 2017-10-27  8:33                 ` Richard Sandiford
  2017-10-29 16:56                   ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-10-27  8:33 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

Martin Sebor <msebor@gmail.com> writes:
> On 10/26/2017 11:52 AM, Richard Sandiford wrote:
>> Martin Sebor <msebor@gmail.com> writes:
>>> For offset_int the default precision is 128-bits.  Making that
>>> the default also for wide_int should be unsurprising.
>>
>> I think it'd be surprising.  offset_int should always be used in
>> preference to wide_int if the precision is known to be 128 bits
>> in advance, and there doesn't seem any reason to prefer the
>> precision of offset_int over widest_int, HOST_WIDE_INT or int.
>>
>> We would end up with:
>>
>>   wide_int
>>   f (const wide_int &y)
>>   {
>>     wide_int x;
>>     x += y;
>>     return x;
>>   }
>>
>> being valid if y happens to have 128 bits as well, and a runtime error
>> otherwise.
>
> Surely that would be far better than the undefined behavior we
> have today.

I disagree.  People shouldn't rely on the above behaviour because
it's never useful.  If y is known to be 128 bits in advance then
the code should be using offset_int instead of wide_int.  And if
y isn't known to be 128 bits in advance, the code is incorrect,
because it needs to cope with precisions other than 128 but doesn't
do so.

The motivation for doing this was to initialise wide_ints to zero,
but the behaviour of f() wouldn't be the same as:

   wide_int x = 0 + y;
   return x;

That's always valid, because in an operation involving a wide_int
and a primitive type, the primitive type promotes or demotes
to the same precision as the wide_int.
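
A minimal sketch of why that form needs no explicit precision (the only
precision in play is y's, and any integer constants adopt it):

   wide_int
   g (const wide_int &y)
   {
     wide_int x = y + 1;   /* 1 is extended or truncated to y's precision.  */
     x += 2;               /* likewise well-defined, whatever y's precision is.  */
     return x;
   }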

>> Also, I think it'd be inconsistent to allow the specific case of 0
>> to be assigned by default construction, but not also allow:
>>
>>   wide_int x (0);
>>
>>   wide_int x;
>>   x = 0;
>>
>>   wide_int x;
>>   x = 1;
>>
>> etc.  And wide_int wasn't intended for that use case.
>
> Then perhaps I don't fully understand wide_int.  I would expect
> the above assignments to also "just work" and I can't imagine
> why we would not want them to.  In what way is rejecting
> the above helpful when the following is accepted but undefined?
>
>    wide_int f ()
>    {
>      wide_int x;
>      x += 0;
>      return x;
>    }

Well, it compiles, but with sufficiently good static analysis
it should trigger a warning.  (GCC might not be there yet,
but these things improve.)  As mentioned above:

  wide_int f ()
  {
    wide_int x = ...;
    x += 0;
    return x;
  }

(or some value other than 0) is well-defined because the int
promotes to whatever precision x has.

The problem with the examples I gave was that wide_int always needs
to have a precision and nothing in that code says what the precision
should be.  The "right" way of writing it would be:

   wide_int x = wi::shwi (0, prec);

   wide_int x;
   x = wi::shwi (0, prec);

   wide_int x;
   x = wi::shwi (1, prec);

where prec specifies the precision of the integer.
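
As a sketch, in a context where the precision comes from a tree type
(with "expr" standing in for whatever node supplies that precision):

   unsigned int prec = TYPE_PRECISION (TREE_TYPE (expr));
   wide_int zero = wi::shwi (0, prec);
   wide_int one = wi::shwi (1, prec);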

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-27  8:33                 ` Richard Sandiford
@ 2017-10-29 16:56                   ` Martin Sebor
  2017-10-30  6:36                     ` Trevor Saunders
  0 siblings, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-10-29 16:56 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/27/2017 02:08 AM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> On 10/26/2017 11:52 AM, Richard Sandiford wrote:
>>> Martin Sebor <msebor@gmail.com> writes:
>>>> For offset_int the default precision is 128-bits.  Making that
>>>> the default also for wide_int should be unsurprising.
>>>
>>> I think it'd be surprising.  offset_int should always be used in
>>> preference to wide_int if the precision is known to be 128 bits
>>> in advance, and there doesn't seem any reason to prefer the
>>> precision of offset_int over widest_int, HOST_WIDE_INT or int.
>>>
>>> We would end up with:
>>>
>>>   wide_int
>>>   f (const wide_int &y)
>>>   {
>>>     wide_int x;
>>>     x += y;
>>>     return x;
>>>   }
>>>
>>> being valid if y happens to have 128 bits as well, and a runtime error
>>> otherwise.
>>
>> Surely that would be far better than the undefined behavior we
>> have today.
>
> I disagree.  People shouldn't rely on the above behaviour because
> it's never useful.

Well, yes, but the main point of my feedback on the poly_int default
ctor (and the ctor of the extended_tree class, and the existing wide
int classes) is that it makes them easy to misuse.  That they're not
meant to be [mis]used like that isn't an answer.

You explained earlier that the no-op initialization is necessary
for efficiency and I suggested a safer alternative: an API that
makes the lack of initialization explicit, while providing a safe
default.  I still believe this is the right approach for the new
poly_int classes.  I also think it's the right solution for
offset_int.

>>    wide_int f ()
>>    {
>>      wide_int x;
>>      x += 0;
>>      return x;
>>    }
>
> Well, it compiles, but with sufficiently good static analysis
> it should trigger a warning.  (GCC might not be there yet,
> but these things improve.)  As mentioned above:

Forgive me, but knowingly designing classes to be unsafe with
the hope that their accidental misuses may some day be detected
by sufficiently advanced static analyzers is not helpful.  It's
also unnecessary when less error-prone and equally efficient
alternatives exist.

>   wide_int f ()
>   {
>     wide_int x = ...;
>     x += 0;
>     return x;
>   }
>
> (or some value other than 0) is well-defined because the int
> promotes to whatever precision x has.
>
> The problem with the examples I gave was that wide_int always needs
> to have a precision and nothing in that code says what the precision
> should be.  The "right" way of writing it would be:
>
>    wide_int x = wi::shwi (0, prec);
>
>    wide_int x;
>    x = wi::shwi (0, prec);
>
>    wide_int x;
>    x = wi::shwi (1, prec);
>
> where prec specifies the precision of the integer.

Yes, I realize that.  But we got here by exploring the effects
of default zero-initialization.  You have given examples showing
where relying on the zero-initialization could lead to bugs.  Sure,
no one is disputing that there are such instances.  Those exist
with any type and are, in general, unavoidable.

My argument is that default initialization that leaves the object
in an indeterminate state suffers from all the same problems your
examples do plus infinitely many others (i.e., undefined behavior),
and so is an obviously inferior choice.  It's a design error that
should be avoided.

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-29 16:56                   ` Martin Sebor
@ 2017-10-30  6:36                     ` Trevor Saunders
  2017-10-31 20:25                       ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Trevor Saunders @ 2017-10-30  6:36 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches, richard.sandiford

On Sun, Oct 29, 2017 at 10:25:38AM -0600, Martin Sebor wrote:
> On 10/27/2017 02:08 AM, Richard Sandiford wrote:
> > Martin Sebor <msebor@gmail.com> writes:
> > > On 10/26/2017 11:52 AM, Richard Sandiford wrote:
> > > > Martin Sebor <msebor@gmail.com> writes:
> > > > > For offset_int the default precision is 128-bits.  Making that
> > > > > the default also for wide_int should be unsurprising.
> > > > 
> > > > I think it'd be surprising.  offset_int should always be used in
> > > > preference to wide_int if the precision is known to be 128 bits
> > > > in advance, and there doesn't seem any reason to prefer the
> > > > precision of offset_int over widest_int, HOST_WIDE_INT or int.
> > > > 
> > > > We would end up with:
> > > > 
> > > >   wide_int
> > > >   f (const wide_int &y)
> > > >   {
> > > >     wide_int x;
> > > >     x += y;
> > > >     return x;
> > > >   }
> > > > 
> > > > being valid if y happens to have 128 bits as well, and a runtime error
> > > > otherwise.
> > > 
> > > Surely that would be far better than the undefined behavior we
> > > have today.
> > 
> > I disagree.  People shouldn't rely on the above behaviour because
> > it's never useful.
> 
> Well, yes, but the main point of my feedback on the poly_int default
> ctor (and the ctor of the extended_tree class, and the existing wide
> int classes) is that it makes them easy to misuse.  That they're not
> meant to be [mis]used like that isn't an answer.

I think Richard's point is different from saying don't misuse it.  I
think it's that 0-initializing is also always a bug, and the user needs
to choose some initialization to follow the default ctor in either case.

> You explained earlier that the no-op initialization is necessary
> for efficiency and I suggested a safer alternative: an API that
> makes the lack of initialization explicit, while providing a safe
> default.  I still believe this is the right approach for the new
> poly_int classes.  I also think it's the right solution for
> offset_int.
> 
> > >    wide_int f ()
> > >    {
> > >      wide_int x;
> > >      x += 0;
> > >      return x;
> > >    }
> > 
> > Well, it compiles, but with sufficiently good static analysis
> > it should trigger a warning.  (GCC might not be there yet,
> > but these things improve.)  As mentioned above:
> 
> Forgive me, but knowingly designing classes to be unsafe with
> the hope that their accidental misuses may some day be detected
> by sufficiently advanced static analyzers is not helpful.  It's
> also unnecessary when less error-prone and equally efficient
> alternatives exist.

If only the world were that nice; unfortunately, whenever I go looking
at generated code I find things that make me sad.

> >   wide_int f ()
> >   {
> >     wide_int x = ...;
> >     x += 0;
> >     return x;
> >   }
> > 
> > (or some value other than 0) is well-defined because the int
> > promotes to whatever precision x has.
> > 
> > The problem with the examples I gave was that wide_int always needs
> > to have a precision and nothing in that code says what the precision
> > should be.  The "right" way of writing it would be:
> > 
> >    wide_int x = wi::shwi (0, prec);
> > 
> >    wide_int x;
> >    x = wi::shwi (0, prec);
> > 
> >    wide_int x;
> >    x = wi::shwi (1, prec);
> > 
> > where prec specifies the precision of the integer.
> 
> Yes, I realize that.  But we got here by exploring the effects
> of default zero-initialization.  You have given examples showing
> where relying on the zero-initialization could lead to bugs.  Sure,
> no one is disputing that there are such instances.  Those exist
> with any type and are, in general, unavoidable.
> 
> My argument is that default initialization that leaves the object
> in an indeterminate state suffers from all the same problems your
> examples do plus infinitely many others (i.e., undefined behavior),
> and so is an obviously inferior choice.  It's a design error that
> should be avoided.

I'd argue it's not strictly inferior: one big advantage it has is
that it's much easier for tools like valgrind or MSan to find bugs where
something is uninitialized than ones where it's initialized with garbage.
Deciding that a program exhibits undefined behavior in some case is a lot
easier than reasoning about whether it did what it was supposed to.
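
A tiny sketch of that difference:

   int a;       /* valgrind/MSan can flag any later use of a  */
   int b = 0;   /* silently "works", even when 0 is the wrong value  */

The first form is mechanically detectable; the second is not.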

The other problem is that 0 is an especially bad value to pick if it
isn't very likely to always be correct.  If you are going to initialize
something with a known garbage value it would be better to pick
something that is more likely to blow up immediately than something that
can hide bugs.  Sure, uninitialized things change from run to run, but
they are much more likely to look like garbage than 0 is.

Trev

> 
> Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-26 23:41                   ` Martin Sebor
@ 2017-10-30 10:26                     ` Pedro Alves
  2017-10-31 16:12                       ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Pedro Alves @ 2017-10-30 10:26 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches, richard.sandiford

On 10/27/2017 12:29 AM, Martin Sebor wrote:

> 
> IMO, a good rule of thumb to follow in class design is to have
> every class with any user-defined ctor either define a default
> ctor that puts the object into a determinate state, or make
> the default ctor inaccessible (or deleted in new C++ versions).
> If there is a use case for leaving newly constructed objects
> of a class in an uninitialized state that's an exception to
> the rule that can be accommodated by providing a special API
> (or in C++ 11, a defaulted ctor).

Yet another rule of thumb is to make classes that model
built-in types behave as close to the built-in types as
possible, making it easier to migrate between the custom
types and the built-in types (and vice versa), to follow
expectations, and to avoid pessimization from, e.g., otherwise
useless forced initialization of such types in containers/arrays
when you're going to immediately fill in the container/array with
real values.
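
A sketch of the pessimization in question (compute_size standing in for
whatever produces the real values):

   poly_int64 sizes[32];           /* no-op default ctor: no wasted stores  */
   for (unsigned int i = 0; i < 32; i++)
     sizes[i] = compute_size (i);  /* every element is overwritten anyway  */

With a zero-initializing default constructor the array would be cleared
up front, only to be immediately overwritten.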

BTW, there's a proposal for adding a wide_int class to C++20:

 http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0539r1.html

and I noticed:

~~~
 26.??.2.?? wide_integer constructors [numeric.wide_integer.cons]

 constexpr wide_integer() noexcept = default;

 Effects: Constructs an object with undefined value.
~~~

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-30 10:26                     ` Pedro Alves
@ 2017-10-31 16:12                       ` Martin Sebor
  0 siblings, 0 replies; 302+ messages in thread
From: Martin Sebor @ 2017-10-31 16:12 UTC (permalink / raw)
  To: Pedro Alves, gcc-patches, richard.sandiford



On 10/30/2017 04:19 AM, Pedro Alves wrote:
> On 10/27/2017 12:29 AM, Martin Sebor wrote:
> 
>>
>> IMO, a good rule of thumb to follow in class design is to have
>> every class with any user-defined ctor either define a default
>> ctor that puts the object into a determinate state, or make
>> the default ctor inaccessible (or deleted in new C++ versions).
>> If there is a use case for leaving newly constructed objects
>> of a class in an uninitialized state that's an exception to
>> the rule that can be accommodated by providing a special API
>> (or in C++ 11, a defaulted ctor).
> 
> Yet another rule of thumb is to make classes that model
> built-in types behave as close to the built-in types as
> possible, making it easier to migrate between the custom
> types and the built-in types (and vice versa), to follow
> expectations, and to avoid pessimization around e.g., otherwise
> useless forcing initialization of such types in containers/arrays
> when you're going to immediately fill in the container/array with
> real values.
> 
> BTW, there's a proposal for adding a wide_int class to C++20:
> 
>   http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0539r1.html
> 
> and I noticed:
> 
> ~~~
>   26.??.2.?? wide_integer constructors [numeric.wide_integer.cons]
> 
>   constexpr wide_integer() noexcept = default;
> 
>   Effects: Constructs an object with undefined value.
> ~~~

Thanks for the reference.  As I said in an earlier reply, this
would make sense to me if we could use C++ 11 or later.  Unlike
a no-op default ctor, the = default constructor provides syntax
to initialize the object, so both the safe use case and the
efficient one are supported.  I.e., the proposed wide_int is
zero-initialized by using the 'wide_int()' syntax.  The GCC
wide int and poly_int classes, on the other hand, are left in
an indeterminate state.  That's a bug waiting to happen (as I
already experienced with offset_int.)
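
A minimal sketch of that distinction:

   struct W { W () = default; int v; };

   W a;          /* default-initialized: a.v is indeterminate  */
   W b = W ();   /* value-initialized: b.v is zero  */

With a user-written empty constructor instead of "= default", both
forms would leave v indeterminate.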

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-30  6:36                     ` Trevor Saunders
@ 2017-10-31 20:25                       ` Martin Sebor
  0 siblings, 0 replies; 302+ messages in thread
From: Martin Sebor @ 2017-10-31 20:25 UTC (permalink / raw)
  To: Trevor Saunders; +Cc: gcc-patches, richard.sandiford

On 10/29/2017 09:14 PM, Trevor Saunders wrote:
> On Sun, Oct 29, 2017 at 10:25:38AM -0600, Martin Sebor wrote:
>> On 10/27/2017 02:08 AM, Richard Sandiford wrote:
>>> Martin Sebor <msebor@gmail.com> writes:
>>>> On 10/26/2017 11:52 AM, Richard Sandiford wrote:
>>>>> Martin Sebor <msebor@gmail.com> writes:
>>>>>> For offset_int the default precision is 128-bits.  Making that
>>>>>> the default also for wide_int should be unsurprising.
>>>>>
>>>>> I think it'd be surprising.  offset_int should always be used in
>>>>> preference to wide_int if the precision is known to be 128 bits
>>>>> in advance, and there doesn't seem any reason to prefer the
>>>>> precision of offset_int over widest_int, HOST_WIDE_INT or int.
>>>>>
>>>>> We would end up with:
>>>>>
>>>>>    wide_int
>>>>>    f (const wide_int &y)
>>>>>    {
>>>>>      wide_int x;
>>>>>      x += y;
>>>>>      return x;
>>>>>    }
>>>>>
>>>>> being valid if y happens to have 128 bits as well, and a runtime error
>>>>> otherwise.
>>>>
>>>> Surely that would be far better than the undefined behavior we
>>>> have today.
>>>
>>> I disagree.  People shouldn't rely on the above behaviour because
>>> it's never useful.
>>
>> Well, yes, but the main point of my feedback on the poly_int default
>> ctor (and the ctor of the extended_tree class, and the existing wide
>> int classes) is that it makes them easy to misuse.  That they're not
>> meant to be [mis]used like that isn't an answer.
> 
> I think Richard's point is different from saying don't misuse it.  I
> think it's that 0-initializing is also always a bug, and the user needs
> to choose some initialization to follow the default ctor in either case.

Initializing offset_int to zero isn't a bug and there are examples
of it in the GCC sources.  Some of those are now being replaced with
the poly_int equivalent, e.g. "poly_int64 xxx = 0".  Here's one example
from [015/nnn] poly_int: ao_ref and vn_reference_op_t:

@@ -1365,8 +1369,8 @@ indirect_refs_may_alias_p (tree ref1 ATT
  refs_may_alias_p_1 (ao_ref *ref1, ao_ref *ref2, bool tbaa_p)
  {
    tree base1, base2;
-  HOST_WIDE_INT offset1 = 0, offset2 = 0;
-  HOST_WIDE_INT max_size1 = -1, max_size2 = -1;
+  poly_int64 offset1 = 0, offset2 = 0;
+  poly_int64 max_size1 = -1, max_size2 = -1;

I'm not suggesting these be changed to avoid the explicit
initialization.  But I show this to disprove the claim above.
Clearly, zero initialization is valid and useful.

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-10-25 16:17   ` Martin Sebor
@ 2017-11-08  9:44     ` Richard Sandiford
  2017-11-08 16:51       ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-08  9:44 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

Martin Sebor <msebor@gmail.com> writes:
> I haven't done nearly a thorough review but the dtor followed by
> the placement new in the POLY_SET_COEFF() macro caught my eye so
> I thought I'd ask sooner rather than later.  Given the macro
> definition:
>
> +   The dummy comparison against a null C * is just a way of checking
> +   that C gives the right type.  */
> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
> +   ? (void) ((RES).coeffs[I] = VALUE) \
> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>
> is the following use well-defined?
>
> +template<unsigned int N, typename C>
> +inline poly_int_pod<N, C>&
> +poly_int_pod<N, C>::operator <<= (unsigned int a)
> +{
> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>
> It looks to me as though the VALUE argument in the ctor invoked
> by the placement new expression is evaluated after the dtor has
> destroyed the very array element the VALUE argument expands to.

Good catch!  It should simply have been doing <<= on each coefficient --
I must have got carried away when converting to POLY_SET_COEFF.
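
I.e. something along these lines (just a sketch of the intent):

   template<unsigned int N, typename C>
   inline poly_int_pod<N, C>&
   poly_int_pod<N, C>::operator <<= (unsigned int a)
   {
     for (unsigned int i = 0; i < N; i++)
       this->coeffs[i] <<= a;
     return *this;
   }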

I double-checked the other uses and think that's the only one.

> Whether or not is, in fact, a problem, it seems to me that using
> a function template rather than a macro would be a clearer and
> safer way to do the same thing.  (Safer in that the macro also
> evaluates its arguments multiple times, which is often a source
> of subtle bugs.)

That would slow down -O0 builds though, by introducing an extra
function call and set of temporaries even when the coefficients
are primitive integers.

> Other than that, I would suggest changing 't' to something a bit
> less terse, like perhaps 'type' in traits like the following:
>
> +struct if_lossless;
> +template<typename T1, typename T2, typename T3>
> +struct if_lossless<T1, T2, T3, true>
> +{
> +  typedef T3 t;
> +};

OK, done in v2.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-10-25 13:09                     ` Richard Biener
@ 2017-11-08  9:51                       ` Richard Sandiford
  2017-11-08 11:57                         ` Richard Biener
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-08  9:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: Eric Botcazou, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Oct 25, 2017 at 1:26 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Oct 24, 2017 at 3:24 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Tue, Oct 24, 2017 at 2:48 PM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>>> On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
>>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>>>>>>>> Yeah.  E.g. for ==, the two options would be:
>>>>>>>>>>
>>>>>>>>>> a) must_eq (a, b)   -> a == b
>>>>>>>>>>    must_ne (a, b)   -> a != b
>>>>>>>>>>
>>>>>>>>>>    which has the weird property that (a == b) != (!(a != b))
>>>>>>>>>>
>>>>>>>>>> b) must_eq (a, b)   -> a == b
>>>>>>>>>>    may_ne (a, b)    -> a != b
>>>>>>>>>>
>>>>>>>>>>    which has the weird property that a can be equal to b when a != b
>>>>>>>>>
>>>>>>>>> Yes, a) was the one I had in mind, i.e. the traditional operators are
>>>>>>>>> the must
>>>>>>>>> variants and you use an outer ! in order to express the may.  Of
>>>>>>>>> course this
>>>>>>>>> would require a bit of discipline but, on the other hand, if most of
>>>>>>>>> the cases
>>>>>>>>> fall in the must category, that could be less ugly.
>>>>>>>>
>>>>>>>> I just think that discipline is going to be hard to maintain in
>>>>>>>> practice,
>>>>>>>> since it's so natural to assume (a == b || a != b) == true.  With the
>>>>>>>> may/must approach, static type checking forces the issue.
>>>>>>>>
>>>>>>>>>> Sorry about that.  It's the best I could come up with without losing
>>>>>>>>>> the may/must distinction.
>>>>>>>>>
>>>>>>>>> Which variant is known_zero though?  Must or may?
>>>>>>>>
>>>>>>>> must.  maybe_nonzero is the may version.
>>>>>>>
>>>>>>> Can you rename known_zero to must_be_zero then?
>>>>>>
>>>>>> That'd be OK with me.
>>>>>>
>>>>>> Another alternative I wondered about was must_eq_0 / may_ne_0.
>>>>>>
>>>>>>> What's wrong with must_eq (X, 0) / may_eq (X, 0) btw?
>>>>>>
>>>>>> must_eq (X, 0) generated a warning if X is unsigned, so sometimes you'd
>>>>>> need must_eq (X, 0) and sometimes must_eq (X, 0U).
>>>>>
>>>>> Is that because they are templates?  Maybe providing a partial
>>>>> specialization
>>>>> would help?
>>>>
>>>> I don't think it's templates specifically.  We end up with something like:
>>>>
>>>>   int f (unsigned int x, const int y)
>>>>   {
>>>>     return x != y;
>>>>   }
>>>>
>>>>   int g (unsigned int x) { return f (x, 0); }
>>>>
>>>> which generates a warning too.
>>>>
>>>>> I'd be fine with must_eq_p and may_eq_0.
>>>>
>>>> OK, I'll switch to that if there are no objections.
>>>
>>> Hum.  But then we still warn for must_eq_p (x, 1), no?
>>
>> Yeah.  The patch also had a known_one and known_all_ones for
>> those two (fairly) common cases.  For other values the patches
>> just add "U" where necessary.
>>
>> If you think it would be better to use U consistently and not
>> have the helpers, then I'm happy to do that instead.
>>
>>> So why does
>>>
>>>   int f (unsigned int x)
>>>   {
>>>      return x != 0;
>>>   }
>>>
>>> not warn?  Probably because of promotion of the arg.
>>
>> [Jakub's already answered this part.]
>>
>>> Shouldn't we then simply never have a may/must_*_p (T1, T2)
>>> with T1 and T2 being not compatible?  That is, force promotion
>>> rules on them with template magic?
>>
>> This was what I meant by:
>>
>>   Or we could suppress warnings by forcibly converting the input.
>>   Sometimes the warnings are useful though.
>>
>> We already do this kind of conversion for arithmetic, to ensure
>> that poly_uint16 + poly_uint16 -> poly_int64 promotes before the
>> addition rather than after it.  But it defeats the point of the
>> comparison warning, which is that you're potentially redefining
>> the sign bit.
>>
>> I think the warning's just as valuable for may/must comparison of
>> non-literals as it is for normal comparison operators.  It's just
>> unfortunate that we no longer get the special handling of literals.
>
> Ok, I see.
>
> I think I have a slight preference for using 0U consistently but I haven't
> looked at too many patches yet to see how common/ugly that would be.

OK.  FWIW, that's also how we had it until very recently.  I added the
known/maybe stuff in a late and probably misguided attempt to make
things prettier.

I've pulled that part out and switched back to using U.  I'll post the
new 001 patch in a sec.  Should I repost all the other patches that
changed as well, or isn't it worth it for a change like that?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-10-23 16:58 ` [001/nnn] poly_int: add poly-int.h Richard Sandiford
  2017-10-25 16:17   ` Martin Sebor
@ 2017-11-08 10:03   ` Richard Sandiford
  2017-11-14  0:42     ` Richard Sandiford
  1 sibling, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-08 10:03 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 4555 bytes --]

Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch adds a new "poly_int" class to represent polynomial integers
> of the form:
>
>   C0 + C1*X1 + C2*X2 ... + Cn*Xn
>
> It also adds poly_int-based typedefs for offsets and sizes of various
> precisions.  In these typedefs, the Ci coefficients are compile-time
> constants and the Xi indeterminates are run-time invariants.  The number
> of coefficients is controlled by the target and is initially 1 for all
> ports.
>
> Most routines can handle general coefficient counts, but for now a few
> are specific to one or two coefficients.  Support for other coefficient
> counts can be added when needed.
>
> The patch also adds a new macro, IN_TARGET_CODE, that can be
> set to indicate that a TU contains target-specific rather than
> target-independent code.  When this macro is set and the number of
> coefficients is 1, the poly-int.h classes define a conversion operator
> to a constant.  This allows most existing target code to work without
> modification.  The main exceptions are:
>
> - values passed through ..., which need an explicit conversion to a
>   constant
>
> - ?: expression in which one arm ends up being a polynomial and the
>   other remains a constant.  In these cases it would be valid to convert
>   the constant to a polynomial and the polynomial to a constant, so a
>   cast is needed to break the ambiguity.
>
> The patch also adds a new target hook to return the estimated
> value of a polynomial for costing purposes.
>
> The patch also adds operator<< on wide_ints (it was already defined
> for offset_int and widest_int).  I think this was originally excluded
> because >> is ambiguous for wide_int, but << is useful for converting
> bytes to bits, etc., so is worth defining on its own.  The patch also
> adds operator% and operator/ for offset_int and widest_int, since those
> types are always signed.  These changes allow the poly_int interface to
> be more predictable.
>
> I'd originally tried adding the tests as selftests, but that ended up
> bloating cc1 by at least a third.  It also took a while to build them
> at -O2.  The patch therefore uses plugin tests instead, where we can
> force the tests to be built at -O0.  They still run in negligible time
> when built that way.

Changes in v2:

- Drop the controversial known_zero etc. wrapper functions.
- Fix the operator<<= bug that Martin found.
- Switch from "t" to "type" in SFINAE classes (requested by Martin).

Not changed in v2:

- Default constructors are still empty.  I agree it makes sense to use
  "= default" when we switch to C++11, but it would be dangerous for
  that to make "poly_int64 x;" less defined than it is now.

Tested as before.

Thanks,
Richard


2017-11-08  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* poly-int.h: New file.
	* poly-int-types.h: Likewise.
	* coretypes.h: Include them.
	(POLY_INT_CONVERSION): Define.
	* target.def (estimated_poly_value): New hook.
	* doc/tm.texi.in (TARGET_ESTIMATED_POLY_VALUE): New hook.
	* doc/tm.texi: Regenerate.
	* doc/poly-int.texi: New file.
	* doc/gccint.texi: Include it.
	* doc/rtl.texi: Describe restrictions on subreg modes.
	* Makefile.in (TEXI_GCCINT_FILES): Add poly-int.texi.
	* genmodes.c (NUM_POLY_INT_COEFFS): Provide a default definition.
	(emit_insn_modes_h): Emit a definition of NUM_POLY_INT_COEFFS.
	* targhooks.h (default_estimated_poly_value): Declare.
	* targhooks.c (default_estimated_poly_value): New function.
	* target.h (estimated_poly_value): Likewise.
	* wide-int.h (WI_UNARY_RESULT): Use wi::binary_traits.
	(wi::unary_traits): Delete.
	(wi::binary_traits::signed_shift_result_type): Define for
	offset_int << HOST_WIDE_INT, etc.
	(generic_wide_int::operator <<=): Define for all types and use
	wi::lshift instead of <<.
	(wi::hwi_with_prec): Add a default constructor.
	(wi::ints_for): New class.
	(operator <<): Define for all wide-int types.
	(operator /): New function.
	(operator %): Likewise.
	* selftest.h (ASSERT_MUST_EQ, ASSERT_MUST_EQ_AT, ASSERT_MAY_NE)
	(ASSERT_MAY_NE_AT): New macros.

gcc/testsuite/
	* gcc.dg/plugin/poly-int-tests.h,
	gcc.dg/plugin/poly-int-test-1.c,
	gcc.dg/plugin/poly-int-01_plugin.c,
	gcc.dg/plugin/poly-int-02_plugin.c,
	gcc.dg/plugin/poly-int-03_plugin.c,
	gcc.dg/plugin/poly-int-04_plugin.c,
	gcc.dg/plugin/poly-int-05_plugin.c,
	gcc.dg/plugin/poly-int-06_plugin.c,
	gcc.dg/plugin/poly-int-07_plugin.c: New tests.
	* gcc.dg/plugin/plugin.exp: Run them.


[-- Attachment #2: poly-001-poly-int-h.diff.gz --]
[-- Type: application/gzip, Size: 47953 bytes --]

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [000/nnn] poly_int: representation of runtime offsets and sizes
  2017-11-08  9:51                       ` Richard Sandiford
@ 2017-11-08 11:57                         ` Richard Biener
  0 siblings, 0 replies; 302+ messages in thread
From: Richard Biener @ 2017-11-08 11:57 UTC (permalink / raw)
  To: Richard Biener, Eric Botcazou, GCC Patches, Richard Sandiford

On Wed, Nov 8, 2017 at 10:39 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Wed, Oct 25, 2017 at 1:26 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Tue, Oct 24, 2017 at 3:24 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>> On Tue, Oct 24, 2017 at 2:48 PM, Richard Sandiford
>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>>>> On Tue, Oct 24, 2017 at 1:23 PM, Richard Sandiford
>>>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>>>> Eric Botcazou <ebotcazou@adacore.com> writes:
>>>>>>>>>>> Yeah.  E.g. for ==, the two options would be:
>>>>>>>>>>>
>>>>>>>>>>> a) must_eq (a, b)   -> a == b
>>>>>>>>>>>    must_ne (a, b)   -> a != b
>>>>>>>>>>>
>>>>>>>>>>>    which has the weird property that (a == b) != (!(a != b))
>>>>>>>>>>>
>>>>>>>>>>> b) must_eq (a, b)   -> a == b
>>>>>>>>>>>    may_ne (a, b)    -> a != b
>>>>>>>>>>>
>>>>>>>>>>>    which has the weird property that a can be equal to b when a != b
>>>>>>>>>>
>>>>>>>>>> Yes, a) was the one I had in mind, i.e. the traditional operators are
>>>>>>>>>> the must
>>>>>>>>>> variants and you use an outer ! in order to express the may.  Of
>>>>>>>>>> course this
>>>>>>>>>> would require a bit of discipline but, on the other hand, if most of
>>>>>>>>>> the cases
>>>>>>>>>> fall in the must category, that could be less ugly.
>>>>>>>>>
>>>>>>>>> I just think that discipline is going to be hard to maintain in
>>>>>>>>> practice,
>>>>>>>>> since it's so natural to assume (a == b || a != b) == true.  With the
>>>>>>>>> may/must approach, static type checking forces the issue.
>>>>>>>>>
>>>>>>>>>>> Sorry about that.  It's the best I could come up with without losing
>>>>>>>>>>> the may/must distinction.
>>>>>>>>>>
>>>>>>>>>> Which variant is known_zero though?  Must or may?
>>>>>>>>>
>>>>>>>>> must.  maybe_nonzero is the may version.
>>>>>>>>
>>>>>>>> Can you rename known_zero to must_be_zero then?
>>>>>>>
>>>>>>> That'd be OK with me.
>>>>>>>
>>>>>>> Another alternative I wondered about was must_eq_0 / may_ne_0.
>>>>>>>
>>>>>>>> What's wrong with must_eq (X, 0) / may_eq (X, 0) btw?
>>>>>>>
>>>>>>> must_eq (X, 0) generated a warning if X is unsigned, so sometimes you'd
>>>>>>> need must_eq (X, 0) and sometimes must_eq (X, 0U).
>>>>>>
>>>>>> Is that because they are templates?  Maybe providing a partial
>>>>>> specialization
>>>>>> would help?
>>>>>
>>>>> I don't think it's templates specifically.  We end up with something like:
>>>>>
>>>>>   int f (unsigned int x, const int y)
>>>>>   {
>>>>>     return x != y;
>>>>>   }
>>>>>
>>>>>   int g (unsigned int x) { return f (x, 0); }
>>>>>
>>>>> which generates a warning too.
>>>>>
>>>>>> I'd be fine with must_eq_p and may_eq_0.
>>>>>
>>>>> OK, I'll switch to that if there are no objections.
>>>>
>>>> Hum.  But then we still warn for must_eq_p (x, 1), no?
>>>
>>> Yeah.  The patch also had a known_one and known_all_ones for
>>> those two (fairly) common cases.  For other values the patches
>>> just add "U" where necessary.
>>>
>>> If you think it would be better to use U consistently and not
>>> have the helpers, then I'm happy to do that instead.
>>>
>>>> So why does
>>>>
>>>>   int f (unsigned int x)
>>>>   {
>>>>      return x != 0;
>>>>   }
>>>>
>>>> not warn?  Probably because of promotion of the arg.
>>>
>>> [Jakub's already answered this part.]
>>>
>>>> Shouldn't we then simply never have a may/must_*_p (T1, T2)
>>>> with T1 and T2 being not compatible?  That is, force promotion
>>>> rules on them with template magic?
>>>
>>> This was what I meant by:
>>>
>>>   Or we could suppress warnings by forcibly converting the input.
>>>   Sometimes the warnings are useful though.
>>>
>>> We already do this kind of conversion for arithmetic, to ensure
>>> that poly_uint16 + poly_uint16 -> poly_int64 promotes before the
>>> addition rather than after it.  But it defeats the point of the
>>> comparison warning, which is that you're potentially redefining
>>> the sign bit.
>>>
>>> I think the warning's just as valuable for may/must comparison of
>>> non-literals as it is for normal comparison operators.  It's just
>>> unfortunate that we no longer get the special handling of literals.
>>
>> Ok, I see.
>>
>> I think I have a slight preference for using 0U consistently but I haven't
>> looked at too many patches yet to see how common/ugly that would be.
>
> OK.  FWIW, that's also how we had it until very recently.  I added the
> known/maybe stuff in a late and probably misguided attempt to make
> things prettier.
>
> I've pulled that part out and switched back to using U.  I'll post the
> new 001 patch in a sec.  Should I repost all the other patches that
> changed as well, or isn't it worth it for a change like that?

Not worth re-posting IMHO.

Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-08  9:44     ` Richard Sandiford
@ 2017-11-08 16:51       ` Martin Sebor
  2017-11-08 16:56         ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-11-08 16:51 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/08/2017 02:32 AM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> I haven't done nearly a thorough review but the dtor followed by
>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>> I thought I'd ask sooner rather than later.  Given the macro
>> definition:
>>
>> +   The dummy comparison against a null C * is just a way of checking
>> +   that C gives the right type.  */
>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>
>> is the following use well-defined?
>>
>> +template<unsigned int N, typename C>
>> +inline poly_int_pod<N, C>&
>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>> +{
>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>
>> It looks to me as though the VALUE argument in the ctor invoked
>> by the placement new expression is evaluated after the dtor has
>> destroyed the very array element the VALUE argument expands to.
>
> Good catch!  It should simply have been doing <<= on each coefficient --
> I must have got carried away when converting to POLY_SET_COEFF.
>
> I double-checked the other uses and think that's the only one.
>
>> Whether or not is, in fact, a problem, it seems to me that using
>> a function template rather than a macro would be a clearer and
>> safer way to do the same thing.  (Safer in that the macro also
>> evaluates its arguments multiple times, which is often a source
>> of subtle bugs.)
>
> That would slow down -O0 builds though, by introducing an extra
> function call and set of temporaries even when the coefficients
> are primitive integers.

Would decorating the function template with attribute always_inline
help?

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-08 16:51       ` Martin Sebor
@ 2017-11-08 16:56         ` Richard Sandiford
  2017-11-08 17:33           ` Martin Sebor
  2017-11-08 17:34           ` Martin Sebor
  0 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-11-08 16:56 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

Martin Sebor <msebor@gmail.com> writes:
> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>> Martin Sebor <msebor@gmail.com> writes:
>>> I haven't done nearly a thorough review but the dtor followed by
>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>> I thought I'd ask sooner rather than later.  Given the macro
>>> definition:
>>>
>>> +   The dummy comparison against a null C * is just a way of checking
>>> +   that C gives the right type.  */
>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>
>>> is the following use well-defined?
>>>
>>> +template<unsigned int N, typename C>
>>> +inline poly_int_pod<N, C>&
>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>> +{
>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>
>>> It looks to me as though the VALUE argument in the ctor invoked
>>> by the placement new expression is evaluated after the dtor has
>>> destroyed the very array element the VALUE argument expands to.
>>
>> Good catch!  It should simply have been doing <<= on each coefficient --
>> I must have got carried away when converting to POLY_SET_COEFF.
>>
>> I double-checked the other uses and think that's the only one.
>>
>>> Whether or not is, in fact, a problem, it seems to me that using
>>> a function template rather than a macro would be a clearer and
>>> safer way to do the same thing.  (Safer in that the macro also
>>> evaluates its arguments multiple times, which is often a source
>>> of subtle bugs.)
>>
>> That would slow down -O0 builds though, by introducing an extra
>> function call and set of temporaries even when the coefficients
>> are primitive integers.
>
> Would decorating the function template with attribute always_inline
> help?

It would remove the call itself, but we'd still have the extra temporary
objects that were the function argument and return value.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-08 16:56         ` Richard Sandiford
@ 2017-11-08 17:33           ` Martin Sebor
  2017-11-08 17:34           ` Martin Sebor
  1 sibling, 0 replies; 302+ messages in thread
From: Martin Sebor @ 2017-11-08 17:33 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/08/2017 09:51 AM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>> Martin Sebor <msebor@gmail.com> writes:
>>>> I haven't done nearly a thorough review but the dtor followed by
>>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>>> I thought I'd ask sooner rather than later.  Given the macro
>>>> definition:
>>>>
>>>> +   The dummy comparison against a null C * is just a way of checking
>>>> +   that C gives the right type.  */
>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>
>>>> is the following use well-defined?
>>>>
>>>> +template<unsigned int N, typename C>
>>>> +inline poly_int_pod<N, C>&
>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>> +{
>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>
>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>> by the placement new expression is evaluated after the dtor has
>>>> destroyed the very array element the VALUE argument expands to.
>>>
>>> Good catch!  It should simply have been doing <<= on each coefficient --
>>> I must have got carried away when converting to POLY_SET_COEFF.
>>>
>>> I double-checked the other uses and think that's the only one.
>>>
>>>> Whether or not is, in fact, a problem, it seems to me that using
>>>> a function template rather than a macro would be a clearer and
>>>> safer way to do the same thing.  (Safer in that the macro also
>>>> evaluates its arguments multiple times, which is often a source
>>>> of subtle bugs.)
>>>
>>> That would slow down -O0 builds though, by introducing an extra
>>> function call and set of temporaries even when the coefficients
>>> are primitive integers.
>>
>> Would decorating the function template with attribute always_inline
>> help?
>
> It would remove the call itself, but we'd still have the extra temporary
> objects that were the function argument and return value.

Sorry, I do not want to get into another long discussion about
trade-offs between safety and efficiency but I'm not sure I see
what extra temporaries it would create.  It seems to me that
an inline function template that took arguments of user-defined
types by reference and others by value should be just as efficient
as a macro.

 From GCC's own manual:

   6.43 An Inline Function is As Fast As a Macro
   https://gcc.gnu.org/onlinedocs/gcc/Inline.html

If that's not the case and there is a significant performance
penalty associated with inline functions at -O0 then GCC should
be fixed to avoid it.

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-08 16:56         ` Richard Sandiford
  2017-11-08 17:33           ` Martin Sebor
@ 2017-11-08 17:34           ` Martin Sebor
  2017-11-08 18:34             ` Richard Sandiford
  1 sibling, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-11-08 17:34 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/08/2017 09:51 AM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>> Martin Sebor <msebor@gmail.com> writes:
>>>> I haven't done nearly a thorough review but the dtor followed by
>>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>>> I thought I'd ask sooner rather than later.  Given the macro
>>>> definition:
>>>>
>>>> +   The dummy comparison against a null C * is just a way of checking
>>>> +   that C gives the right type.  */
>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>
>>>> is the following use well-defined?
>>>>
>>>> +template<unsigned int N, typename C>
>>>> +inline poly_int_pod<N, C>&
>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>> +{
>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>
>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>> by the placement new expression is evaluated after the dtor has
>>>> destroyed the very array element the VALUE argument expands to.
>>>
>>> Good catch!  It should simply have been doing <<= on each coefficient --
>>> I must have got carried away when converting to POLY_SET_COEFF.
>>>
>>> I double-checked the other uses and think that's the only one.
>>>
>>>> Whether or not is, in fact, a problem, it seems to me that using
>>>> a function template rather than a macro would be a clearer and
>>>> safer way to do the same thing.  (Safer in that the macro also
>>>> evaluates its arguments multiple times, which is often a source
>>>> of subtle bugs.)
>>>
>>> That would slow down -O0 builds though, by introducing an extra
>>> function call and set of temporaries even when the coefficients
>>> are primitive integers.
>>
>> Would decorating the function template with attribute always_inline
>> help?
>
> It would remove the call itself, but we'd still have the extra temporary
> objects that were the function argument and return value.

Sorry, I do not want to get into another long discussion about
trade-offs between safety and efficiency but I'm not sure I see
what extra temporaries it would create.  It seems to me that
an inline function template that took arguments of user-defined
types by reference and others by value should be just as efficient
as a macro.

 From GCC's own manual:

   6.43 An Inline Function is As Fast As a Macro
   https://gcc.gnu.org/onlinedocs/gcc/Inline.html

If that's not the case and there is a significant performance
penalty associated with inline functions at -O0 then GCC should
be fixed to avoid it.

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-08 17:34           ` Martin Sebor
@ 2017-11-08 18:34             ` Richard Sandiford
  2017-11-09  9:10               ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-08 18:34 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

Martin Sebor <msebor@gmail.com> writes:
> On 11/08/2017 09:51 AM, Richard Sandiford wrote:
>> Martin Sebor <msebor@gmail.com> writes:
>>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>>> Martin Sebor <msebor@gmail.com> writes:
>>>>> I haven't done nearly a thorough review but the dtor followed by
>>>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>>>> I thought I'd ask sooner rather than later.  Given the macro
>>>>> definition:
>>>>>
>>>>> +   The dummy comparison against a null C * is just a way of checking
>>>>> +   that C gives the right type.  */
>>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>>
>>>>> is the following use well-defined?
>>>>>
>>>>> +template<unsigned int N, typename C>
>>>>> +inline poly_int_pod<N, C>&
>>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>>> +{
>>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>>
>>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>>> by the placement new expression is evaluated after the dtor has
>>>>> destroyed the very array element the VALUE argument expands to.
>>>>
>>>> Good catch!  It should simply have been doing <<= on each coefficient --
>>>> I must have got carried away when converting to POLY_SET_COEFF.
>>>>
>>>> I double-checked the other uses and think that's the only one.
>>>>
>>>>> Whether or not is, in fact, a problem, it seems to me that using
>>>>> a function template rather than a macro would be a clearer and
>>>>> safer way to do the same thing.  (Safer in that the macro also
>>>>> evaluates its arguments multiple times, which is often a source
>>>>> of subtle bugs.)
>>>>
>>>> That would slow down -O0 builds though, by introducing an extra
>>>> function call and set of temporaries even when the coefficients
>>>> are primitive integers.
>>>
>>> Would decorating the function template with attribute always_inline
>>> help?
>>
>> It would remove the call itself, but we'd still have the extra temporary
>> objects that were the function argument and return value.
>
> Sorry, I do not want to get into another long discussion about
> trade-offs between safety and efficiency but I'm not sure I see
> what extra temporaries it would create.  It seems to me that
> an inline function template that took arguments of user-defined
> types by reference and others by value should be just as efficient
> as a macro.
>
>  From GCC's own manual:
>
>    6.43 An Inline Function is As Fast As a Macro
>    https://gcc.gnu.org/onlinedocs/gcc/Inline.html

You can see the difference with something like:

  inline
  void __attribute__((always_inline))
  f(int &dst, const int &src) { dst = src; }

  int g1(const int &y) { int x; f(x, y); return x; }
  int g2(const int &y) { int x; x = y; return x; }

where *.optimized from GCC 7.1 at -O0 is:

int g1(const int&) (const int & y)
{
  int & dst;
  const int & src;
  int x;
  int D.2285;
  int _3;
  int _6;

  <bb 2> [0.00%]:
  src_5 = y_2(D);
  _6 = *src_5;
  x = _6;
  _3 = x;
  x ={v} {CLOBBER};

<L1> [0.00%]:
  return _3;

}

vs:

int g2(const int&) (const int & y)
{
  int x;
  int D.2288;
  int _4;

  <bb 2> [0.00%]:
  x_3 = *y_2(D);
  _4 = x_3;

<L0> [0.00%]:
  return _4;

}

> If that's not the case and there is a significant performance
> penalty associated with inline functions at -O0 then GCC should
> be fixed to avoid it.

I think those docs are really talking about inline functions being as
fast as macros when optimisation is enabled.  I don't think we make
any guarantees about -O0 code quality.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-08 18:34             ` Richard Sandiford
@ 2017-11-09  9:10               ` Martin Sebor
  2017-11-09 11:14                 ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-11-09  9:10 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/08/2017 11:28 AM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> On 11/08/2017 09:51 AM, Richard Sandiford wrote:
>>> Martin Sebor <msebor@gmail.com> writes:
>>>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>>>> Martin Sebor <msebor@gmail.com> writes:
>>>>>> I haven't done nearly a thorough review but the dtor followed by
>>>>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>>>>> I thought I'd ask sooner rather than later.  Given the macro
>>>>>> definition:
>>>>>>
>>>>>> +   The dummy comparison against a null C * is just a way of checking
>>>>>> +   that C gives the right type.  */
>>>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>>>
>>>>>> is the following use well-defined?
>>>>>>
>>>>>> +template<unsigned int N, typename C>
>>>>>> +inline poly_int_pod<N, C>&
>>>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>>>> +{
>>>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>>>
>>>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>>>> by the placement new expression is evaluated after the dtor has
>>>>>> destroyed the very array element the VALUE argument expands to.
>>>>>
>>>>> Good catch!  It should simply have been doing <<= on each coefficient --
>>>>> I must have got carried away when converting to POLY_SET_COEFF.
>>>>>
>>>>> I double-checked the other uses and think that's the only one.
>>>>>
>>>>>> Whether or not is, in fact, a problem, it seems to me that using
>>>>>> a function template rather than a macro would be a clearer and
>>>>>> safer way to do the same thing.  (Safer in that the macro also
>>>>>> evaluates its arguments multiple times, which is often a source
>>>>>> of subtle bugs.)
>>>>>
>>>>> That would slow down -O0 builds though, by introducing an extra
>>>>> function call and set of temporaries even when the coefficients
>>>>> are primitive integers.
>>>>
>>>> Would decorating the function template with attribute always_inline
>>>> help?
>>>
>>> It would remove the call itself, but we'd still have the extra temporary
>>> objects that were the function argument and return value.
>>
>> Sorry, I do not want to get into another long discussion about
>> trade-offs between safety and efficiency but I'm not sure I see
>> what extra temporaries it would create.  It seems to me that
>> an inline function template that took arguments of user-defined
>> types by reference and others by value should be just as efficient
>> as a macro.
>>
>>  From GCC's own manual:
>>
>>    6.43 An Inline Function is As Fast As a Macro
>>    https://gcc.gnu.org/onlinedocs/gcc/Inline.html
>
> You can see the difference with something like:
>
>   inline
>   void __attribute__((always_inline))
>   f(int &dst, const int &src) { dst = src; }
>
>   int g1(const int &y) { int x; f(x, y); return x; }
>   int g2(const int &y) { int x; x = y; return x; }

Let me say at the outset that I struggle to comprehend that a few
instructions are even a consideration when not optimizing, especially
in light of the bug the macro caused that would have been prevented
by using a function instead.  But...

...I don't think your example above is representative of using
the POLY_SET_COEFF macro.  The function template I'm suggesting
might look something like this:

   template <unsigned N, class C>
   inline void __attribute__ ((always_inline))
   poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
   {
     ((void) (&(*p).coeffs[0] == (C *) 0),
      wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
      ? (void) ((*p).coeffs[0] = val)
      : (void) ((*p).coeffs[0].~C (), new (&(*p).coeffs[0]) C (val)));

     if (N >= 2)
       for (unsigned int i = 1; i < N; i++)
         ((void) (&(*p).coeffs[0] == (C *) 0),
          wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION
          ? (void) ((*p).coeffs[i] = val)
          : (void) ((*p).coeffs[i].~C (), new (&(*p).coeffs[i]) C (val)));
   }

To compare apples to apples, I suggest instead comparing the shift
operator (or any other poly_int function that uses the macro) that
doesn't suffer from the bug with one that makes use of the function
template.  I see a difference of 2 instructions on x86_64 (21 vs
23) for operator<<=.

Are two assembly instructions even worth talking about?

>> If that's not the case and there is a significant performance
>> penalty associated with inline functions at -O0 then GCC should
>> be fixed to avoid it.
>
> I think those docs are really talking about inline functions being as
> fast as macros when optimisation is enabled.  I don't think we make
> any guarantees about -O0 code quality.

Sure, but you are using unsafe macros in preference to a safer
inline function even with optimization, introducing a bug as
a result, and making an argument that the performance impact
of a few instructions when not using optimization is what should
drive the decision between one and the other in all situations.
With all respect, I fail to see the logic in this line of
reasoning.  By that argument we would never be able to define
any inline functions.

That being said, if the performance implications of using inline
functions with no optimization are so serious here then I suggest
you should be concerned about introducing the poly_int API in its
current form at all: every access to the class is an inline
function.

On a more serious/constructive note, if you really are worried
about efficiency at this level then introducing an intrinsic
primitive into the compiler instead of a set of classes might
be worth thinking about.  It will only benefit GCC but it might
lay a foundation for all sorts of infinite precision integer
classes (including the C++ proposal that was pointed out in
the other thread).

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-09  9:10               ` Martin Sebor
@ 2017-11-09 11:14                 ` Richard Sandiford
  2017-11-09 17:42                   ` Martin Sebor
  2017-11-13 17:59                   ` Jeff Law
  0 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-11-09 11:14 UTC (permalink / raw)
  To: Martin Sebor; +Cc: gcc-patches

Martin Sebor <msebor@gmail.com> writes:
> On 11/08/2017 11:28 AM, Richard Sandiford wrote:
>> Martin Sebor <msebor@gmail.com> writes:
>>> On 11/08/2017 09:51 AM, Richard Sandiford wrote:
>>>> Martin Sebor <msebor@gmail.com> writes:
>>>>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>>>>> Martin Sebor <msebor@gmail.com> writes:
>>>>>>> I haven't done nearly a thorough review but the dtor followed by
>>>>>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>>>>>> I thought I'd ask sooner rather than later.  Given the macro
>>>>>>> definition:
>>>>>>>
>>>>>>> +   The dummy comparison against a null C * is just a way of checking
>>>>>>> +   that C gives the right type.  */
>>>>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>>>>
>>>>>>> is the following use well-defined?
>>>>>>>
>>>>>>> +template<unsigned int N, typename C>
>>>>>>> +inline poly_int_pod<N, C>&
>>>>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>>>>> +{
>>>>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>>>>
>>>>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>>>>> by the placement new expression is evaluated after the dtor has
>>>>>>> destroyed the very array element the VALUE argument expands to.
>>>>>>
>>>>>> Good catch!  It should simply have been doing <<= on each coefficient --
>>>>>> I must have got carried away when converting to POLY_SET_COEFF.
>>>>>>
>>>>>> I double-checked the other uses and think that's the only one.
>>>>>>
>>>>>>> Whether or not is, in fact, a problem, it seems to me that using
>>>>>>> a function template rather than a macro would be a clearer and
>>>>>>> safer way to do the same thing.  (Safer in that the macro also
>>>>>>> evaluates its arguments multiple times, which is often a source
>>>>>>> of subtle bugs.)
>>>>>>
>>>>>> That would slow down -O0 builds though, by introducing an extra
>>>>>> function call and set of temporaries even when the coefficients
>>>>>> are primitive integers.
>>>>>
>>>>> Would decorating the function template with attribute always_inline
>>>>> help?
>>>>
>>>> It would remove the call itself, but we'd still have the extra temporary
>>>> objects that were the function argument and return value.
>>>
>>> Sorry, I do not want to get into another long discussion about
>>> trade-offs between safety and efficiency but I'm not sure I see
>>> what extra temporaries it would create.  It seems to me that
>>> an inline function template that took arguments of user-defined
>>> types by reference and others by value should be just as efficient
>>> as a macro.
>>>
>>>  From GCC's own manual:
>>>
>>>    6.43 An Inline Function is As Fast As a Macro
>>>    https://gcc.gnu.org/onlinedocs/gcc/Inline.html
>>
>> You can see the difference with something like:
>>
>>   inline
>>   void __attribute__((always_inline))
>>   f(int &dst, const int &src) { dst = src; }
>>
>>   int g1(const int &y) { int x; f(x, y); return x; }
>>   int g2(const int &y) { int x; x = y; return x; }
>
> Let me say at the outset that I struggle to comprehend that a few
> instructions is even a consideration when not optimizing, especially
> in light of the bug the macro caused that would have been prevented
> by using a function instead.  But...

Many people still build at -O0 though.  One of the things I was asked
for was the time it takes to build stage 2 with an -O0 stage 1
(where stage 1 would usually be built by the host compiler).

> ...I don't think your example above is representative of using
> the POLY_SET_COEFF macro.  The function template I'm suggesting
> might look something to this:
>
>    template <unsigned N, class C>
>    inline void __attribute__ ((always_inline))
>    poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
>    {
>      ((void) (&(*p).coeffs[0] == (C *) 0), 
> wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION ? (void) 
> ((*p).coeffs[0] = val) : (void) ((*p).coeffs[0].~C (), new 
> (&(*p).coeffs[0]) C (val)));
>
>      if (N >= 2)
>        for (unsigned int i = 1; i < N; i++)
>          ((void) (&(*p).coeffs[0] == (C *) 0), 
> wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION ? (void) 
> ((*p).coeffs[i] = val) : (void) ((*p).coeffs[i].~C (), new 
> (&(*p).coeffs[i]) C (val)));
>    }

That ignores the idx parameter and sets all coefficients to val.  Did you
mean something like:

   template <unsigned N, typename C1, typename C2>
   inline void __attribute__ ((always_inline))
   poly_set_coeff (poly_int_pod<N, C1> *p, unsigned idx, C2 val)
   {
     wi::int_traits<C1>::precision_type == wi::FLEXIBLE_PRECISION ? (void) ((*p).coeffs[idx] = val) : (void) ((*p).coeffs[idx].~C1 (), new (&(*p).coeffs[idx]) C1 (val));
   }

?  If so...

> To compare apples to apples I suggest to instead compare the shift
> operator (or any other poly_int function that uses the macro) that
> doesn't suffer from the bug vs one that makes use of the function
> template.  I see a difference of 2 instructions on x86_64 (21 vs
> 23) for operator<<=.
>
> Are two assembly instructions even worth talking about?

...the problem is that passing C by value defeats the point of the
optimisation:

  /* RES is a poly_int result that has coefficients of type C and that
     is being built up a coefficient at a time.  Set coefficient number I
     to VALUE in the most efficient way possible.

     For primitive C it is better to assign directly, since it avoids
     any further calls and so is more efficient when the compiler is
     built at -O0.  But for wide-int based C it is better to construct
     the value in-place.  This means that calls out to a wide-int.cc
     routine can take the address of RES rather than the address of
     a temporary.

With the inline function, the wide-int.cc routines will be taking
the address of the temporary "val" object, which will then be used
to initialise the target object via a copy.  The macro was there
to avoid the copy.

E.g. for a normal --enable-checking=release build of current sources
on x86_64, mem_ref_offset is:

0000000000000034 T mem_ref_offset(tree_node const*)

With the POLY_SET_COEFF macro it's the same size (and code) with
poly-int.h:

0000000000000034 T mem_ref_offset(tree_node const*)

But using the function above gives:

0000000000000058 T mem_ref_offset(tree_node const*)

which is very similar to what we'd get by assigning to the coefficients
normally.

This kind of thing happened in quite a few other places.  mem_ref_offset
is just a nice example because it's so self-contained.  And it did have
a measurable effect on the speed of the compiler.

That's why the cut-down version quoted above passed the source by
reference too.  Doing that, i.e.:

   template <unsigned N, typename C1, typename C2>
   inline void __attribute__ ((always_inline))
   poly_set_coeff (poly_int_pod<N, C1> *p, unsigned idx, const C2 &val)

gives:

0000000000000052 T mem_ref_offset(tree_node const*)

But the use of this inline function in <<= would be just as incorrect as
using the macro.

[These are all sizes for normally-optimised release builds]
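
For anyone following along, here's a minimal toy model of the effect
being described -- hypothetical stand-in types, not the real wide-int
or poly-int code.  The point is just that the in-place pattern lets the
pending computation write straight into the coefficient's storage,
whereas the by-value helper forces it through the temporary "val":

   #include <new>

   /* Toy stand-ins: "wide" is an expensive-to-copy value and "sum_expr"
      is a pending computation whose write_to routine fills a wide
      through a pointer, like the wide-int.cc routines above.  */
   struct wide;
   struct sum_expr { const wide *a, *b; void write_to (wide *dst) const; };

   struct wide
   {
     long buf[8];
     wide () {}
     wide (const sum_expr &e);   /* evaluates the expression in place */
   };

   inline wide::wide (const sum_expr &e) { e.write_to (this); }
   inline void sum_expr::write_to (wide *dst) const
   { for (int i = 0; i < 8; i++) dst->buf[i] = a->buf[i] + b->buf[i]; }

   struct poly2 { wide coeffs[2]; };

   /* In-place pattern (what the macro does): the expression is evaluated
      directly into coeffs[idx]; no temporary wide exists.  */
   #define SET_COEFF_IN_PLACE(p, idx, value) \
     ((p).coeffs[idx].~wide (), new (&(p).coeffs[idx]) wide (value))

   /* By-value helper: VALUE is first materialised as the temporary "val"
      and only then copied into the coefficient.  */
   inline void __attribute__ ((always_inline))
   set_coeff_by_value (poly2 &p, unsigned idx, wide val)
   { p.coeffs[idx] = val; }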

>>> If that's not the case and there is a significant performance
>>> penalty associated with inline functions at -O0 then GCC should
>>> be fixed to avoid it.
>>
>> I think those docs are really talking about inline functions being as
>> fast as macros when optimisation is enabled.  I don't think we make
>> any guarantees about -O0 code quality.
>
> Sure, but you are using unsafe macros in preference to a safer
> inline function even with optimization, introducing a bug as
> a result, and making an argument that the performance impact
> of a few instructions when not using optimization is what should
> drive the decision between one and the other in all situations.
> With all respect, I fail to see the logic in this like of
> reasoning.  By that argument we would never be able to define
> any inline functions.
>
> That being said, if the performance implications of using inline
> functions with no optimization are so serious here then I suggest
> you should be concerned about introducing the poly_int API in its
> current form at all: every access to the class is an inline
> function.

It's a trade-off.  It would be very difficult to do poly-int.h
via macros without making the changes even more invasive.
But with a case like this, where we *can* do something common
via a macro, I think using the macro makes sense.  Especially
when it's local to the file rather than a "public" interface.

> On a more serious/constructive note, if you really are worried
> about efficiency at this level then introducing an intrinsic
> primitive into the compiler instead of a set of classes might
> be worth thinking about.  It will only benefit GCC but it might
> lay a foundation for all sorts of infinite precision integer
> classes (including the C++ proposal that was pointed out in
> the other thread).

This has to work with host compilers other than GCC though.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-09 11:14                 ` Richard Sandiford
@ 2017-11-09 17:42                   ` Martin Sebor
  2017-11-13 17:59                   ` Jeff Law
  1 sibling, 0 replies; 302+ messages in thread
From: Martin Sebor @ 2017-11-09 17:42 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/09/2017 04:06 AM, Richard Sandiford wrote:
> Martin Sebor <msebor@gmail.com> writes:
>> On 11/08/2017 11:28 AM, Richard Sandiford wrote:
>>> Martin Sebor <msebor@gmail.com> writes:
>>>> On 11/08/2017 09:51 AM, Richard Sandiford wrote:
>>>>> Martin Sebor <msebor@gmail.com> writes:
>>>>>> On 11/08/2017 02:32 AM, Richard Sandiford wrote:
>>>>>>> Martin Sebor <msebor@gmail.com> writes:
>>>>>>>> I haven't done nearly a thorough review but the dtor followed by
>>>>>>>> the placement new in the POLY_SET_COEFF() macro caught my eye so
>>>>>>>> I thought I'd ask sooner rather than later.  Given the macro
>>>>>>>> definition:
>>>>>>>>
>>>>>>>> +   The dummy comparison against a null C * is just a way of checking
>>>>>>>> +   that C gives the right type.  */
>>>>>>>> +#define POLY_SET_COEFF(C, RES, I, VALUE) \
>>>>>>>> +  ((void) (&(RES).coeffs[0] == (C *) 0), \
>>>>>>>> +   wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION \
>>>>>>>> +   ? (void) ((RES).coeffs[I] = VALUE) \
>>>>>>>> +   : (void) ((RES).coeffs[I].~C (), new (&(RES).coeffs[I]) C (VALUE)))
>>>>>>>>
>>>>>>>> is the following use well-defined?
>>>>>>>>
>>>>>>>> +template<unsigned int N, typename C>
>>>>>>>> +inline poly_int_pod<N, C>&
>>>>>>>> +poly_int_pod<N, C>::operator <<= (unsigned int a)
>>>>>>>> +{
>>>>>>>> +  POLY_SET_COEFF (C, *this, 0, this->coeffs[0] << a);
>>>>>>>>
>>>>>>>> It looks to me as though the VALUE argument in the ctor invoked
>>>>>>>> by the placement new expression is evaluated after the dtor has
>>>>>>>> destroyed the very array element the VALUE argument expands to.
>>>>>>>
>>>>>>> Good catch!  It should simply have been doing <<= on each coefficient --
>>>>>>> I must have got carried away when converting to POLY_SET_COEFF.
>>>>>>>
>>>>>>> I double-checked the other uses and think that's the only one.
>>>>>>>
>>>>>>>> Whether or not is, in fact, a problem, it seems to me that using
>>>>>>>> a function template rather than a macro would be a clearer and
>>>>>>>> safer way to do the same thing.  (Safer in that the macro also
>>>>>>>> evaluates its arguments multiple times, which is often a source
>>>>>>>> of subtle bugs.)
>>>>>>>
>>>>>>> That would slow down -O0 builds though, by introducing an extra
>>>>>>> function call and set of temporaries even when the coefficients
>>>>>>> are primitive integers.
>>>>>>
>>>>>> Would decorating the function template with attribute always_inline
>>>>>> help?
>>>>>
>>>>> It would remove the call itself, but we'd still have the extra temporary
>>>>> objects that were the function argument and return value.
>>>>
>>>> Sorry, I do not want to get into another long discussion about
>>>> trade-offs between safety and efficiency but I'm not sure I see
>>>> what extra temporaries it would create.  It seems to me that
>>>> an inline function template that took arguments of user-defined
>>>> types by reference and others by value should be just as efficient
>>>> as a macro.
>>>>
>>>>  From GCC's own manual:
>>>>
>>>>    6.43 An Inline Function is As Fast As a Macro
>>>>    https://gcc.gnu.org/onlinedocs/gcc/Inline.html
>>>
>>> You can see the difference with something like:
>>>
>>>   inline
>>>   void __attribute__((always_inline))
>>>   f(int &dst, const int &src) { dst = src; }
>>>
>>>   int g1(const int &y) { int x; f(x, y); return x; }
>>>   int g2(const int &y) { int x; x = y; return x; }
>>
>> Let me say at the outset that I struggle to comprehend that a few
>> instructions is even a consideration when not optimizing, especially
>> in light of the bug the macro caused that would have been prevented
>> by using a function instead.  But...
>
> Many people still build at -O0 though.  One of the things I was asked
> for was the time it takes to build stage 2 with an -O0 stage 1
> (where stage 1 would usually be built by the host compiler).

Yes, of course.  I do all my development and basic testing at
-O0.  But I don't expect the performance to be comparable to
-O2.  I'd be surprised if anyone did.  What I do expect at -O0
(and what I'm grateful for) is GCC to make it easier to find
bugs in my code by enabling extra checks, even if they come at
the expense of slower execution.

That said, if your enhancement has such dramatic performance
implications at -O0 that the only way to avoid them is by using
macros then I would say it's not appropriate.

>> ...I don't think your example above is representative of using
>> the POLY_SET_COEFF macro.  The function template I'm suggesting
>> might look something to this:
>>
>>    template <unsigned N, class C>
>>    inline void __attribute__ ((always_inline))
>>    poly_set_coeff (poly_int_pod<N, C> *p, unsigned idx, C val)
>>    {
>>      ((void) (&(*p).coeffs[0] == (C *) 0),
>> wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION ? (void)
>> ((*p).coeffs[0] = val) : (void) ((*p).coeffs[0].~C (), new
>> (&(*p).coeffs[0]) C (val)));
>>
>>      if (N >= 2)
>>        for (unsigned int i = 1; i < N; i++)
>>          ((void) (&(*p).coeffs[0] == (C *) 0),
>> wi::int_traits<C>::precision_type == wi::FLEXIBLE_PRECISION ? (void)
>> ((*p).coeffs[i] = val) : (void) ((*p).coeffs[i].~C (), new
>> (&(*p).coeffs[i]) C (val)));
>>    }
>
> That ignores the idx parameter and sets all coefficents to val.  Did you
> mean somnething like:
>
>    template <unsigned N, typename C1, typename C2>
>    inline void __attribute__ ((always_inline))
>    poly_set_coeff (poly_int_pod<N, C1> *p, unsigned idx, C2 val)
>    {
>      wi::int_traits<C1>::precision_type == wi::FLEXIBLE_PRECISION ? (void) ((*p).coeffs[idx] = val) : (void) ((*p).coeffs[idx].~C1 (), new (&(*p).coeffs[idx]) C1 (val));
>    }
>
> ?  If so...

Yes, I didn't have it quite right.  With the above there's
a difference of three x86_64 mov instructions at -O0.  The code
at -O2 is still identical.

>> To compare apples to apples I suggest to instead compare the shift
>> operator (or any other poly_int function that uses the macro) that
>> doesn't suffer from the bug vs one that makes use of the function
>> template.  I see a difference of 2 instructions on x86_64 (21 vs
>> 23) for operator<<=.
>>
>> Are two assembly instructions even worth talking about?
>
> ...the problem is that passing C by value defeats the point of the
> optimisation:
>
>   /* RES is a poly_int result that has coefficients of type C and that
>      is being built up a coefficient at a time.  Set coefficient number I
>      to VALUE in the most efficient way possible.
>
>      For primitive C it is better to assign directly, since it avoids
>      any further calls and so is more efficient when the compiler is
>      built at -O0.  But for wide-int based C it is better to construct
>      the value in-place.  This means that calls out to a wide-int.cc
>      routine can take the address of RES rather than the address of
>      a temporary.
>
> With the inline function, the wide-int.cc routines will be taking
> the address of the temporary "val" object, which will then be used
> to initialise the target object via a copy.  The macro was there
> to avoid the copy.

There are many ways to write code that is both safe and efficient
that don't involve resorting to convoluted, error-prone macros.
I don't quite see how passing a fundamental type by value
to an inline function can be a bottleneck.  If C can be either
a fundamental type or an aggregate, then the aggregate case
can be optimized by providing an overload or specialization.
But an answer that "the only way to write the code is by using
a macro" cannot possibly be acceptable (maybe 20 years ago but
not today).
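
As a quick sketch of the overload idea (toy types and hypothetical
names, not a patch): the class-type coefficient is taken by const
reference so no extra copy is introduced, and the primitive case stays
a plain assignment:

   #include <new>

   struct wideish { long buf[4]; };   /* toy stand-in for a wide-int-like coefficient */

   /* Class-type coefficients: destroy and reconstruct in place, taking
      the source by const reference.  */
   inline void __attribute__ ((always_inline))
   set_coeff (wideish *slot, const wideish &val)
   {
     slot->~wideish ();
     new (slot) wideish (val);
   }

   /* Primitive coefficients: a plain assignment, by value.  */
   inline void __attribute__ ((always_inline))
   set_coeff (long *slot, long val)
   {
     *slot = val;
   }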

> But the use of this inline function in <<= would be just as incorrect as
> using the macro.
>
> [These are all sizes for normally-optimised release builds]

We seem to have very different priorities.  I make mistakes all
the time and so I need all the help I can get to find problems
in my code.  Microoptimizations that make it easier to get
things wrong make debugging harder.  In both of the cases we
have discussed -- the no-op ctor and the macro -- this has actually
happened, to both of us.  -O0 is meant for development and
should make it easy to avoid mistakes (as all the GCC checking
tries to do).  If the overhead of an always-inline function is
unacceptable when optimizing then use a macro if you must but
only then and please at least make the -O0 default safe.  And
open a bug to get the inefficiency fixed.  Otherwise we can't
really claim that An Inline Function is As Fast As a Macro.

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-09 11:14                 ` Richard Sandiford
  2017-11-09 17:42                   ` Martin Sebor
@ 2017-11-13 17:59                   ` Jeff Law
  2017-11-13 23:57                     ` Richard Sandiford
  1 sibling, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-13 17:59 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches, richard.sandiford

On 11/09/2017 04:06 AM, Richard Sandiford wrote:

>> Let me say at the outset that I struggle to comprehend that a few
>> instructions is even a consideration when not optimizing, especially
>> in light of the bug the macro caused that would have been prevented
>> by using a function instead.  But...
> 
> Many people still build at -O0 though.  One of the things I was asked
> for was the time it takes to build stage 2 with an -O0 stage 1
> (where stage 1 would usually be built by the host compiler).
I suspect folks are concerned about this because it potentially affects
their daily development cycle times.  So they're looking to see if the
introduction of the poly types has a significant impact.  It's a
legitimate question, particularly for the introduction of low level
infrastructure that potentially gets hit a lot.

Richard, what were the results of that test (if it's elsewhere in the
thread I'll eventually find it...  I'm just starting to try and make
some headway on this kit).

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-13 17:59                   ` Jeff Law
@ 2017-11-13 23:57                     ` Richard Sandiford
  2017-11-14  1:21                       ` Martin Sebor
  2017-11-17  3:31                       ` Jeff Law
  0 siblings, 2 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-11-13 23:57 UTC (permalink / raw)
  To: Jeff Law; +Cc: Martin Sebor, gcc-patches

Jeff Law <law@redhat.com> writes:
> On 11/09/2017 04:06 AM, Richard Sandiford wrote:
>
>>> Let me say at the outset that I struggle to comprehend that a few
>>> instructions is even a consideration when not optimizing, especially
>>> in light of the bug the macro caused that would have been prevented
>>> by using a function instead.  But...
>> 
>> Many people still build at -O0 though.  One of the things I was asked
>> for was the time it takes to build stage 2 with an -O0 stage 1
>> (where stage 1 would usually be built by the host compiler).
> I suspect folks are concerned about this because it potentially affects
> their daily development cycle times.  So they're looking to see if the
> introduction of the poly types has a significant impact.  It's a
> legitimate question, particularly for the introduction of low level
> infrastructure that potentially gets hit a lot.
>
> Richard, what were the results of that test (if it's elsewhere in the
> thread I'll eventually find it...

On an x86_64 box I got:

real: +7%
user: +8.6%

for building stage2 with an -O0 -g stage1.  For aarch64 with the
NUM_POLY_INT_COEFFS==2 change it was:

real: +17%
user: +20%

That's obviously not ideal, but C++11 would get rid of some of the
inefficiencies, once we can switch to that.

You've probably already seen this, but it's compile-time neutral on
x86_64 in terms of running a gcc built with --enable-checking=release,
within a margin of about [-0.1%, 0.1%].

For aarch64 with NUM_POLY_INT_COEFFS==2, a gcc built with
--enable-checking=release is ~1% slower when using -g and ~2%
slower with -O2 -g.

> I'm just starting to try and make some headway on this kit).

Thanks :-)  I guess it's going to be a real slog going through them,
sorry, even despite the attempt to split them up.

Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-08 10:03   ` Richard Sandiford
@ 2017-11-14  0:42     ` Richard Sandiford
  2017-12-06 20:11       ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-14  0:42 UTC (permalink / raw)
  To: gcc-patches

Richard Sandiford <richard.sandiford@linaro.org> writes:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> This patch adds a new "poly_int" class to represent polynomial integers
>> of the form:
>>
>>   C0 + C1*X1 + C2*X2 ... + Cn*Xn
>>
>> It also adds poly_int-based typedefs for offsets and sizes of various
>> precisions.  In these typedefs, the Ci coefficients are compile-time
>> constants and the Xi indeterminates are run-time invariants.  The number
>> of coefficients is controlled by the target and is initially 1 for all
>> ports.
>>
>> Most routines can handle general coefficient counts, but for now a few
>> are specific to one or two coefficients.  Support for other coefficient
>> counts can be added when needed.
>>
>> The patch also adds a new macro, IN_TARGET_CODE, that can be
>> set to indicate that a TU contains target-specific rather than
>> target-independent code.  When this macro is set and the number of
>> coefficients is 1, the poly-int.h classes define a conversion operator
>> to a constant.  This allows most existing target code to work without
>> modification.  The main exceptions are:
>>
>> - values passed through ..., which need an explicit conversion to a
>>   constant
>>
>> - ?: expression in which one arm ends up being a polynomial and the
>>   other remains a constant.  In these cases it would be valid to convert
>>   the constant to a polynomial and the polynomial to a constant, so a
>>   cast is needed to break the ambiguity.
>>
>> The patch also adds a new target hook to return the estimated
>> value of a polynomial for costing purposes.
>>
>> The patch also adds operator<< on wide_ints (it was already defined
>> for offset_int and widest_int).  I think this was originally excluded
>> because >> is ambiguous for wide_int, but << is useful for converting
>> bytes to bits, etc., so is worth defining on its own.  The patch also
>> adds operator% and operator/ for offset_int and widest_int, since those
>> types are always signed.  These changes allow the poly_int interface to
>> be more predictable.
>>
>> I'd originally tried adding the tests as selftests, but that ended up
>> bloating cc1 by at least a third.  It also took a while to build them
>> at -O2.  The patch therefore uses plugin tests instead, where we can
>> force the tests to be built at -O0.  They still run in negligible time
>> when built that way.
>
> Changes in v2:
>
> - Drop the controversial known_zero etc. wrapper functions.
> - Fix the operator<<= bug that Martin found.
> - Switch from "t" to "type" in SFINAE classes (requested by Martin).
>
> Not changed in v2:
>
> - Default constructors are still empty.  I agree it makes sense to use
>   "= default" when we switch to C++11, but it would be dangerous for
>   that to make "poly_int64 x;" less defined than it is now.

After talking about this a bit more internally, it was obvious that
the choice of "must" and "may" for the predicate names was a common
sticking point.  The idea was to match the names of alias predicates,
but given my track record with names ("too_empty_p" being a recently
questioned example :-)), I'd be happy to rename them to something else.
Some alternatives we came up with were:

- known_eq / maybe_eq / known_lt / maybe_lt etc.

  Some functions already use "known" and "maybe", so this would arguably
  be more consistent than using "must" and "may".

- always_eq / sometimes_eq / always_lt / sometimes_lt

  Similar to the previous one in intent.  It's just a question of which
  wording is clearer.

- forall_eq / exists_eq / forall_lt / exists_lt etc.

  Matches the usual logic quantifiers.  This seems quite appealing,
  as long as it's obvious that in:

    forall_eq (v0, v1)

  v0 and v1 themselves are already bound: if vi == ai + bi*X then
  what we're really saying is:

    forall X, a0 + b0*X == a1 + b1*X 
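
  (As a toy single-indeterminate illustration of what any of these
  spellings would mean -- a simplified model assuming X >= 0, written
  with the known/maybe names from the first option; this is not the
  actual poly-int.h implementation:)

     /* A toy value a + b*X, where X is an unknown nonnegative runtime value.  */
     struct pval { long a, b; };

     /* exists X >= 0 such that x < y.  */
     inline bool maybe_lt (pval x, pval y)
     { return x.a < y.a || x.b < y.b; }

     /* forall X >= 0, x < y.  */
     inline bool known_lt (pval x, pval y)
     { return x.a < y.a && x.b <= y.b; }

     /* forall X >= 0, x == y: the coefficients must match exactly.  */
     inline bool known_eq (pval x, pval y)
     { return x.a == y.a && x.b == y.b; }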

Which of those sounds best?  Any other suggestions?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-13 23:57                     ` Richard Sandiford
@ 2017-11-14  1:21                       ` Martin Sebor
  2017-11-14  9:46                         ` Richard Sandiford
  2017-11-17  3:31                       ` Jeff Law
  1 sibling, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-11-14  1:21 UTC (permalink / raw)
  To: Jeff Law, gcc-patches, richard.sandiford

On 11/13/2017 04:36 PM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On 11/09/2017 04:06 AM, Richard Sandiford wrote:
>>
>>>> Let me say at the outset that I struggle to comprehend that a few
>>>> instructions is even a consideration when not optimizing, especially
>>>> in light of the bug the macro caused that would have been prevented
>>>> by using a function instead.  But...
>>>
>>> Many people still build at -O0 though.  One of the things I was asked
>>> for was the time it takes to build stage 2 with an -O0 stage 1
>>> (where stage 1 would usually be built by the host compiler).
>> I suspect folks are concerned about this because it potentially affects
>> their daily development cycle times.  So they're looking to see if the
>> introduction of the poly types has a significant impact.  It's a
>> legitimate question, particularly for the introduction of low level
>> infrastructure that potentially gets hit a lot.
>>
>> Richard, what were the results of that test (if it's elsewhere in the
>> thread I'll eventually find it...
>
> On an x86_64 box I got:
>
> real: +7%
> user: +8.6%
>
> for building stage2 with an -O0 -g stage1.  For aarch64 with the
> NUM_POLY_INT_COEFFS==2 change it was:
>
> real: +17%
> user: +20%
>
> That's obviously not ideal, but C++11 would get rid of some of the
> inefficiencies, once we can switch to that.

For the purposes of this discussion, what would the numbers look
like if the macro were replaced with the inline function as I
suggested?

What impact on the numbers would having the default ctor actually
initialize the object have? (As opposed to leaving it uninitialized.)

I don't want to make a bigger deal out of this macro than it
already is.  Unlike the wide int constructors, it's
an implementation detail that, when correct, almost no-one will
have to worry about.  The main reason for my strenuous objections
is not the macro itself but the philosophy that performance,
especially at -O0, should be an overriding consideration.  Code
should be safe first and foremost.  Otherwise, the few cycles we
might save by writing unsafe but fast code will be wasted in
debugging sessions.

Martin

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-14  1:21                       ` Martin Sebor
@ 2017-11-14  9:46                         ` Richard Sandiford
  0 siblings, 0 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-11-14  9:46 UTC (permalink / raw)
  To: Martin Sebor; +Cc: Jeff Law, gcc-patches

Martin Sebor <msebor@gmail.com> writes:
> On 11/13/2017 04:36 PM, Richard Sandiford wrote:
>> Jeff Law <law@redhat.com> writes:
>>> On 11/09/2017 04:06 AM, Richard Sandiford wrote:
>>>
>>>>> Let me say at the outset that I struggle to comprehend that a few
>>>>> instructions is even a consideration when not optimizing, especially
>>>>> in light of the bug the macro caused that would have been prevented
>>>>> by using a function instead.  But...
>>>>
>>>> Many people still build at -O0 though.  One of the things I was asked
>>>> for was the time it takes to build stage 2 with an -O0 stage 1
>>>> (where stage 1 would usually be built by the host compiler).
>>> I suspect folks are concerned about this because it potentially affects
>>> their daily development cycle times.  So they're looking to see if the
>>> introduction of the poly types has a significant impact.  It's a
>>> legitimate question, particularly for the introduction of low level
>>> infrastructure that potentially gets hit a lot.
>>>
>>> Richard, what were the results of that test (if it's elsewhere in the
>>> thread I'll eventually find it...
>>
>> On an x86_64 box I got:
>>
>> real: +7%
>> user: +8.6%
>>
>> for building stage2 with an -O0 -g stage1.  For aarch64 with the
>> NUM_POLY_INT_COEFFS==2 change it was:
>>
>> real: +17%
>> user: +20%
>>
>> That's obviously not ideal, but C++11 would get rid of some of the
>> inefficiencies, once we can switch to that.
>
> For the purposes of this discussion, what would the numbers look
> like if the macro were replaced with the inline function as I
> suggested?
>
> What impact on the numbers would having the default ctor actually
> initialize the object have? (As opposed to leaving it uninitialized.)

I was objecting to that for semantic reasons[*], not performance when
built with -O0.  I realise you don't agree, but I don't think either of
us is going to convince the other here.

> I don't want to make a bigger deal out of this macro than it
> already is.  Unlike the wide int constructors, it's
> an implementation detail that, when correct, almost no-one will
> have to worry about.  The main reason for my strenuous objections
> is not the macro itself but the philosophy that performance,
> especially at -O0, should be an overriding consideration.  Code
> should be safe first and foremost.  Otherwise, the few cycles we
> might save by writing unsafe but fast code will be wasted in
> debugging sessions.

But the macro was originally added to improve release builds,
not -O0 builds.  It replaced plain assignments of the form:

  r.coeffs[0] = ...;

Using an inline function instead of a macro is no better than
the original call to operator=(); see the mem_ref_offset figures
I gave earlier.

Thanks,
Richard


[*] Which were:

1) Not all types used with poly_int have a single meaningful initial
   value (wide_int).

2) It prevents useful static warnings about uninitialised variables.
   The fact that we don't warn in every case doesn't defeat this IMO.

3) Using C++11 "= default" is a good compromise, but making poly_ints
   always initialised by default now would make it too dangerous to
   switch to "= default" in future.

4) Conditionally using "= default" when being built with C++11
   compilers and something else when being built with C++03 compilers
   would be too dangerous, since we don't want the semantics of the
   class to depend on host compiler.

5) AFAIK, the only case that would be handled differently by the
   current "() {}" constructor and "= default" is the use of
   "T ()" to construct a zeroed T.  IMO, "T (0)" is better than "T ()"
   anyway because (a) it makes it obvious that you're initialising it to
   the numerical value 0, rather than simply zeroed memory contents, and
   (b) it will give a compile error if T doesn't have a single zero
   representation (as for wide_int).

Those are all independent of whatever the -O0 performance regression
would be from unconditional initialisation.
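
A minimal illustration of the difference point 5 refers to (toy types,
not GCC code): with the current empty constructor, "T ()" leaves the
member indeterminate, whereas with an implicit (or C++11 "= default")
constructor it is zeroed.

   struct with_empty_ctor { int c; with_empty_ctor () {} };
   struct with_implicit_ctor { int c; };   /* how "= default" would behave */

   with_empty_ctor a = with_empty_ctor ();       /* a.c is indeterminate */
   with_implicit_ctor b = with_implicit_ctor (); /* b.c is zero-initialised */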

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-13 23:57                     ` Richard Sandiford
  2017-11-14  1:21                       ` Martin Sebor
@ 2017-11-17  3:31                       ` Jeff Law
  1 sibling, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:31 UTC (permalink / raw)
  To: Martin Sebor, gcc-patches, richard.sandiford

On 11/13/2017 04:36 PM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On 11/09/2017 04:06 AM, Richard Sandiford wrote:
>>
>>>> Let me say at the outset that I struggle to comprehend that a few
>>>> instructions is even a consideration when not optimizing, especially
>>>> in light of the bug the macro caused that would have been prevented
>>>> by using a function instead.  But...
>>>
>>> Many people still build at -O0 though.  One of the things I was asked
>>> for was the time it takes to build stage 2 with an -O0 stage 1
>>> (where stage 1 would usually be built by the host compiler).
>> I suspect folks are concerned about this because it potentially affects
>> their daily development cycle times.  So they're looking to see if the
>> introduction of the poly types has a significant impact.  It's a
>> legitimate question, particularly for the introduction of low level
>> infrastructure that potentially gets hit a lot.
>>
>> Richard, what were the results of that test (if it's elsewhere in the
>> thread I'll eventually find it...
> 
> On an x86_64 box I got:
> 
> real: +7%
> user: +8.6%
> 
> for building stage2 with an -O0 -g stage1.  For aarch64 with the
> NUM_POLY_INT_COEFFS==2 change it was:
> 
> real: +17%
> user: +20%
> 
> That's obviously not ideal, but C++11 would get rid of some of the
> inefficiencies, once we can switch to that.
Ouch.  But I guess to some extent it has to be expected given what
you've got to do under the hood.


> 
> You've probably already seen this, but it's compile-time neutral on
> x86_64 in terms of running a gcc built with --enable-checking=release,
> within a margin of about [-0.1%, 0.1%].
Good.  Presumably that's because it all just falls out and turns into
wi:: stuff on targets that don't need the poly stuff.


> 
> For aarch64 with NUM_POLY_INT_COEFFS==2, a gcc built with
> --enable-checking=release is ~1% slower when using -g and ~2%
> slower with -O2 -g.
That's not terrible given what's going on here.


I'm still pondering the whole construction/initialization and temporary
objects issue.    I may try to work through some of the actual patches,
then come back to those issues.


> 
>> I'm just starting to try and make some headway on this kit).
> 
> Thanks :-)  I guess it's going to be a real slog going through them,
> sorry, even despite the attempt to split them up.
No worries.  It's what we sign up for :-)  Your deep testing and long
history with the project really help in that if something goes wrong I
know you're going to be around to fix it.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [002/nnn] poly_int: IN_TARGET_CODE
  2017-10-23 16:59 ` [002/nnn] poly_int: IN_TARGET_CODE Richard Sandiford
@ 2017-11-17  3:35   ` Jeff Law
  2017-12-15  1:08     ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:35 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 10:58 AM, Richard Sandiford wrote:
> This patch makes each target-specific TU define an IN_TARGET_CODE macro,
> which is used to decide whether poly_int<1, C> should convert to C.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* genattrtab.c (write_header): Define IN_TARGET_CODE to 1 in the
> 	target C file.
> 	* genautomata.c (main): Likewise.
> 	* genconditions.c (write_header): Likewise.
> 	* genemit.c (main): Likewise.
> 	* genextract.c (print_header): Likewise.
> 	* genopinit.c (main): Likewise.
> 	* genoutput.c (output_prologue): Likewise.
> 	* genpeep.c (main): Likewise.
> 	* genpreds.c (write_insn_preds_c): Likewise.
> 	* genrecog.c (writer_header): Likewise.
> 	* config/aarch64/aarch64-builtins.c (IN_TARGET_CODE): Define.
> 	* config/aarch64/aarch64-c.c (IN_TARGET_CODE): Likewise.
> 	* config/aarch64/aarch64.c (IN_TARGET_CODE): Likewise.
> 	* config/aarch64/cortex-a57-fma-steering.c (IN_TARGET_CODE): Likewise.
> 	* config/aarch64/driver-aarch64.c (IN_TARGET_CODE): Likewise.
> 	* config/alpha/alpha.c (IN_TARGET_CODE): Likewise.
> 	* config/alpha/driver-alpha.c (IN_TARGET_CODE): Likewise.
> 	* config/arc/arc-c.c (IN_TARGET_CODE): Likewise.
> 	* config/arc/arc.c (IN_TARGET_CODE): Likewise.
> 	* config/arc/driver-arc.c (IN_TARGET_CODE): Likewise.
> 	* config/arm/aarch-common.c (IN_TARGET_CODE): Likewise.
> 	* config/arm/arm-builtins.c (IN_TARGET_CODE): Likewise.
> 	* config/arm/arm-c.c (IN_TARGET_CODE): Likewise.
> 	* config/arm/arm.c (IN_TARGET_CODE): Likewise.
> 	* config/arm/driver-arm.c (IN_TARGET_CODE): Likewise.
> 	* config/avr/avr-c.c (IN_TARGET_CODE): Likewise.
> 	* config/avr/avr-devices.c (IN_TARGET_CODE): Likewise.
> 	* config/avr/avr-log.c (IN_TARGET_CODE): Likewise.
> 	* config/avr/avr.c (IN_TARGET_CODE): Likewise.
> 	* config/avr/driver-avr.c (IN_TARGET_CODE): Likewise.
> 	* config/avr/gen-avr-mmcu-specs.c (IN_TARGET_CODE): Likewise.
> 	* config/bfin/bfin.c (IN_TARGET_CODE): Likewise.
> 	* config/c6x/c6x.c (IN_TARGET_CODE): Likewise.
> 	* config/cr16/cr16.c (IN_TARGET_CODE): Likewise.
> 	* config/cris/cris.c (IN_TARGET_CODE): Likewise.
> 	* config/darwin.c (IN_TARGET_CODE): Likewise.
> 	* config/epiphany/epiphany.c (IN_TARGET_CODE): Likewise.
> 	* config/epiphany/mode-switch-use.c (IN_TARGET_CODE): Likewise.
> 	* config/epiphany/resolve-sw-modes.c (IN_TARGET_CODE): Likewise.
> 	* config/fr30/fr30.c (IN_TARGET_CODE): Likewise.
> 	* config/frv/frv.c (IN_TARGET_CODE): Likewise.
> 	* config/ft32/ft32.c (IN_TARGET_CODE): Likewise.
> 	* config/h8300/h8300.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/djgpp.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/driver-i386.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/driver-mingw32.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/host-cygwin.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/host-i386-darwin.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/host-mingw32.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/i386-c.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/i386.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/intelmic-mkoffload.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/msformat-c.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/winnt-cxx.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/winnt-stubs.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/winnt.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/x86-tune-sched-atom.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/x86-tune-sched-bd.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/x86-tune-sched-core.c (IN_TARGET_CODE): Likewise.
> 	* config/i386/x86-tune-sched.c (IN_TARGET_CODE): Likewise.
> 	* config/ia64/ia64-c.c (IN_TARGET_CODE): Likewise.
> 	* config/ia64/ia64.c (IN_TARGET_CODE): Likewise.
> 	* config/iq2000/iq2000.c (IN_TARGET_CODE): Likewise.
> 	* config/lm32/lm32.c (IN_TARGET_CODE): Likewise.
> 	* config/m32c/m32c-pragma.c (IN_TARGET_CODE): Likewise.
> 	* config/m32c/m32c.c (IN_TARGET_CODE): Likewise.
> 	* config/m32r/m32r.c (IN_TARGET_CODE): Likewise.
> 	* config/m68k/m68k.c (IN_TARGET_CODE): Likewise.
> 	* config/mcore/mcore.c (IN_TARGET_CODE): Likewise.
> 	* config/microblaze/microblaze-c.c (IN_TARGET_CODE): Likewise.
> 	* config/microblaze/microblaze.c (IN_TARGET_CODE): Likewise.
> 	* config/mips/driver-native.c (IN_TARGET_CODE): Likewise.
> 	* config/mips/frame-header-opt.c (IN_TARGET_CODE): Likewise.
> 	* config/mips/mips.c (IN_TARGET_CODE): Likewise.
> 	* config/mmix/mmix.c (IN_TARGET_CODE): Likewise.
> 	* config/mn10300/mn10300.c (IN_TARGET_CODE): Likewise.
> 	* config/moxie/moxie.c (IN_TARGET_CODE): Likewise.
> 	* config/msp430/driver-msp430.c (IN_TARGET_CODE): Likewise.
> 	* config/msp430/msp430-c.c (IN_TARGET_CODE): Likewise.
> 	* config/msp430/msp430.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32-cost.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32-fp-as-gp.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32-intrinsic.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32-isr.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32-md-auxiliary.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32-memory-manipulation.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32-pipelines-auxiliary.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32-predicates.c (IN_TARGET_CODE): Likewise.
> 	* config/nds32/nds32.c (IN_TARGET_CODE): Likewise.
> 	* config/nios2/nios2.c (IN_TARGET_CODE): Likewise.
> 	* config/nvptx/mkoffload.c (IN_TARGET_CODE): Likewise.
> 	* config/nvptx/nvptx.c (IN_TARGET_CODE): Likewise.
> 	* config/pa/pa.c (IN_TARGET_CODE): Likewise.
> 	* config/pdp11/pdp11.c (IN_TARGET_CODE): Likewise.
> 	* config/powerpcspe/driver-powerpcspe.c (IN_TARGET_CODE): Likewise.
> 	* config/powerpcspe/host-darwin.c (IN_TARGET_CODE): Likewise.
> 	* config/powerpcspe/host-ppc64-darwin.c (IN_TARGET_CODE): Likewise.
> 	* config/powerpcspe/powerpcspe-c.c (IN_TARGET_CODE): Likewise.
> 	* config/powerpcspe/powerpcspe-linux.c (IN_TARGET_CODE): Likewise.
> 	* config/powerpcspe/powerpcspe.c (IN_TARGET_CODE): Likewise.
> 	* config/riscv/riscv-builtins.c (IN_TARGET_CODE): Likewise.
> 	* config/riscv/riscv-c.c (IN_TARGET_CODE): Likewise.
> 	* config/riscv/riscv.c (IN_TARGET_CODE): Likewise.
> 	* config/rl78/rl78-c.c (IN_TARGET_CODE): Likewise.
> 	* config/rl78/rl78.c (IN_TARGET_CODE): Likewise.
> 	* config/rs6000/driver-rs6000.c (IN_TARGET_CODE): Likewise.
> 	* config/rs6000/host-darwin.c (IN_TARGET_CODE): Likewise.
> 	* config/rs6000/host-ppc64-darwin.c (IN_TARGET_CODE): Likewise.
> 	* config/rs6000/rs6000-c.c (IN_TARGET_CODE): Likewise.
> 	* config/rs6000/rs6000-linux.c (IN_TARGET_CODE): Likewise.
> 	* config/rs6000/rs6000-p8swap.c (IN_TARGET_CODE): Likewise.
> 	* config/rs6000/rs6000-string.c (IN_TARGET_CODE): Likewise.
> 	* config/rs6000/rs6000.c (IN_TARGET_CODE): Likewise.
> 	* config/rx/rx.c (IN_TARGET_CODE): Likewise.
> 	* config/s390/driver-native.c (IN_TARGET_CODE): Likewise.
> 	* config/s390/s390-c.c (IN_TARGET_CODE): Likewise.
> 	* config/s390/s390.c (IN_TARGET_CODE): Likewise.
> 	* config/sh/sh-c.c (IN_TARGET_CODE): Likewise.
> 	* config/sh/sh-mem.cc (IN_TARGET_CODE): Likewise.
> 	* config/sh/sh.c (IN_TARGET_CODE): Likewise.
> 	* config/sh/sh_optimize_sett_clrt.cc (IN_TARGET_CODE): Likewise.
> 	* config/sh/sh_treg_combine.cc (IN_TARGET_CODE): Likewise.
> 	* config/sparc/driver-sparc.c (IN_TARGET_CODE): Likewise.
> 	* config/sparc/sparc-c.c (IN_TARGET_CODE): Likewise.
> 	* config/sparc/sparc.c (IN_TARGET_CODE): Likewise.
> 	* config/spu/spu-c.c (IN_TARGET_CODE): Likewise.
> 	* config/spu/spu.c (IN_TARGET_CODE): Likewise.
> 	* config/stormy16/stormy16.c (IN_TARGET_CODE): Likewise.
> 	* config/tilegx/mul-tables.c (IN_TARGET_CODE): Likewise.
> 	* config/tilegx/tilegx-c.c (IN_TARGET_CODE): Likewise.
> 	* config/tilegx/tilegx.c (IN_TARGET_CODE): Likewise.
> 	* config/tilepro/mul-tables.c (IN_TARGET_CODE): Likewise.
> 	* config/tilepro/tilepro-c.c (IN_TARGET_CODE): Likewise.
> 	* config/tilepro/tilepro.c (IN_TARGET_CODE): Likewise.
> 	* config/v850/v850-c.c (IN_TARGET_CODE): Likewise.
> 	* config/v850/v850.c (IN_TARGET_CODE): Likewise.
> 	* config/vax/vax.c (IN_TARGET_CODE): Likewise.
> 	* config/visium/visium.c (IN_TARGET_CODE): Likewise.
> 	* config/vms/vms-c.c (IN_TARGET_CODE): Likewise.
> 	* config/vms/vms-f.c (IN_TARGET_CODE): Likewise.
> 	* config/vms/vms.c (IN_TARGET_CODE): Likewise.
> 	* config/xtensa/xtensa.c (IN_TARGET_CODE): Likewise.
ISTM this needs documenting somewhere.

OK with a suitable doc patch.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [003/nnn] poly_int: MACRO_MODE
  2017-10-23 17:00 ` [003/nnn] poly_int: MACRO_MODE Richard Sandiford
@ 2017-11-17  3:36   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:36 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 10:59 AM, Richard Sandiford wrote:
> This patch uses a MACRO_MODE wrapper for the target macro invocations
> in targhooks.c and address.h, so that macros for non-AArch64 targets
> can continue to treat modes as fixed-size.
> 
> It didn't seem worth converting the address macros to hooks since
> (a) they're heavily used, (b) they should be probably be replaced
> with a different interface rather than converted to hooks as-is,
> and most importantly (c) addresses.h already localises the problem.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* machmode.h (MACRO_MODE): New macro.
> 	* addresses.h (base_reg_class, ok_for_base_p_1): Use it.
> 	* targhooks.c (default_libcall_value, default_secondary_reload)
> 	(default_memory_move_cost, default_register_move_cost)
> 	(default_class_max_nregs): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [004/nnn] poly_int: mode query functions
  2017-10-23 17:00 ` [004/nnn] poly_int: mode query functions Richard Sandiford
@ 2017-11-17  3:37   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:37 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 10:59 AM, Richard Sandiford wrote:
> This patch changes the bit size and vector count arguments to the
> machmode.h functions from unsigned int to poly_uint64.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* machmode.h (mode_for_size, int_mode_for_size, float_mode_for_size)
> 	(smallest_mode_for_size, smallest_int_mode_for_size): Take the mode
> 	size as a poly_uint64.
> 	(mode_for_vector, mode_for_int_vector): Take the number of vector
> 	elements as a poly_uint64.
> 	* stor-layout.c (mode_for_size, smallest_mode_for_size): Take the mode
> 	size as a poly_uint64.
> 	(mode_for_vector, mode_for_int_vector): Take the number of vector
> 	elements as a poly_uint64.
OK.

I think that in general a change from an integer to a poly_uint64 should
generally be considered OK without the need for review.  Ultimately
those are highly mechanical changes with little risk for mucking
something up badly.

Obviously the changes wouldn't go in until we settled the poly_uint64
questions though.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [007/nnn] poly_int: dump routines
  2017-10-23 17:02 ` [007/nnn] poly_int: dump routines Richard Sandiford
@ 2017-11-17  3:38   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:38 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:02 AM, Richard Sandiford wrote:
> Add poly_int routines for the dumpfile.h and pretty-print.h frameworks.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* dumpfile.h (dump_dec): Declare.
> 	* dumpfile.c (dump_dec): New function.
> 	* pretty-print.h (pp_wide_integer): Turn into a function and
> 	declare a poly_int version.
> 	* pretty-print.c (pp_wide_integer): New function for poly_ints.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [008/nnn] poly_int: create_integer_operand
  2017-10-23 17:03 ` [008/nnn] poly_int: create_integer_operand Richard Sandiford
@ 2017-11-17  3:40   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:40 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:02 AM, Richard Sandiford wrote:
> This patch generalises create_integer_operand so that it accepts
> poly_int64s rather than HOST_WIDE_INTs.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* optabs.h (expand_operand): Add an int_value field.
> 	(create_expand_operand): Add an int_value parameter and use it
> 	to initialize the new expand_operand field.
> 	(create_integer_operand): Replace with a declaration of a function
> 	that accepts poly_int64s.  Move the implementation to...
> 	* optabs.c (create_integer_operand): ...here.
> 	(maybe_legitimize_operand): For EXPAND_INTEGER, check whether the
> 	mode preserves the value of int_value, instead of calling
> 	const_int_operand on the rtx.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [009/nnn] poly_int: TRULY_NOOP_TRUNCATION
  2017-10-23 17:04 ` [009/nnn] poly_int: TRULY_NOOP_TRUNCATION Richard Sandiford
@ 2017-11-17  3:40   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:40 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:03 AM, Richard Sandiford wrote:
> This patch makes TRULY_NOOP_TRUNCATION take the mode sizes as
> poly_uint64s instead of unsigned ints.  The function bodies
> don't need to change.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* target.def (truly_noop_truncation): Take poly_uint64s instead of
> 	unsigned ints.  Change default to hook_bool_puint64_puint64_true.
> 	* doc/tm.texi: Regenerate.
> 	* hooks.h (hook_bool_uint_uint_true): Delete.
> 	(hook_bool_puint64_puint64_true): Declare.
> 	* hooks.c (hook_bool_uint_uint_true): Delete.
> 	(hook_bool_puint64_puint64_true): New function.
> 	* config/mips/mips.c (mips_truly_noop_truncation): Take poly_uint64s
> 	instead of unsigned ints.
> 	* config/spu/spu.c (spu_truly_noop_truncation): Likewise.
> 	* config/tilegx/tilegx.c (tilegx_truly_noop_truncation): Likewise.
OK
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [010/nnn] poly_int: REG_OFFSET
  2017-10-23 17:04 ` [010/nnn] poly_int: REG_OFFSET Richard Sandiford
@ 2017-11-17  3:41   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:41 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:04 AM, Richard Sandiford wrote:
> This patch changes the type of the reg_attrs offset field
> from HOST_WIDE_INT to poly_int64 and updates uses accordingly.
> This includes changing reg_attr_hasher::hash to use inchash.
> (Doing this has no effect on code generation since the only
> use of the hasher is to avoid creating duplicate objects.)
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtl.h (reg_attrs::offset): Change from HOST_WIDE_INT to poly_int64.
> 	(gen_rtx_REG_offset): Take the offset as a poly_int64.
> 	* inchash.h (inchash::hash::add_poly_hwi): New function.
> 	* gengtype.c (main): Register poly_int64.
> 	* emit-rtl.c (reg_attr_hasher::hash): Use inchash.  Treat the
> 	offset as a poly_int.
> 	(reg_attr_hasher::equal): Use must_eq to compare offsets.
> 	(get_reg_attrs, update_reg_offset, gen_rtx_REG_offset): Take the
> 	offset as a poly_int64.
> 	(set_reg_attrs_from_value): Treat the offset as a poly_int64.
> 	* print-rtl.c (print_poly_int): New function.
> 	(rtx_writer::print_rtx_operand_code_r): Treat REG_OFFSET as
> 	a poly_int.
> 	* var-tracking.c (track_offset_p, get_tracked_reg_offset): New
> 	functions.
> 	(var_reg_set, var_reg_delete_and_set, var_reg_delete): Use them.
> 	(same_variable_part_p, track_loc_p): Take the offset as a poly_int64.
> 	(vt_get_decl_and_offset): Return the offset as a poly_int64.
> 	Enforce track_offset_p for parts of a PARALLEL.
> 	(vt_add_function_parameter): Use const_offset for the final
> 	offset to track.  Use get_tracked_reg_offset for the parts
> 	of a PARALLEL.
> 
OK
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [012/nnn] poly_int: fold_ctor_reference
  2017-10-23 17:05 ` [012/nnn] poly_int: fold_ctor_reference Richard Sandiford
@ 2017-11-17  3:59   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  3:59 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:04 AM, Richard Sandiford wrote:
> This patch changes the offset and size arguments to
> fold_ctor_reference from unsigned HOST_WIDE_INT to poly_uint64.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* gimple-fold.h (fold_ctor_reference): Take the offset and size
> 	as poly_uint64 rather than unsigned HOST_WIDE_INT.
> 	* gimple-fold.c (fold_ctor_reference): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [013/nnn] poly_int: same_addr_size_stores_p
  2017-10-23 17:05 ` [013/nnn] poly_int: same_addr_size_stores_p Richard Sandiford
@ 2017-11-17  4:11   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17  4:11 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:05 AM, Richard Sandiford wrote:
> This patch makes tree-ssa-alias.c:same_addr_size_stores_p handle
> poly_int sizes and offsets.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-ssa-alias.c (same_addr_size_stores_p): Take the offsets and
> 	sizes as poly_int64s rather than HOST_WIDE_INTs.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [005/nnn] poly_int: rtx constants
  2017-10-23 17:01 ` [005/nnn] poly_int: rtx constants Richard Sandiford
@ 2017-11-17  4:17   ` Jeff Law
  2017-12-15  1:25     ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-17  4:17 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:00 AM, Richard Sandiford wrote:
> This patch adds an rtl representation of poly_int values.
> There were three possible ways of doing this:
> 
> (1) Add a new rtl code for the poly_ints themselves and store the
>     coefficients as trailing wide_ints.  This would give constants like:
> 
>       (const_poly_int [c0 c1 ... cn])
> 
>     The runtime value would be:
> 
>       c0 + c1 * x1 + ... + cn * xn
> 
> (2) Like (1), but use rtxes for the coefficients.  This would give
>     constants like:
> 
>       (const_poly_int [(const_int c0)
>                        (const_int c1)
>                        ...
>                        (const_int cn)])
> 
>     although the coefficients could be const_wide_ints instead
>     of const_ints where appropriate.
> 
> (3) Add a new rtl code for the polynomial indeterminates,
>     then use them in const wrappers.  A constant like c0 + c1 * x1
>     would then look like:
> 
>       (const:M (plus:M (mult:M (const_param:M x1)
>                                (const_int c1))
>                        (const_int c0)))
> 
> There didn't seem to be that much to choose between them.  The main
> advantage of (1) is that it's a more efficient representation and
> that we can refer to the coefficients directly as wide_int_storage.
Well, and #1 feels more like how we handle CONST_INT :-)
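For anyone following the thread, here is a rough sketch of what
representation (1) means for consumers of the new interfaces.  The helper
name is invented and the calls are assumptions based on the ChangeLog
below, not tested code:

    /* Sketch: wrap a possibly non-constant byte offset in an rtx and read
       it back.  With NUM_POLY_INT_COEFFS > 1 and a non-constant OFFSET
       this yields a CONST_POLY_INT holding the coefficients; otherwise it
       degenerates to a CONST_INT.  */
    static rtx
    wrap_poly_offset (poly_int64 offset)
    {
      rtx x = gen_int_mode (offset, Pmode);

      /* poly_int_rtx_p matches both CONST_INT and CONST_POLY_INT.  */
      poly_int64 value;
      if (poly_int_rtx_p (x, &value))
        gcc_checking_assert (must_eq (value, offset));
      return x;
    }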
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* doc/rtl.texi (const_poly_int): Document.
> 	* gengenrtl.c (excluded_rtx): Return true for CONST_POLY_INT.
> 	* rtl.h (const_poly_int_def): New struct.
> 	(rtx_def::u): Add a cpi field.
> 	(CASE_CONST_UNIQUE, CASE_CONST_ANY): Add CONST_POLY_INT.
> 	(CONST_POLY_INT_P, CONST_POLY_INT_COEFFS): New macros.
> 	(wi::rtx_to_poly_wide_ref): New typedef.
> 	(const_poly_int_value, wi::to_poly_wide, rtx_to_poly_int64)
> 	(poly_int_rtx_p): New functions.
> 	(trunc_int_for_mode): Declare a poly_int64 version.
> 	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
> 	(immed_wide_int_const): Take a poly_wide_int_ref rather than
> 	a wide_int_ref.
> 	(strip_offset): Declare.
> 	(strip_offset_and_add): New function.
> 	* rtl.def (CONST_POLY_INT): New rtx code.
> 	* rtl.c (rtx_size): Handle CONST_POLY_INT.
> 	(shared_const_p): Use poly_int_rtx_p.
> 	* emit-rtl.h (gen_int_mode): Take a poly_int64 instead of a
> 	HOST_WIDE_INT.
> 	(gen_int_shift_amount): Likewise.
> 	* emit-rtl.c (const_poly_int_hasher): New class.
> 	(const_poly_int_htab): New variable.
> 	(init_emit_once): Initialize it when NUM_POLY_INT_COEFFS > 1.
> 	(const_poly_int_hasher::hash): New function.
> 	(const_poly_int_hasher::equal): Likewise.
> 	(gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT.
> 	(immed_wide_int_const): Rename to...
> 	(immed_wide_int_const_1): ...this and make static.
> 	(immed_wide_int_const): New function, taking a poly_wide_int_ref
> 	instead of a wide_int_ref.
> 	(gen_int_shift_amount): Take a poly_int64 instead of a HOST_WIDE_INT.
> 	(gen_lowpart_common): Handle CONST_POLY_INT.
> 	* cse.c (hash_rtx_cb, equiv_constant): Likewise.
> 	* cselib.c (cselib_hash_rtx): Likewise.
> 	* dwarf2out.c (const_ok_for_output_1): Likewise.
> 	* expr.c (convert_modes): Likewise.
> 	* print-rtl.c (rtx_writer::print_rtx, print_value): Likewise.
> 	* rtlhash.c (add_rtx): Likewise.
> 	* explow.c (trunc_int_for_mode): Add a poly_int64 version.
> 	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
> 	Handle existing CONST_POLY_INT rtxes.
> 	* expmed.h (expand_shift): Take a poly_int64 instead of a
> 	HOST_WIDE_INT.
> 	* expmed.c (expand_shift): Likewise.
> 	* rtlanal.c (strip_offset): New function.
> 	(commutative_operand_precedence): Give CONST_POLY_INT the same
> 	precedence as CONST_DOUBLE and put CONST_WIDE_INT between that
> 	and CONST_INT.
> 	* rtl-tests.c (const_poly_int_tests): New struct.
> 	(rtl_tests_c_tests): Use it.
> 	* simplify-rtx.c (simplify_const_unary_operation): Handle
> 	CONST_POLY_INT.
> 	(simplify_const_binary_operation): Likewise.
> 	(simplify_binary_operation_1): Fold additions of symbolic constants
> 	and CONST_POLY_INTs.
> 	(simplify_subreg): Handle extensions and truncations of
> 	CONST_POLY_INTs.
> 	(simplify_const_poly_int_tests): New struct.
> 	(simplify_rtx_c_tests): Use it.
> 	* wide-int.h (storage_ref): Add default constructor.
> 	(wide_int_ref_storage): Likewise.
> 	(trailing_wide_ints): Use GTY((user)).
> 	(trailing_wide_ints::operator[]): Add a const version.
> 	(trailing_wide_ints::get_precision): New function.
> 	(trailing_wide_ints::extra_size): Likewise.
Do we need to define anything WRT structure sharing in rtl.texi for a
CONST_POLY_INT?



>  
> Index: gcc/rtl.c
> ===================================================================
> --- gcc/rtl.c	2017-10-23 16:52:20.579835373 +0100
> +++ gcc/rtl.c	2017-10-23 17:00:54.443002147 +0100
> @@ -257,9 +261,10 @@ shared_const_p (const_rtx orig)
>  
>    /* CONST can be shared if it contains a SYMBOL_REF.  If it contains
>       a LABEL_REF, it isn't sharable.  */
> +  poly_int64 offset;
>    return (GET_CODE (XEXP (orig, 0)) == PLUS
>  	  && GET_CODE (XEXP (XEXP (orig, 0), 0)) == SYMBOL_REF
> -	  && CONST_INT_P (XEXP (XEXP (orig, 0), 1)));
> +	  && poly_int_rtx_p (XEXP (XEXP (orig, 0), 1), &offset));
Did this just change structure sharing for CONST_WIDE_INT?



> +  /* Create a new rtx.  There's a choice to be made here between installing
> +     the actual mode of the rtx or leaving it as VOIDmode (for consistency
> +     with CONST_INT).  In practice the handling of the codes is different
> +     enough that we get no benefit from using VOIDmode, and various places
> +     assume that VOIDmode implies CONST_INT.  Using the real mode seems like
> +     the right long-term direction anyway.  */
Certainly my preference is to get the mode in there.  I see modeless
CONST_INTs as a long standing wart and I'm not keen to repeat it.



> Index: gcc/wide-int.h
> ===================================================================
> --- gcc/wide-int.h	2017-10-23 17:00:20.923835582 +0100
> +++ gcc/wide-int.h	2017-10-23 17:00:54.445999420 +0100
> @@ -613,6 +613,7 @@ #define SHIFT_FUNCTION \
>       access.  */
>    struct storage_ref
>    {
> +    storage_ref () {}
>      storage_ref (const HOST_WIDE_INT *, unsigned int, unsigned int);
>  
>      const HOST_WIDE_INT *val;
> @@ -944,6 +945,8 @@ struct wide_int_ref_storage : public wi:
>    HOST_WIDE_INT scratch[2];
>  
>  public:
> +  wide_int_ref_storage () {}
> +
>    wide_int_ref_storage (const wi::storage_ref &);
>  
>    template <typename T>
So doesn't this play into the whole question about initialization of
these objects?  I'll defer on this hunk until we settle that
question, but the rest is OK.


Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-10-23 17:02 ` [006/nnn] poly_int: tree constants Richard Sandiford
  2017-10-25 17:14   ` Martin Sebor
@ 2017-11-17  4:51   ` Jeff Law
  2017-11-18 15:48     ` Richard Sandiford
  1 sibling, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-17  4:51 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:00 AM, Richard Sandiford wrote:
> This patch adds a tree representation for poly_ints.  Unlike the
> rtx version, the coefficients are INTEGER_CSTs rather than plain
> integers, so that we can easily access them as poly_widest_ints
> and poly_offset_ints.
> 
> The patch also adjusts some places that previously
> relied on "constant" meaning "INTEGER_CST".  It also makes
> sure that the TYPE_SIZE agrees with the TYPE_SIZE_UNIT for
> vector booleans, given the existing:
> 
> 	/* Several boolean vector elements may fit in a single unit.  */
> 	if (VECTOR_BOOLEAN_TYPE_P (type)
> 	    && type->type_common.mode != BLKmode)
> 	  TYPE_SIZE_UNIT (type)
> 	    = size_int (GET_MODE_SIZE (type->type_common.mode));
> 	else
> 	  TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
> 						   TYPE_SIZE_UNIT (innertype),
> 						   size_int (nunits));
> 
> 
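As a rough illustration of how the new tree-level interfaces fit together
(the helper is invented and the calls are assumptions based on the
ChangeLog below, not tested code):

    /* Sketch: build a size tree from a possibly non-constant number of
       bytes and read it back.  For a non-constant value on a target with
       multiple coefficients the result is a POLY_INT_CST whose
       coefficients are INTEGER_CSTs.  */
    static poly_int64
    round_trip_size (poly_int64 nbytes)
    {
      tree size = build_int_cst (sizetype, nbytes);

      /* These mirror tree_fits_shwi_p/tree_to_shwi; wi::to_poly_offset
         and wi::to_poly_widest give wider views of the same value.  */
      if (tree_fits_poly_int64_p (size))
        return tree_to_poly_int64 (size);
      return -1;
    }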
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* doc/generic.texi (POLY_INT_CST): Document.
> 	* tree.def (POLY_INT_CST): New tree code.
> 	* treestruct.def (TS_POLY_INT_CST): New tree layout.
> 	* tree-core.h (tree_poly_int_cst): New struct.
> 	(tree_node): Add a poly_int_cst field.
> 	* tree.h (POLY_INT_CST_P, POLY_INT_CST_COEFF): New macros.
> 	(wide_int_to_tree, force_fit_type): Take a poly_wide_int_ref
> 	instead of a wide_int_ref.
> 	(build_int_cst, build_int_cst_type): Take a poly_int64 instead
> 	of a HOST_WIDE_INT.
> 	(build_int_cstu, build_array_type_nelts): Take a poly_uint64
> 	instead of an unsigned HOST_WIDE_INT.
> 	(build_poly_int_cst, tree_fits_poly_int64_p, tree_fits_poly_uint64_p)
> 	(ptrdiff_tree_p): Declare.
> 	(tree_to_poly_int64, tree_to_poly_uint64): Likewise.  Provide
> 	extern inline implementations if the target doesn't use POLY_INT_CST.
> 	(poly_int_tree_p): New function.
> 	(wi::unextended_tree): New class.
> 	(wi::int_traits <unextended_tree>): New override.
> 	(wi::extended_tree): Add a default constructor.
> 	(wi::extended_tree::get_tree): New function.
> 	(wi::widest_extended_tree, wi::offset_extended_tree): New typedefs.
> 	(wi::tree_to_widest_ref, wi::tree_to_offset_ref): Use them.
> 	(wi::tree_to_poly_widest_ref, wi::tree_to_poly_offset_ref)
> 	(wi::tree_to_poly_wide_ref): New typedefs.
> 	(wi::ints_for): Provide overloads for extended_tree and
> 	unextended_tree.
> 	(poly_int_cst_value, wi::to_poly_widest, wi::to_poly_offset)
> 	(wi::to_wide): New functions.
> 	(wi::fits_to_boolean_p, wi::fits_to_tree_p): Handle poly_ints.
> 	* tree.c (poly_int_cst_hasher): New struct.
> 	(poly_int_cst_hash_table): New variable.
> 	(tree_node_structure_for_code, tree_code_size, simple_cst_equal)
> 	(valid_constant_size_p, add_expr, drop_tree_overflow): Handle
> 	POLY_INT_CST.
> 	(initialize_tree_contains_struct): Handle TS_POLY_INT_CST.
> 	(init_ttree): Initialize poly_int_cst_hash_table.
> 	(build_int_cst, build_int_cst_type, build_invariant_address): Take
> 	a poly_int64 instead of a HOST_WIDE_INT.
> 	(build_int_cstu, build_array_type_nelts): Take a poly_uint64
> 	instead of an unsigned HOST_WIDE_INT.
> 	(wide_int_to_tree): Rename to...
> 	(wide_int_to_tree_1): ...this.
> 	(build_new_poly_int_cst, build_poly_int_cst): New functions.
> 	(force_fit_type): Take a poly_wide_int_ref instead of a wide_int_ref.
> 	(wide_int_to_tree): New function that takes a poly_wide_int_ref.
> 	(ptrdiff_tree_p, tree_to_poly_int64, tree_to_poly_uint64)
> 	(tree_fits_poly_int64_p, tree_fits_poly_uint64_p): New functions.
> 	* lto-streamer-out.c (DFS::DFS_write_tree_body, hash_tree): Handle
> 	TS_POLY_INT_CST.
> 	* tree-streamer-in.c (lto_input_ts_poly_tree_pointers): Likewise.
> 	(streamer_read_tree_body): Likewise.
> 	* tree-streamer-out.c (write_ts_poly_tree_pointers): Likewise.
> 	(streamer_write_tree_body): Likewise.
> 	* tree-streamer.c (streamer_check_handled_ts_structures): Likewise.
> 	* asan.c (asan_protect_global): Require the size to be an INTEGER_CST.
> 	* cfgexpand.c (expand_debug_expr): Handle POLY_INT_CST.
> 	* expr.c (const_vector_element, expand_expr_real_1): Likewise.
> 	* gimple-expr.h (is_gimple_constant): Likewise.
> 	* gimplify.c (maybe_with_size_expr): Likewise.
> 	* print-tree.c (print_node): Likewise.
> 	* tree-data-ref.c (data_ref_compare_tree): Likewise.
> 	* tree-pretty-print.c (dump_generic_node): Likewise.
> 	* tree-ssa-address.c (addr_for_mem_ref): Likewise.
> 	* tree-vect-data-refs.c (dr_group_sort_cmp): Likewise.
> 	* tree-vrp.c (compare_values_warnv): Likewise.
> 	* tree-ssa-loop-ivopts.c (determine_base_object, constant_multiple_of)
> 	(get_loop_invariant_expr, add_candidate_1, get_computation_aff_1)
> 	(force_expr_to_var_cost): Likewise.
> 	* tree-ssa-loop.c (for_each_index): Likewise.
> 	* fold-const.h (build_invariant_address, size_int_kind): Take a
> 	poly_int64 instead of a HOST_WIDE_INT.
> 	* fold-const.c (fold_negate_expr_1, const_binop, const_unop)
> 	(fold_convert_const, multiple_of_p, fold_negate_const): Handle
> 	POLY_INT_CST.
> 	(size_binop_loc): Likewise.  Allow int_const_binop_1 to fail.
> 	(int_const_binop_2): New function, split out from...
> 	(int_const_binop_1): ...here.  Handle POLY_INT_CST.
> 	(size_int_kind): Take a poly_int64 instead of a HOST_WIDE_INT.
> 	* expmed.c (make_tree): Handle CONST_POLY_INT_P.
> 	* gimple-ssa-strength-reduction.c (slsr_process_add)
> 	(slsr_process_mul): Check for INTEGER_CSTs before using them
> 	as candidates.
> 	* stor-layout.c (bits_from_bytes): New function.
> 	(bit_from_pos): Use it.
> 	(layout_type): Likewise.  For vectors, multiply the TYPE_SIZE_UNIT
> 	by BITS_PER_UNIT to get the TYPE_SIZE.
> 	* tree-cfg.c (verify_expr, verify_types_in_gimple_reference): Allow
> 	MEM_REF and TARGET_MEM_REF offsets to be a POLY_INT_CST.
> 
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h	2017-10-23 16:52:20.504766418 +0100
> +++ gcc/tree.h	2017-10-23 17:00:57.784962010 +0100
> @@ -5132,6 +5195,29 @@ extern bool anon_aggrname_p (const_tree)
>  /* The tree and const_tree overload templates.   */
>  namespace wi
>  {
> +  class unextended_tree
> +  {
> +  private:
> +    const_tree m_t;
> +
> +  public:
> +    unextended_tree () {}
> +    unextended_tree (const_tree t) : m_t (t) {}
> +
> +    unsigned int get_precision () const;
> +    const HOST_WIDE_INT *get_val () const;
> +    unsigned int get_len () const;
> +    const_tree get_tree () const { return m_t; }
> +  };
> +
> +  template <>
> +  struct int_traits <unextended_tree>
> +  {
> +    static const enum precision_type precision_type = VAR_PRECISION;
> +    static const bool host_dependent_precision = false;
> +    static const bool is_sign_extended = false;
> +  };
> +
>    template <int N>
>    class extended_tree
>    {
> @@ -5139,11 +5225,13 @@ extern bool anon_aggrname_p (const_tree)
>      const_tree m_t;
>  
>    public:
> +    extended_tree () {}
>      extended_tree (const_tree);
>  
>      unsigned int get_precision () const;
>      const HOST_WIDE_INT *get_val () const;
>      unsigned int get_len () const;
> +    const_tree get_tree () const { return m_t; }
>    };
Similarly I'll defer on part of the patch since the empty ctors play
into the initialization question that's still on the table.

Otherwise this is OK.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [011/nnn] poly_int: DWARF locations
  2017-10-23 17:05 ` [011/nnn] poly_int: DWARF locations Richard Sandiford
@ 2017-11-17 17:40   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-17 17:40 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:04 AM, Richard Sandiford wrote:
> This patch adds support for DWARF location expressions
> that involve polynomial offsets.  It adds a target hook that
> says how the runtime invariants used in the offsets should be
> represented in DWARF.  SVE vectors have to be a multiple of
> 128 bits in size, so the GCC port uses the number of 128-bit
> blocks minus one as the runtime invariant.  However, in DWARF,
> the vector length is exposed via a pseudo "VG" register that
> holds the number of 64-bit elements in a vector.  Thus:
> 
>   indeterminate 1 == (VG / 2) - 1
> 
> The hook needs to be general enough to express this.
> Note that in most cases the division and subtraction fold
> away into surrounding expressions.
> 
> 
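Spelling out the arithmetic behind that identity for readers without the
SVE numbers to hand: with the register size being 128 * (X + 1) bits for
runtime indeterminate X, the number of 64-bit elements is

    VG = 128 * (X + 1) / 64 = 2 * X + 2

and solving for the indeterminate gives X = VG / 2 - 1, i.e. exactly the
division and subtraction mentioned above.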
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* target.def (dwarf_poly_indeterminate_value): New hook.
> 	* targhooks.h (default_dwarf_poly_indeterminate_value): Declare.
> 	* targhooks.c (default_dwarf_poly_indeterminate_value): New function.
> 	* doc/tm.texi.in (TARGET_DWARF_POLY_INDETERMINATE_VALUE): Document.
> 	* doc/tm.texi: Regenerate.
> 	* dwarf2out.h (build_cfa_loc, build_cfa_aligned_loc): Take the
> 	offset as a poly_int64.
> 	* dwarf2out.c (new_reg_loc_descr): Move later in file.  Take the
> 	offset as a poly_int64.
> 	(loc_descr_plus_const, loc_list_plus_const, build_cfa_aligned_loc):
> 	Take the offset as a poly_int64.
> 	(build_cfa_loc): Likewise.  Use loc_descr_plus_const.
> 	(frame_pointer_fb_offset): Change to a poly_int64.
> 	(int_loc_descriptor): Take the offset as a poly_int64.  Use
> 	targetm.dwarf_poly_indeterminate_value for polynomial offsets.
> 	(based_loc_descr): Take the offset as a poly_int64.
> 	Use strip_offset_and_add to handle (plus X (const)).
> 	Use new_reg_loc_descr instead of an open-coded version of the
> 	previous implementation.
> 	(mem_loc_descriptor): Handle CONST_POLY_INT.
> 	(compute_frame_pointer_to_fb_displacement): Take the offset as a
> 	poly_int64.  Use strip_offset_and_add to handle (plus X (const)).
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [014/nnn] poly_int: indirect_refs_may_alias_p
  2017-10-23 17:06 ` [014/nnn] poly_int: indirect_refs_may_alias_p Richard Sandiford
@ 2017-11-17 18:11   ` Jeff Law
  2017-11-20 13:31     ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-17 18:11 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:05 AM, Richard Sandiford wrote:
> This patch makes indirect_refs_may_alias_p use ranges_may_overlap_p
> rather than ranges_overlap_p.  Unlike the latter, the former can handle
> negative offsets, so the fix for PR44852 should no longer be necessary.
> It can also handle offset_int, so avoids unchecked truncations to
> HOST_WIDE_INT.
> 
> 
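To make the negative-offset point concrete, two illustrative calls (the
argument order (pos1, size1, pos2, size2) is assumed from the poly-int.h
interface; values are byte positions and sizes):

    ranges_may_overlap_p (-4, 8, 0, 4);   /* [-4, 4) vs [0, 4): true  */
    ranges_may_overlap_p (-8, 4, 0, 4);   /* [-8, -4) vs [0, 4): false */

For constant inputs like these the "may" answer is exact.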
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-ssa-alias.c (indirect_ref_may_alias_decl_p)
> 	(indirect_refs_may_alias_p): Use ranges_may_overlap_p
> 	instead of ranges_overlap_p.
OK.

Note that this highlighted a nit in patch 001 -- namely that there's new
function templates that aren't mentioned in the ChangeLog.


Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [015/nnn] poly_int: ao_ref and vn_reference_op_t
  2017-10-23 17:06 ` [015/nnn] poly_int: ao_ref and vn_reference_op_t Richard Sandiford
@ 2017-11-18  4:25   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-18  4:25 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:06 AM, Richard Sandiford wrote:
> This patch changes the offset, size and max_size fields
> of ao_ref from HOST_WIDE_INT to poly_int64 and propagates
> the change through the code that references it.  This includes
> changing the off field of vn_reference_op_struct in the same way.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* inchash.h (inchash::hash::add_poly_int): New function.
> 	* tree-ssa-alias.h (ao_ref::offset, ao_ref::size, ao_ref::max_size):
> 	Use poly_int64 rather than HOST_WIDE_INT.
> 	(ao_ref::max_size_known_p): New function.
> 	* tree-ssa-sccvn.h (vn_reference_op_struct::off): Use poly_int64_pod
> 	rather than HOST_WIDE_INT.
> 	* tree-ssa-alias.c (ao_ref_base): Apply get_ref_base_and_extent
> 	to temporaries until its interface is adjusted to match.
> 	(ao_ref_init_from_ptr_and_size): Handle polynomial offsets and sizes.
> 	(aliasing_component_refs_p, decl_refs_may_alias_p)
> 	(indirect_ref_may_alias_decl_p, indirect_refs_may_alias_p): Take
> 	the offsets and max_sizes as poly_int64s instead of HOST_WIDE_INTs.
> 	(refs_may_alias_p_1, stmt_kills_ref_p): Adjust for changes to
> 	ao_ref fields.
> 	* alias.c (ao_ref_from_mem): Likewise.
> 	* tree-ssa-dce.c (mark_aliased_reaching_defs_necessary_1): Likewise.
> 	* tree-ssa-dse.c (valid_ao_ref_for_dse, normalize_ref)
> 	(clear_bytes_written_by, setup_live_bytes_from_ref, compute_trims)
> 	(maybe_trim_complex_store, maybe_trim_constructor_store)
> 	(live_bytes_read, dse_classify_store): Likewise.
> 	* tree-ssa-sccvn.c (vn_reference_compute_hash, vn_reference_eq):
> 	(copy_reference_ops_from_ref, ao_ref_init_from_vn_reference)
> 	(fully_constant_vn_reference_p, valueize_refs_1): Likewise.
> 	(vn_reference_lookup_3): Likewise.
> 	* tree-ssa-uninit.c (warn_uninitialized_vars): Likewise.
It looks like this patch contains more changes from ranges_overlap_p to
ranges_may_overlap_p.  They aren't noted in the ChangeLog.

As I look at these patches my worry is we're probably going to need some
guidance in our documentation for when to use the poly interfaces.
Certainly the type system helps here, so someone changing existing code
will likely get errors at compile time if they goof.  But in larger
chunks of new code I won't be surprised if problems creep in until folks
adjust existing habits.

As is often the case, there's a certain amount of trust here that you
evaluated the may/must stuff correctly.  It was fairly easy for me to
look at the tree-ssa-dse.c changes and see the intent as I'm reasonably
familiar with that code.  tree-ssa-sccvn (as an example) is much
harder for me to evaluate.

OK.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [016/nnn] poly_int: dse.c
  2017-10-23 17:07 ` [016/nnn] poly_int: dse.c Richard Sandiford
@ 2017-11-18  4:30   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-18  4:30 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:06 AM, Richard Sandiford wrote:
> This patch makes RTL DSE use poly_int for offsets and sizes.
> The local phase can optimise them normally but the global phase
> treats them as wild accesses.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* dse.c (store_info): Change offset and width from HOST_WIDE_INT
> 	to poly_int64.  Update commentary for positions_needed.large.
> 	(read_info_type): Change offset and width from HOST_WIDE_INT
> 	to poly_int64.
> 	(set_usage_bits): Likewise.
> 	(canon_address): Return the offset as a poly_int64 rather than
> 	a HOST_WIDE_INT.  Use strip_offset_and_add.
> 	(set_all_positions_unneeded, any_positions_needed_p): Use
> 	positions_needed.large to track stores with non-constant widths.
> 	(all_positions_needed_p): Likewise.  Take the offset and width
> 	as poly_int64s rather than ints.  Assert that rhs is nonnull.
> 	(record_store): Cope with non-constant offsets and widths.
> 	Nullify the rhs of an earlier store if we can't tell which bytes
> 	of it are needed.
> 	(find_shift_sequence): Take the access_size and shift as poly_int64s
> 	rather than ints.
> 	(get_stored_val): Take the read_offset and read_width as poly_int64s
> 	rather than HOST_WIDE_INTs.
> 	(check_mem_read_rtx, scan_stores, scan_reads, dse_step5): Handle
> 	non-constant offsets and widths.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [017/nnn] poly_int: rtx_addr_can_trap_p_1
  2017-10-23 17:07 ` [017/nnn] poly_int: rtx_addr_can_trap_p_1 Richard Sandiford
@ 2017-11-18  4:46   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-18  4:46 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:07 AM, Richard Sandiford wrote:
> This patch changes the offset and size arguments of
> rtx_addr_can_trap_p_1 from HOST_WIDE_INT to poly_int64.  It also
> uses a size of -1 rather than 0 to represent an unknown size and
> BLKmode rather than VOIDmode to represent an unknown mode.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtlanal.c (rtx_addr_can_trap_p_1): Take the offset and size
> 	as poly_int64s rather than HOST_WIDE_INTs.  Use a size of -1
> 	rather than 0 to represent an unknown size.  Assert that the size
> 	is known when the mode isn't BLKmode.
> 	(may_trap_p_1): Use -1 for unknown sizes.
> 	(rtx_addr_can_trap_p): Likewise.  Pass BLKmode rather than VOIDmode.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [006/nnn] poly_int: tree constants
  2017-11-17  4:51   ` Jeff Law
@ 2017-11-18 15:48     ` Richard Sandiford
  0 siblings, 0 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-11-18 15:48 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 11:00 AM, Richard Sandiford wrote:
>> This patch adds a tree representation for poly_ints.  Unlike the
>> rtx version, the coefficients are INTEGER_CSTs rather than plain
>> integers, so that we can easily access them as poly_widest_ints
>> and poly_offset_ints.
>> 
>> The patch also adjusts some places that previously
>> relied on "constant" meaning "INTEGER_CST".  It also makes
>> sure that the TYPE_SIZE agrees with the TYPE_SIZE_UNIT for
>> vector booleans, given the existing:
>> 
>> 	/* Several boolean vector elements may fit in a single unit.  */
>> 	if (VECTOR_BOOLEAN_TYPE_P (type)
>> 	    && type->type_common.mode != BLKmode)
>> 	  TYPE_SIZE_UNIT (type)
>> 	    = size_int (GET_MODE_SIZE (type->type_common.mode));
>> 	else
>> 	  TYPE_SIZE_UNIT (type) = int_const_binop (MULT_EXPR,
>> 						   TYPE_SIZE_UNIT (innertype),
>> 						   size_int (nunits));
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* doc/generic.texi (POLY_INT_CST): Document.
>> 	* tree.def (POLY_INT_CST): New tree code.
>> 	* treestruct.def (TS_POLY_INT_CST): New tree layout.
>> 	* tree-core.h (tree_poly_int_cst): New struct.
>> 	(tree_node): Add a poly_int_cst field.
>> 	* tree.h (POLY_INT_CST_P, POLY_INT_CST_COEFF): New macros.
>> 	(wide_int_to_tree, force_fit_type): Take a poly_wide_int_ref
>> 	instead of a wide_int_ref.
>> 	(build_int_cst, build_int_cst_type): Take a poly_int64 instead
>> 	of a HOST_WIDE_INT.
>> 	(build_int_cstu, build_array_type_nelts): Take a poly_uint64
>> 	instead of an unsigned HOST_WIDE_INT.
>> 	(build_poly_int_cst, tree_fits_poly_int64_p, tree_fits_poly_uint64_p)
>> 	(ptrdiff_tree_p): Declare.
>> 	(tree_to_poly_int64, tree_to_poly_uint64): Likewise.  Provide
>> 	extern inline implementations if the target doesn't use POLY_INT_CST.
>> 	(poly_int_tree_p): New function.
>> 	(wi::unextended_tree): New class.
>> 	(wi::int_traits <unextended_tree>): New override.
>> 	(wi::extended_tree): Add a default constructor.
>> 	(wi::extended_tree::get_tree): New function.
>> 	(wi::widest_extended_tree, wi::offset_extended_tree): New typedefs.
>> 	(wi::tree_to_widest_ref, wi::tree_to_offset_ref): Use them.
>> 	(wi::tree_to_poly_widest_ref, wi::tree_to_poly_offset_ref)
>> 	(wi::tree_to_poly_wide_ref): New typedefs.
>> 	(wi::ints_for): Provide overloads for extended_tree and
>> 	unextended_tree.
>> 	(poly_int_cst_value, wi::to_poly_widest, wi::to_poly_offset)
>> 	(wi::to_wide): New functions.
>> 	(wi::fits_to_boolean_p, wi::fits_to_tree_p): Handle poly_ints.
>> 	* tree.c (poly_int_cst_hasher): New struct.
>> 	(poly_int_cst_hash_table): New variable.
>> 	(tree_node_structure_for_code, tree_code_size, simple_cst_equal)
>> 	(valid_constant_size_p, add_expr, drop_tree_overflow): Handle
>> 	POLY_INT_CST.
>> 	(initialize_tree_contains_struct): Handle TS_POLY_INT_CST.
>> 	(init_ttree): Initialize poly_int_cst_hash_table.
>> 	(build_int_cst, build_int_cst_type, build_invariant_address): Take
>> 	a poly_int64 instead of a HOST_WIDE_INT.
>> 	(build_int_cstu, build_array_type_nelts): Take a poly_uint64
>> 	instead of an unsigned HOST_WIDE_INT.
>> 	(wide_int_to_tree): Rename to...
>> 	(wide_int_to_tree_1): ...this.
>> 	(build_new_poly_int_cst, build_poly_int_cst): New functions.
>> 	(force_fit_type): Take a poly_wide_int_ref instead of a wide_int_ref.
>> 	(wide_int_to_tree): New function that takes a poly_wide_int_ref.
>> 	(ptrdiff_tree_p, tree_to_poly_int64, tree_to_poly_uint64)
>> 	(tree_fits_poly_int64_p, tree_fits_poly_uint64_p): New functions.
>> 	* lto-streamer-out.c (DFS::DFS_write_tree_body, hash_tree): Handle
>> 	TS_POLY_INT_CST.
>> 	* tree-streamer-in.c (lto_input_ts_poly_tree_pointers): Likewise.
>> 	(streamer_read_tree_body): Likewise.
>> 	* tree-streamer-out.c (write_ts_poly_tree_pointers): Likewise.
>> 	(streamer_write_tree_body): Likewise.
>> 	* tree-streamer.c (streamer_check_handled_ts_structures): Likewise.
>> 	* asan.c (asan_protect_global): Require the size to be an INTEGER_CST.
>> 	* cfgexpand.c (expand_debug_expr): Handle POLY_INT_CST.
>> 	* expr.c (const_vector_element, expand_expr_real_1): Likewise.
>> 	* gimple-expr.h (is_gimple_constant): Likewise.
>> 	* gimplify.c (maybe_with_size_expr): Likewise.
>> 	* print-tree.c (print_node): Likewise.
>> 	* tree-data-ref.c (data_ref_compare_tree): Likewise.
>> 	* tree-pretty-print.c (dump_generic_node): Likewise.
>> 	* tree-ssa-address.c (addr_for_mem_ref): Likewise.
>> 	* tree-vect-data-refs.c (dr_group_sort_cmp): Likewise.
>> 	* tree-vrp.c (compare_values_warnv): Likewise.
>> 	* tree-ssa-loop-ivopts.c (determine_base_object, constant_multiple_of)
>> 	(get_loop_invariant_expr, add_candidate_1, get_computation_aff_1)
>> 	(force_expr_to_var_cost): Likewise.
>> 	* tree-ssa-loop.c (for_each_index): Likewise.
>> 	* fold-const.h (build_invariant_address, size_int_kind): Take a
>> 	poly_int64 instead of a HOST_WIDE_INT.
>> 	* fold-const.c (fold_negate_expr_1, const_binop, const_unop)
>> 	(fold_convert_const, multiple_of_p, fold_negate_const): Handle
>> 	POLY_INT_CST.
>> 	(size_binop_loc): Likewise.  Allow int_const_binop_1 to fail.
>> 	(int_const_binop_2): New function, split out from...
>> 	(int_const_binop_1): ...here.  Handle POLY_INT_CST.
>> 	(size_int_kind): Take a poly_int64 instead of a HOST_WIDE_INT.
>> 	* expmed.c (make_tree): Handle CONST_POLY_INT_P.
>> 	* gimple-ssa-strength-reduction.c (slsr_process_add)
>> 	(slsr_process_mul): Check for INTEGER_CSTs before using them
>> 	as candidates.
>> 	* stor-layout.c (bits_from_bytes): New function.
>> 	(bit_from_pos): Use it.
>> 	(layout_type): Likewise.  For vectors, multiply the TYPE_SIZE_UNIT
>> 	by BITS_PER_UNIT to get the TYPE_SIZE.
>> 	* tree-cfg.c (verify_expr, verify_types_in_gimple_reference): Allow
>> 	MEM_REF and TARGET_MEM_REF offsets to be a POLY_INT_CST.
>> 
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h	2017-10-23 16:52:20.504766418 +0100
>> +++ gcc/tree.h	2017-10-23 17:00:57.784962010 +0100
>> @@ -5132,6 +5195,29 @@ extern bool anon_aggrname_p (const_tree)
>>  /* The tree and const_tree overload templates.   */
>>  namespace wi
>>  {
>> +  class unextended_tree
>> +  {
>> +  private:
>> +    const_tree m_t;
>> +
>> +  public:
>> +    unextended_tree () {}
>> +    unextended_tree (const_tree t) : m_t (t) {}
>> +
>> +    unsigned int get_precision () const;
>> +    const HOST_WIDE_INT *get_val () const;
>> +    unsigned int get_len () const;
>> +    const_tree get_tree () const { return m_t; }
>> +  };
>> +
>> +  template <>
>> +  struct int_traits <unextended_tree>
>> +  {
>> +    static const enum precision_type precision_type = VAR_PRECISION;
>> +    static const bool host_dependent_precision = false;
>> +    static const bool is_sign_extended = false;
>> +  };
>> +
>>    template <int N>
>>    class extended_tree
>>    {
>> @@ -5139,11 +5225,13 @@ extern bool anon_aggrname_p (const_tree)
>>      const_tree m_t;
>>  
>>    public:
>> +    extended_tree () {}
>>      extended_tree (const_tree);
>>  
>>      unsigned int get_precision () const;
>>      const HOST_WIDE_INT *get_val () const;
>>      unsigned int get_len () const;
>> +    const_tree get_tree () const { return m_t; }
>>    };
> Similarly I'll defer on part of the patch since the empty ctors play
> into the initialization question that's still on the table.

FWIW, I'd expect these two constructors to go away if we switch
to C++11 in future, rather than become "() = default".  We only
really need them because of C++03 restrictions.

> Otherwise this is OK.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [014/nnn] poly_int: indirect_refs_may_alias_p
  2017-11-17 18:11   ` Jeff Law
@ 2017-11-20 13:31     ` Richard Sandiford
  2017-11-21  0:49       ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-20 13:31 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 11:05 AM, Richard Sandiford wrote:
>> This patch makes indirect_refs_may_alias_p use ranges_may_overlap_p
>> rather than ranges_overlap_p.  Unlike the latter, the former can handle
>> negative offsets, so the fix for PR44852 should no longer be necessary.
>> It can also handle offset_int, so avoids unchecked truncations to
>> HOST_WIDE_INT.
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* tree-ssa-alias.c (indirect_ref_may_alias_decl_p)
>> 	(indirect_refs_may_alias_p): Use ranges_may_overlap_p
>> 	instead of ranges_overlap_p.
> OK.
>
> Note that this highlighted a nit in patch 001 -- namely that there's new
> function templates that aren't mentioned in the ChangeLog.

Do you mean ranges_may_overlap_p?  I can add that and the other new
poly-int.h functions to the changelog if you think it's useful,
but I thought for new files it was more usual just to do:

	* foo.h: New file.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [014/nnn] poly_int: indirect_refs_may_alias_p
  2017-11-20 13:31     ` Richard Sandiford
@ 2017-11-21  0:49       ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-21  0:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/20/2017 06:00 AM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On 10/23/2017 11:05 AM, Richard Sandiford wrote:
>>> This patch makes indirect_refs_may_alias_p use ranges_may_overlap_p
>>> rather than ranges_overlap_p.  Unlike the latter, the former can handle
>>> negative offsets, so the fix for PR44852 should no longer be necessary.
>>> It can also handle offset_int, so avoids unchecked truncations to
>>> HOST_WIDE_INT.
>>>
>>>
>>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>> 	    Alan Hayward  <alan.hayward@arm.com>
>>> 	    David Sherwood  <david.sherwood@arm.com>
>>>
>>> gcc/
>>> 	* tree-ssa-alias.c (indirect_ref_may_alias_decl_p)
>>> 	(indirect_refs_may_alias_p): Use ranges_may_overlap_p
>>> 	instead of ranges_overlap_p.
>> OK.
>>
>> Note that this highlighted a nit in patch 001 -- namely that there's new
>> function templates that aren't mentioned in the ChangeLog.
> 
> Do you mean ranges_may_overlap_p?  I can add that and the other new
> poly-int.h functions to the changelog if you think it's useful,
> but I thought for new files it was more usual just to do:
> 
> 	* foo.h: New file.
That's fine.  I was just having trouble finding it when I wanted to look
at it.  My mailer won't unwrap a compressed patch :-)

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [107/nnn] poly_int: GET_MODE_SIZE
  2017-10-23 17:48 ` [107/nnn] poly_int: GET_MODE_SIZE Richard Sandiford
@ 2017-11-21  7:48   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-21  7:48 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:43 AM, Richard Sandiford wrote:
> This patch changes GET_MODE_SIZE from unsigned short to poly_uint16.
> The non-mechanical parts were handled by previous patches.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* machmode.h (mode_size): Change from unsigned short to
> 	poly_uint16_pod.
> 	(mode_to_bytes): Return a poly_uint16 rather than an unsigned short.
> 	(GET_MODE_SIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
> 	or if measurement_type is not polynomial.
> 	(fixed_size_mode::includes_p): Check for constant-sized modes.
> 	* genmodes.c (emit_mode_size_inline): Make mode_size_inline
> 	return a poly_uint16 rather than an unsigned short.
> 	(emit_mode_size): Change the type of mode_size from unsigned short
> 	to poly_uint16_pod.  Use ZERO_COEFFS for the initializer.
> 	(emit_mode_adjustments): Cope with polynomial vector sizes.
> 	* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
> 	for GET_MODE_SIZE.
> 	* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
> 	for GET_MODE_SIZE.
> 	* auto-inc-dec.c (try_merge): Treat GET_MODE_SIZE as polynomial.
> 	* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Likewise.
> 	* caller-save.c (setup_save_areas): Likewise.
> 	(replace_reg_with_saved_mem): Likewise.
> 	* calls.c (emit_library_call_value_1): Likewise.
> 	* combine-stack-adj.c (combine_stack_adjustments_for_block): Likewise.
> 	* combine.c (simplify_set, make_extraction, simplify_shift_const_1)
> 	(gen_lowpart_for_combine): Likewise.
> 	* convert.c (convert_to_integer_1): Likewise.
> 	* cse.c (equiv_constant, cse_insn): Likewise.
> 	* cselib.c (autoinc_split, cselib_hash_rtx): Likewise.
> 	(cselib_subst_to_values): Likewise.
> 	* dce.c (word_dce_process_block): Likewise.
> 	* df-problems.c (df_word_lr_mark_ref): Likewise.
> 	* dwarf2cfi.c (init_one_dwarf_reg_size): Likewise.
> 	* dwarf2out.c (multiple_reg_loc_descriptor, mem_loc_descriptor)
> 	(concat_loc_descriptor, concatn_loc_descriptor, loc_descriptor)
> 	(rtl_for_decl_location): Likewise.
> 	* emit-rtl.c (gen_highpart, widen_memory_access): Likewise.
> 	* expmed.c (extract_bit_field_1, extract_integral_bit_field): Likewise.
> 	* expr.c (emit_group_load_1, clear_storage_hints): Likewise.
> 	(emit_move_complex, emit_move_multi_word, emit_push_insn): Likewise.
> 	(expand_expr_real_1): Likewise.
> 	* function.c (assign_parm_setup_block_p, assign_parm_setup_block)
> 	(pad_below): Likewise.
> 	* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
> 	* gimple-ssa-store-merging.c (rhs_valid_for_store_merging_p): Likewise.
> 	* ira.c (get_subreg_tracking_sizes): Likewise.
> 	* ira-build.c (ira_create_allocno_objects): Likewise.
> 	* ira-color.c (coalesced_pseudo_reg_slot_compare): Likewise.
> 	(ira_sort_regnos_for_alter_reg): Likewise.
> 	* ira-costs.c (record_operand_costs): Likewise.
> 	* lower-subreg.c (interesting_mode_p, simplify_gen_subreg_concatn)
> 	(resolve_simple_move): Likewise.
> 	* lra-constraints.c (get_reload_reg, operands_match_p): Likewise.
> 	(process_addr_reg, simplify_operand_subreg, lra_constraints): Likewise.
> 	(CONST_POOL_OK_P): Reject variable-sized modes.
> 	* lra-spills.c (slot, assign_mem_slot, pseudo_reg_slot_compare)
> 	(add_pseudo_to_slot, lra_spill): Likewise.
> 	* omp-low.c (omp_clause_aligned_alignment): Likewise.
> 	* optabs-query.c (get_best_extraction_insn): Likewise.
> 	* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
> 	* optabs.c (expand_vec_perm, expand_vec_cond_expr): Likewise.
> 	(expand_mult_highpart, valid_multiword_target_p): Likewise.
> 	* recog.c (offsettable_address_addr_space_p): Likewise.
> 	* regcprop.c (maybe_mode_change): Likewise.
> 	* reginfo.c (choose_hard_reg_mode, record_subregs_of_mode): Likewise.
> 	* regrename.c (build_def_use): Likewise.
> 	* regstat.c (dump_reg_info): Likewise.
> 	* reload.c (complex_word_subreg_p, push_reload, find_dummy_reload)
> 	(find_reloads, find_reloads_subreg_address): Likewise.
> 	* reload1.c (eliminate_regs_1): Likewise.
> 	* rtlanal.c (for_each_inc_dec_find_inc_dec, rtx_cost): Likewise.
> 	* simplify-rtx.c (avoid_constant_pool_reference): Likewise.
> 	(simplify_binary_operation_1, simplify_subreg): Likewise.
> 	* targhooks.c (default_function_arg_padding): Likewise.
> 	(default_hard_regno_nregs, default_class_max_nregs): Likewise.
> 	* tree-cfg.c (verify_gimple_assign_binary): Likewise.
> 	(verify_gimple_assign_ternary): Likewise.
> 	* tree-inline.c (estimate_move_cost): Likewise.
> 	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
> 	* tree-ssa-loop-ivopts.c (add_autoinc_candidates): Likewise.
> 	(get_address_cost_ainc): Likewise.
> 	* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Likewise.
> 	(vect_supportable_dr_alignment): Likewise.
> 	* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
> 	(vectorizable_reduction): Likewise.
> 	* tree-vect-stmts.c (vectorizable_assignment, vectorizable_shift)
> 	(vectorizable_operation, vectorizable_load): Likewise.
> 	* tree.c (build_same_sized_truth_vector_type): Likewise.
> 	* valtrack.c (cleanup_auto_inc_dec): Likewise.
> 	* var-tracking.c (emit_note_insn_var_location): Likewise.
> 	* config/arc/arc.h (ASM_OUTPUT_CASE_END): Use as_a <scalar_int_mode>.
> 	(ADDR_VEC_ALIGN): Likewise.
I'm going to work backwards a bit and see if there's a batch of things I
can easily ack :-)

This is OK.  Obviously it can't go in until the whole thing is ack'd.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [106/nnn] poly_int: GET_MODE_BITSIZE
  2017-10-23 17:43 ` [106/nnn] poly_int: GET_MODE_BITSIZE Richard Sandiford
@ 2017-11-21  7:49   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-21  7:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:43 AM, Richard Sandiford wrote:
> This patch changes GET_MODE_BITSIZE from an unsigned short
> to a poly_uint16.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* machmode.h (mode_to_bits): Return a poly_uint16 rather than an
> 	unsigned short.
> 	(GET_MODE_BITSIZE): Return a constant if ONLY_FIXED_SIZE_MODES,
> 	or if measurement_type is polynomial.
> 	* calls.c (shift_return_value): Treat GET_MODE_BITSIZE as polynomial.
> 	* combine.c (make_extraction): Likewise.
> 	* dse.c (find_shift_sequence): Likewise.
> 	* dwarf2out.c (mem_loc_descriptor): Likewise.
> 	* expmed.c (store_integral_bit_field, extract_bit_field_1): Likewise.
> 	(extract_bit_field, extract_low_bits): Likewise.
> 	* expr.c (convert_move, convert_modes, emit_move_insn_1): Likewise.
> 	(optimize_bitfield_assignment_op, expand_assignment): Likewise.
> 	(store_field, expand_expr_real_1): Likewise.
> 	* fold-const.c (optimize_bit_field_compare, merge_ranges): Likewise.
> 	* gimple-fold.c (optimize_atomic_compare_exchange_p): Likewise.
> 	* reload.c (find_reloads): Likewise.
> 	* reload1.c (alter_reg): Likewise.
> 	* stor-layout.c (bitwise_mode_for_mode, compute_record_mode): Likewise.
> 	* targhooks.c (default_secondary_memory_needed_mode): Likewise.
> 	* tree-if-conv.c (predicate_mem_writes): Likewise.
> 	* tree-ssa-strlen.c (handle_builtin_memcmp): Likewise.
> 	* tree-vect-patterns.c (adjust_bool_pattern): Likewise.
> 	* tree-vect-stmts.c (vectorizable_simd_clone_call): Likewise.
> 	* valtrack.c (dead_debug_insert_temp): Likewise.
> 	* varasm.c (mergeable_constant_section): Likewise.
> 	* config/sh/sh.h (LOCAL_ALIGNMENT): Use as_a <fixed_size_mode>.
> 
> gcc/ada/
> 	* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_BITSIZE
> 	as polynomial.
> 
> gcc/c-family/
> 	* c-ubsan.c (ubsan_instrument_shift): Treat GET_MODE_BITSIZE
> 	as polynomial.
This is OK.  Obviously it can't go in until the whole thing is ack'd.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [105/nnn] poly_int: expand_assignment
  2017-10-23 17:43 ` [105/nnn] poly_int: expand_assignment Richard Sandiford
@ 2017-11-21  7:50   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-21  7:50 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:42 AM, Richard Sandiford wrote:
> This patch makes the CONCAT handing in expand_assignment cope with
> polynomial mode sizes.  The mode of the CONCAT must be complex,
> so we can base the tests on the sizes of the real and imaginary
> components.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expr.c (expand_assignment): Cope with polynomial mode sizes
> 	when assigning to a CONCAT.
This is OK.  Obviously it can't go in until the whole thing is ack'd.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [102/nnn] poly_int: vect_permute_load/store_chain
  2017-10-23 17:42 ` [102/nnn] poly_int: vect_permute_load/store_chain Richard Sandiford
@ 2017-11-21  8:01   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-21  8:01 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:41 AM, Richard Sandiford wrote:
> The GET_MODE_NUNITS patch made vect_grouped_store_supported and
> vect_grouped_load_supported check for a constant number of elements,
> so vect_permute_store_chain and vect_permute_load_chain can assert
> for that.  This patch adds commentary to that effect; the actual
> asserts will be added by a later, more mechanical, patch.
> 
> The patch also reorganises the function so that the asserts
> are linked specifically to code that builds permute vectors
> element-by-element.  This allows a later patch to add support
> for some variable-length permutes.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-data-refs.c (vect_permute_store_chain): Reorganize
> 	so that both the length == 3 and length != 3 cases set up their
> 	own permute vectors.  Add comments explaining why we know the
> 	number of elements is constant.
> 	(vect_permute_load_chain): Likewise.
This is OK.  Obviously it can't go in until the whole thing is ack'd.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [099/nnn] poly_int: struct_value_size
  2017-10-23 17:40 ` [099/nnn] poly_int: struct_value_size Richard Sandiford
@ 2017-11-21  8:14   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-21  8:14 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:40 AM, Richard Sandiford wrote:
> This patch makes calls.c treat struct_value_size (one of the
> operands to a call pattern) as polynomial.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* calls.c (emit_call_1, expand_call): Change struct_value_size from
> 	a HOST_WIDE_INT to a poly_int64.
This is OK.  Obviously it can't go in until the whole thing is ack'd.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [070/nnn] poly_int: vectorizable_reduction
  2017-10-23 17:29 ` [070/nnn] poly_int: vectorizable_reduction Richard Sandiford
@ 2017-11-22 18:11   ` Richard Sandiford
  2017-12-06  0:33     ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-22 18:11 UTC (permalink / raw)
  To: gcc-patches

Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch makes vectorizable_reduction cope with variable-length vectors.
> We can handle the simple case of an inner loop reduction for which
> the target has native support for the epilogue operation.  For now we
> punt on other cases, but patches after the main SVE submission allow
> SLP and double reductions too.

Here's an updated version that applies on top of the recent removal
of REDUC_*_EXPR.

Thanks,
Richard


2017-11-22  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree.h (build_index_vector): Declare.
	* tree.c (build_index_vector): New function.
	* tree-vect-loop.c (get_initial_def_for_reduction): Treat the number
	of units as polynomial, forcibly converting it to a constant if
	vectorizable_reduction has already enforced the condition.
	(get_initial_defs_for_reduction): Likewise.
	(vect_create_epilog_for_reduction): Likewise.  Use build_index_vector
	to create a {1,2,3,...} vector.
	(vectorizable_reduction): Treat the number of units as polynomial.
	Choose vectype_in based on the largest scalar element size rather
	than the smallest number of units.  Enforce the restrictions
	relied on above.

Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-11-22 18:02:23.618126313 +0000
+++ gcc/tree.h	2017-11-22 18:02:28.724789004 +0000
@@ -4036,6 +4036,7 @@ extern tree build_vector (tree, vec<tree
 extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
 extern tree build_vector_from_val (tree, tree);
 extern tree build_vec_series (tree, tree, tree);
+extern tree build_index_vector (tree, poly_uint64, poly_uint64);
 extern void recompute_constructor_flags (tree);
 extern void verify_constructor_flags (tree);
 extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-11-22 18:02:23.618126313 +0000
+++ gcc/tree.c	2017-11-22 18:02:28.724789004 +0000
@@ -2027,6 +2027,37 @@ build_vec_series (tree type, tree base,
   return build2 (VEC_SERIES_EXPR, type, base, step);
 }
 
+/* Return a vector with the same number of units and number of bits
+   as VEC_TYPE, but in which the elements are a linear series of unsigned
+   integers { BASE, BASE + STEP, BASE + STEP * 2, ... }.  */
+
+tree
+build_index_vector (tree vec_type, poly_uint64 base, poly_uint64 step)
+{
+  tree index_vec_type = vec_type;
+  tree index_elt_type = TREE_TYPE (vec_type);
+  poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vec_type);
+  if (!INTEGRAL_TYPE_P (index_elt_type) || !TYPE_UNSIGNED (index_elt_type))
+    {
+      index_elt_type = build_nonstandard_integer_type
+	(GET_MODE_BITSIZE (SCALAR_TYPE_MODE (index_elt_type)), true);
+      index_vec_type = build_vector_type (index_elt_type, nunits);
+    }
+
+  unsigned HOST_WIDE_INT count;
+  if (nunits.is_constant (&count))
+    {
+      auto_vec<tree, 32> v (count);
+      for (unsigned int i = 0; i < count; ++i)
+	v.quick_push (build_int_cstu (index_elt_type, base + i * step));
+      return build_vector (index_vec_type, v);
+    }
+
+  return build_vec_series (index_vec_type,
+			   build_int_cstu (index_elt_type, base),
+			   build_int_cstu (index_elt_type, step));
+}
+
 /* Something has messed with the elements of CONSTRUCTOR C after it was built;
    calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
 
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-11-22 18:02:23.618126313 +0000
+++ gcc/tree-vect-loop.c	2017-11-22 18:02:28.722773349 +0000
@@ -3997,11 +3997,10 @@ get_initial_def_for_reduction (gimple *s
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   tree scalar_type = TREE_TYPE (init_val);
   tree vectype = get_vectype_for_scalar_type (scalar_type);
-  int nunits;
+  poly_uint64 nunits;
   enum tree_code code = gimple_assign_rhs_code (stmt);
   tree def_for_init;
   tree init_def;
-  int i;
   bool nested_in_vect_loop = false;
   REAL_VALUE_TYPE real_init_val = dconst0;
   int int_init_val = 0;
@@ -4082,9 +4081,13 @@ get_initial_def_for_reduction (gimple *s
 	else
 	  {
 	    /* Option2: the first element is INIT_VAL.  */
-	    auto_vec<tree, 32> elts (nunits);
+
+	    /* Enforced by vectorizable_reduction (which disallows double
+	       reductions with variable-length vectors).  */
+	    unsigned int count = nunits.to_constant ();
+	    auto_vec<tree, 32> elts (count);
 	    elts.quick_push (init_val);
-	    for (i = 1; i < nunits; ++i)
+	    for (unsigned int i = 1; i < count; ++i)
 	      elts.quick_push (def_for_init);
 	    init_def = gimple_build_vector (&stmts, vectype, elts);
 	  }
@@ -4144,6 +4147,8 @@ get_initial_defs_for_reduction (slp_tree
 
   vector_type = STMT_VINFO_VECTYPE (stmt_vinfo);
   scalar_type = TREE_TYPE (vector_type);
+  /* vectorizable_reduction has already rejected SLP reductions on
+     variable-length vectors.  */
   nunits = TYPE_VECTOR_SUBPARTS (vector_type);
 
   gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def);
@@ -4510,8 +4515,7 @@ vect_create_epilog_for_reduction (vec<tr
   if (STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION)
     {
       tree indx_before_incr, indx_after_incr;
-      int nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
-      int k;
+      poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype);
 
       gimple *vec_stmt = STMT_VINFO_VEC_STMT (stmt_info);
       gcc_assert (gimple_assign_rhs_code (vec_stmt) == VEC_COND_EXPR);
@@ -4527,10 +4531,7 @@ vect_create_epilog_for_reduction (vec<tr
 	 vector size (STEP).  */
 
       /* Create a {1,2,3,...} vector.  */
-      auto_vec<tree, 32> vtemp (nunits_out);
-      for (k = 0; k < nunits_out; ++k)
-	vtemp.quick_push (build_int_cst (cr_index_scalar_type, k + 1));
-      tree series_vect = build_vector (cr_index_vector_type, vtemp);
+      tree series_vect = build_index_vector (cr_index_vector_type, 1, 1);
 
       /* Create a vector of the step value.  */
       tree step = build_int_cst (cr_index_scalar_type, nunits_out);
@@ -4908,8 +4909,11 @@ vect_create_epilog_for_reduction (vec<tr
       tree data_eltype = TREE_TYPE (TREE_TYPE (new_phi_result));
       tree idx_eltype = TREE_TYPE (TREE_TYPE (induction_index));
       unsigned HOST_WIDE_INT el_size = tree_to_uhwi (TYPE_SIZE (idx_eltype));
-      unsigned HOST_WIDE_INT v_size
-	= el_size * TYPE_VECTOR_SUBPARTS (TREE_TYPE (induction_index));
+      poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (TREE_TYPE (induction_index));
+      /* Enforced by vectorizable_reduction, which ensures we have target
+	 support before allowing a conditional reduction on variable-length
+	 vectors.  */
+      unsigned HOST_WIDE_INT v_size = el_size * nunits.to_constant ();
       tree idx_val = NULL_TREE, val = NULL_TREE;
       for (unsigned HOST_WIDE_INT off = 0; off < v_size; off += el_size)
 	{
@@ -5026,6 +5030,9 @@ vect_create_epilog_for_reduction (vec<tr
     {
       bool reduce_with_shift = have_whole_vector_shift (mode);
       int element_bitsize = tree_to_uhwi (bitsize);
+      /* Enforced by vectorizable_reduction, which disallows SLP reductions
+	 for variable-length vectors and also requires direct target support
+	 for loop reductions.  */
       int vec_size_in_bits = tree_to_uhwi (TYPE_SIZE (vectype));
       tree vec_temp;
 
@@ -5706,10 +5713,10 @@ vectorizable_reduction (gimple *stmt, gi
 	  if (k == 1
 	      && gimple_assign_rhs_code (reduc_stmt) == COND_EXPR)
 	    continue;
-	  tem = get_vectype_for_scalar_type (TREE_TYPE (op));
-	  if (! vectype_in
-	      || TYPE_VECTOR_SUBPARTS (tem) < TYPE_VECTOR_SUBPARTS (vectype_in))
-	    vectype_in = tem;
+	  if (!vectype_in
+	      || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
+		  < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (op)))))
+	    vectype_in = get_vectype_for_scalar_type (TREE_TYPE (op));
 	  break;
 	}
       gcc_assert (vectype_in);
@@ -5875,7 +5882,8 @@ vectorizable_reduction (gimple *stmt, gi
 	  /* To properly compute ncopies we are interested in the widest
 	     input type in case we're looking at a widening accumulation.  */
 	  if (!vectype_in
-	      || TYPE_VECTOR_SUBPARTS (vectype_in) > TYPE_VECTOR_SUBPARTS (tem))
+	      || (GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (vectype_in)))
+		  < GET_MODE_SIZE (SCALAR_TYPE_MODE (TREE_TYPE (tem)))))
 	    vectype_in = tem;
 	}
 
@@ -6022,6 +6030,7 @@ vectorizable_reduction (gimple *stmt, gi
   gcc_assert (ncopies >= 1);
 
   vec_mode = TYPE_MODE (vectype_in);
+  poly_uint64 nunits_out = TYPE_VECTOR_SUBPARTS (vectype_out);
 
   if (code == COND_EXPR)
     {
@@ -6203,14 +6212,23 @@ vectorizable_reduction (gimple *stmt, gi
       int scalar_precision
 	= GET_MODE_PRECISION (SCALAR_TYPE_MODE (scalar_type));
       cr_index_scalar_type = make_unsigned_type (scalar_precision);
-      cr_index_vector_type = build_vector_type
-	(cr_index_scalar_type, TYPE_VECTOR_SUBPARTS (vectype_out));
+      cr_index_vector_type = build_vector_type (cr_index_scalar_type,
+						nunits_out);
 
       if (direct_internal_fn_supported_p (IFN_REDUC_MAX, cr_index_vector_type,
 					  OPTIMIZE_FOR_SPEED))
 	reduc_fn = IFN_REDUC_MAX;
     }
 
+  if (reduc_fn == IFN_LAST && !nunits_out.is_constant ())
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "missing target support for reduction on"
+			 " variable-length vectors.\n");
+      return false;
+    }
+
   if ((double_reduc
        || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) != TREE_CODE_REDUCTION)
       && ncopies > 1)
@@ -6222,6 +6240,27 @@ vectorizable_reduction (gimple *stmt, gi
       return false;
     }
 
+  if (double_reduc && !nunits_out.is_constant ())
+    {
+      /* The current double-reduction code creates the initial value
+	 element-by-element.  */
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "double reduction not supported for variable-length"
+			 " vectors.\n");
+      return false;
+    }
+
+  if (slp_node && !nunits_out.is_constant ())
+    {
+      /* The current SLP code creates the initial value element-by-element.  */
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "SLP reduction not supported for variable-length"
+			 " vectors.\n");
+      return false;
+    }
+
   /* In case of widenning multiplication by a constant, we update the type
      of the constant to be the type of the other operand.  We check that the
      constant fits the type in the pattern recognition pass.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread
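A note on the idiom the hunk above relies on: a poly_uint64 value such as
TYPE_VECTOR_SUBPARTS is only converted with to_constant () after the code
has established (here via vectorizable_reduction) that it really is
constant.  A minimal sketch of that pattern, with illustrative names that
are not taken from the patch:

  /* Illustrative only: return false for variable-length vectors, in the
     same way the reduction code above punts when there is no direct
     target support.  */
  static bool
  example_constant_nunits (tree vectype, unsigned HOST_WIDE_INT *nunits_out)
  {
    poly_uint64 nunits = TYPE_VECTOR_SUBPARTS (vectype);
    if (!nunits.is_constant (nunits_out))
      return false;
    /* *nunits_out now behaves exactly like the old constant value.  */
    return true;
  }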

* Re: [104/nnn] poly_int: GET_MODE_PRECISION
  2017-10-23 17:43 ` [104/nnn] poly_int: GET_MODE_PRECISION Richard Sandiford
@ 2017-11-28  8:07   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:07 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:42 AM, Richard Sandiford wrote:
> This patch changes GET_MODE_PRECISION from an unsigned short
> to a poly_uint16.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* machmode.h (mode_precision): Change from unsigned short to
> 	poly_uint16_pod.
> 	(mode_to_precision): Return a poly_uint16 rather than an unsigned
> 	short.
> 	(GET_MODE_PRECISION): Return a constant if ONLY_FIXED_SIZE_MODES,
> 	or if measurement_type is not polynomial.
> 	(HWI_COMPUTABLE_MODE_P): Turn into a function.  Optimize the case
> 	in which the mode is already known to be a scalar_int_mode.
> 	* genmodes.c (emit_mode_precision): Change the type of mode_precision
> 	from unsigned short to poly_uint16_pod.  Use ZERO_COEFFS for the
> 	initializer.
> 	* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
> 	for GET_MODE_PRECISION.
> 	* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
> 	for GET_MODE_PRECISION.
> 	* combine.c (update_rsp_from_reg_equal): Treat GET_MODE_PRECISION
> 	as polynomial.
> 	(try_combine, find_split_point, combine_simplify_rtx): Likewise.
> 	(expand_field_assignment, make_extraction): Likewise.
> 	(make_compound_operation_int, record_dead_and_set_regs_1): Likewise.
> 	(get_last_value): Likewise.
> 	* convert.c (convert_to_integer_1): Likewise.
> 	* cse.c (cse_insn): Likewise.
> 	* expr.c (expand_expr_real_1): Likewise.
> 	* lra-constraints.c (simplify_operand_subreg): Likewise.
> 	* optabs-query.c (can_atomic_load_p): Likewise.
> 	* optabs.c (expand_atomic_load): Likewise.
> 	(expand_atomic_store): Likewise.
> 	* ree.c (combine_reaching_defs): Likewise.
> 	* rtl.h (partial_subreg_p, paradoxical_subreg_p): Likewise.
> 	* rtlanal.c (nonzero_bits1, lsb_bitfield_op_p): Likewise.
> 	* tree.h (type_has_mode_precision_p): Likewise.
> 	* ubsan.c (instrument_si_overflow): Likewise.
> 
> gcc/ada/
> 	* gcc-interface/misc.c (enumerate_modes): Treat GET_MODE_PRECISION
> 	as polynomial.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread
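For background on why so many callers change once GET_MODE_PRECISION
returns a poly_uint16: ordered comparisons have to say whether they want
the conservative "may" or "must" answer.  A hypothetical before/after
sketch (use_hwi_path and the surrounding condition are made up for
illustration, not taken from the patch):

  /* Before: the precision is a plain integer.  */
  if (GET_MODE_PRECISION (mode) <= HOST_BITS_PER_WIDE_INT)
    use_hwi_path ();

  /* After: the fast path is only safe if the precision certainly fits,
     so ask for the "must" answer.  */
  if (must_le (GET_MODE_PRECISION (mode), HOST_BITS_PER_WIDE_INT))
    use_hwi_path ();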

* Re: [098/nnn] poly_int: load_register_parameters
  2017-10-23 17:40 ` [098/nnn] poly_int: load_register_parameters Richard Sandiford
@ 2017-11-28  8:08   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:08 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:40 AM, Richard Sandiford wrote:
> This patch makes load_register_parameters cope with polynomial sizes.
> The requirement here is that any register parameters with non-constant
> sizes must either have a specific mode (e.g. a variable-length vector
> mode) or must be represented with a PARALLEL.  This is in practice
> already a requirement for parameters passed in vector registers,
> since the default behaviour of splitting parameters into words doesn't
> make sense for them.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* calls.c (load_register_parameters): Cope with polynomial
> 	mode sizes.  Require a constant size for BLKmode parameters
> 	that aren't described by a PARALLEL.  If BLOCK_REG_PADDING
> 	forces a parameter to be padded at the lsb end in order to
> 	fill a complete number of words, require the parameter size
> 	to be ordered wrt UNITS_PER_WORD.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [097/nnn] poly_int: alter_reg
  2017-10-23 17:40 ` [097/nnn] poly_int: alter_reg Richard Sandiford
@ 2017-11-28  8:08   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:08 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:39 AM, Richard Sandiford wrote:
> This patch makes alter_reg cope with polynomial mode sizes.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* reload1.c (spill_stack_slot_width): Change element type
> 	from unsigned int to poly_uint64_pod.
> 	(alter_reg): Treat mode sizes as polynomial.
OK.
Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [096/nnn] poly_int: reloading complex subregs
  2017-10-23 17:39 ` [096/nnn] poly_int: reloading complex subregs Richard Sandiford
@ 2017-11-28  8:09   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:09 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:39 AM, Richard Sandiford wrote:
> This patch splits out a condition that is common to both push_reload
> and reload_inner_reg_of_subreg.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* reload.c (complex_word_subreg_p): New function.
> 	(reload_inner_reg_of_subreg, push_reload): Use it.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [095/nnn] poly_int: process_alt_operands
  2017-10-23 17:39 ` [095/nnn] poly_int: process_alt_operands Richard Sandiford
@ 2017-11-28  8:14   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:14 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:38 AM, Richard Sandiford wrote:
> This patch makes process_alt_operands check that the mode sizes
> are ordered, so that match_reload can validly treat them as subregs
> of one another.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* lra-constraints.c (process_alt_operands): Reject matched
> 	operands whose sizes aren't ordered.
> 	(match_reload): Refer to this check here.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [094/nnn] poly_int: expand_ifn_atomic_compare_exchange_into_call
  2017-10-23 17:38 ` [094/nnn] poly_int: expand_ifn_atomic_compare_exchange_into_call Richard Sandiford
@ 2017-11-28  8:31   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:31 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:37 AM, Richard Sandiford wrote:
> This patch makes the mode size assumptions in
> expand_ifn_atomic_compare_exchange_into_call a bit more
> explicit, so that a later patch can add a to_constant () call.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* builtins.c (expand_ifn_atomic_compare_exchange_into_call): Assert
> 	that the mode size is in the set {1, 2, 4, 8, 16}.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [093/nnn] poly_int: adjust_mems
  2017-10-23 17:37 ` [093/nnn] poly_int: adjust_mems Richard Sandiford
@ 2017-11-28  8:32   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:32 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:37 AM, Richard Sandiford wrote:
> This patch makes the var-tracking.c handling of autoinc addresses
> cope with polynomial mode sizes.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* var-tracking.c (adjust_mems): Treat mode sizes as polynomial.
> 	Use plus_constant instead of gen_rtx_PLUS.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [091/nnn] poly_int: emit_single_push_insn_1
  2017-10-23 17:37 ` [091/nnn] poly_int: emit_single_push_insn_1 Richard Sandiford
@ 2017-11-28  8:33   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:33 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:36 AM, Richard Sandiford wrote:
> This patch makes emit_single_push_insn_1 cope with polynomial mode sizes.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expr.c (emit_single_push_insn_1): Treat mode sizes as polynomial.
> 	Use plus_constant instead of gen_rtx_PLUS.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [090/nnn] poly_int: set_inc_state
  2017-10-23 17:36 ` [090/nnn] poly_int: set_inc_state Richard Sandiford
@ 2017-11-28  8:35   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:35 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:36 AM, Richard Sandiford wrote:
> This trivial patch makes auto-inc-dec.c:set_inc_state take a poly_int64.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* auto-inc-dec.c (set_inc_state): Take the mode size as a poly_int64
> 	rather than an int.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [089/nnn] poly_int: expand_expr_real_1
  2017-10-23 17:36 ` [089/nnn] poly_int: expand_expr_real_1 Richard Sandiford
@ 2017-11-28  8:41   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:41 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:35 AM, Richard Sandiford wrote:
> This patch makes the VIEW_CONVERT_EXPR handling in expand_expr_real_1
> cope with polynomial type and mode sizes.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expr.c (expand_expr_real_1): Use tree_to_poly_uint64
> 	instead of int_size_in_bytes when handling VIEW_CONVERT_EXPRs
> 	via stack temporaries.  Treat the mode size as polynomial too.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [088/nnn] poly_int: expand_expr_real_2
  2017-10-23 17:35 ` [088/nnn] poly_int: expand_expr_real_2 Richard Sandiford
@ 2017-11-28  8:49   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28  8:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:35 AM, Richard Sandiford wrote:
> This patch makes expand_expr_real_2 cope with polynomial mode sizes
> when handling conversions involving a union type.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expr.c (expand_expr_real_2): When handling conversions involving
> 	unions, apply tree_to_poly_uint64 to the TYPE_SIZE rather than
> 	multiplying int_size_in_bytes by BITS_PER_UNIT.  Treat GET_MODE_BITSIZE
> 	as a poly_uint64 too.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [092/nnn] poly_int: PUSH_ROUNDING
  2017-10-23 17:37 ` [092/nnn] poly_int: PUSH_ROUNDING Richard Sandiford
@ 2017-11-28 16:21   ` Jeff Law
  2017-11-28 18:01     ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:21 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:37 AM, Richard Sandiford wrote:
> PUSH_ROUNDING is difficult to convert to a hook since there is still
> a lot of conditional code based on it.  It isn't clear that a direct
> conversion with checks for null hooks is the right thing to do.
> 
> Rather than untangle that, this patch converts all implementations
> that do something to out-of-line functions that have the same
> interface as a hook would have.  This should at least help towards
> any future hook conversion.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* config/cr16/cr16-protos.h (cr16_push_rounding): Declare.
> 	* config/cr16/cr16.h (PUSH_ROUNDING): Move implementation to...
> 	* config/cr16/cr16.c (cr16_push_rounding): ...this new function.
> 	* config/h8300/h8300-protos.h (h8300_push_rounding): Declare.
> 	* config/h8300/h8300.h (PUSH_ROUNDING): Move implementation to...
> 	* config/h8300/h8300.c (h8300_push_rounding): ...this new function.
> 	* config/i386/i386-protos.h (ix86_push_rounding): Declare.
> 	* config/i386/i386.h (PUSH_ROUNDING): Move implementation to...
> 	* config/i386/i386.c (ix86_push_rounding): ...this new function.
> 	* config/m32c/m32c-protos.h (m32c_push_rounding): Take and return
> 	a poly_int64.
> 	* config/m32c/m32c.c (m32c_push_rounding): Likewise.
> 	* config/m68k/m68k-protos.h (m68k_push_rounding): Declare.
> 	* config/m68k/m68k.h (PUSH_ROUNDING): Move implementation to...
> 	* config/m68k/m68k.c (m68k_push_rounding): ...this new function.
> 	* config/pdp11/pdp11-protos.h (pdp11_push_rounding): Declare.
> 	* config/pdp11/pdp11.h (PUSH_ROUNDING): Move implementation to...
> 	* config/pdp11/pdp11.c (pdp11_push_rounding): ...this new function.
> 	* config/stormy16/stormy16-protos.h (xstormy16_push_rounding): Declare.
> 	* config/stormy16/stormy16.h (PUSH_ROUNDING): Move implementation to...
> 	* config/stormy16/stormy16.c (xstormy16_push_rounding): ...this new
> 	function.
> 	* expr.c (emit_move_resolve_push): Treat the input and result
> 	of PUSH_ROUNDING as a poly_int64.
> 	(emit_move_complex_push, emit_single_push_insn_1): Likewise.
> 	(emit_push_insn): Likewise.
> 	* lra-eliminations.c (mark_not_eliminable): Likewise.
> 	* recog.c (push_operand): Likewise.
> 	* reload1.c (elimination_effects): Likewise.
> 	* rtlanal.c (nonzero_bits1): Likewise.
> 	* calls.c (store_one_arg): Likewise.  Require the padding to be
> 	known at compile time.
OK.

I so wish PUSH_ROUNDING wasn't needed and that folks could at least keep
their processors consistent (I'm looking at the coldfire designers :(.
For a tale of woe, see BZ68467.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread
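To make the shape of the conversion concrete, here is a sketch of what one
of the new out-of-line functions looks like in spirit; the function name,
the target and the rounding rule are made up for illustration and are not
the code of any of the ports listed above:

  /* Hook-style interface: take the pushed size as a poly_int64 and
     return the rounded size as a poly_int64.  */
  poly_int64
  example_push_rounding (poly_int64 bytes)
  {
    /* Round up to a 2-byte boundary; aligned_upper_bound is the
       poly-int.h helper assumed here.  */
    return aligned_upper_bound (bytes, 2);
  }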

* Re: [087/nnn] poly_int: subreg_get_info
  2017-10-23 17:35 ` [087/nnn] poly_int: subreg_get_info Richard Sandiford
@ 2017-11-28 16:29   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:29 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:35 AM, Richard Sandiford wrote:
> This patch makes subreg_get_info handle polynomial sizes.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtlanal.c (subreg_get_info): Handle polynomial mode sizes.
OK.
Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [085/nnn] poly_int: expand_vector_ubsan_overflow
  2017-10-23 17:34 ` [085/nnn] poly_int: expand_vector_ubsan_overflow Richard Sandiford
@ 2017-11-28 16:33   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:33 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:34 AM, Richard Sandiford wrote:
> This patch makes expand_vector_ubsan_overflow cope with a polynomial
> number of elements.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* internal-fn.c (expand_vector_ubsan_overflow): Handle polynomial
> 	numbers of elements.
OK
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [084/nnn] poly_int: folding BIT_FIELD_REFs on vectors
  2017-10-23 17:34 ` [084/nnn] poly_int: folding BIT_FIELD_REFs on vectors Richard Sandiford
@ 2017-11-28 16:33   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:33 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:33 AM, Richard Sandiford wrote:
> This patch makes the:
> 
>   (BIT_FIELD_REF CONSTRUCTOR@0 @1 @2)
> 
> folder cope with polynomial numbers of elements.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* match.pd: Cope with polynomial numbers of vector elements.
Argh.  It took me a moment of wondering why it didn't look like C code.
It's match.pd :-)

OK.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [083/nnn] poly_int: fold_indirect_ref_1
  2017-10-23 17:34 ` [083/nnn] poly_int: fold_indirect_ref_1 Richard Sandiford
@ 2017-11-28 16:34   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:34 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:33 AM, Richard Sandiford wrote:
> This patch makes fold_indirect_ref_1 handle polynomial offsets in
> a POINTER_PLUS_EXPR.  The specific reason for doing this now is
> to handle:
> 
>  		  (tree_to_uhwi (part_width) / BITS_PER_UNIT
>  		   * TYPE_VECTOR_SUBPARTS (op00type));
> 
> when TYPE_VECTOR_SUBPARTS becomes a poly_int.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* fold-const.c (fold_indirect_ref_1): Handle polynomial offsets
> 	in a POINTER_PLUS_EXPR.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [082/nnn] poly_int: omp-simd-clone.c
  2017-10-23 17:33 ` [082/nnn] poly_int: omp-simd-clone.c Richard Sandiford
@ 2017-11-28 16:36   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:36 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:33 AM, Richard Sandiford wrote:
> This patch adds a wrapper around TYPE_VECTOR_SUBPARTS for omp-simd-clone.c.
> Supporting SIMD clones for variable-length vectors is post GCC8 work.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* omp-simd-clone.c (simd_clone_subparts): New function.
> 	(simd_clone_init_simd_arrays): Use it instead of TYPE_VECTOR_SUBPARTS.
> 	(ipa_simd_modify_function_body): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [078/nnn] poly_int: two-operation SLP
  2017-10-23 17:32 ` [078/nnn] poly_int: two-operation SLP Richard Sandiford
@ 2017-11-28 16:41   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:41 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:31 AM, Richard Sandiford wrote:
> This patch makes two-operation SLP handle but reject variable-length
> vectors.  Adding support for this is a post-GCC8 thing.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-slp.c (vect_build_slp_tree_1): Handle polynomial
> 	numbers of units.
> 	(vect_schedule_slp_instance): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [077/nnn] poly_int: vect_get_constant_vectors
  2017-10-23 17:31 ` [077/nnn] poly_int: vect_get_constant_vectors Richard Sandiford
@ 2017-11-28 16:43   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:43 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:31 AM, Richard Sandiford wrote:
> For now, vect_get_constant_vectors can only cope with constant-length
> vectors, although a patch after the main SVE submission relaxes this.
> This patch adds an appropriate guard for variable-length vectors.
> The TYPE_VECTOR_SUBPARTS use in vect_get_constant_vectors will then
> have a to_constant call when TYPE_VECTOR_SUBPARTS becomes a poly_int.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-slp.c (vect_get_and_check_slp_defs): Reject
> 	constant and extern definitions for variable-length vectors.
> 	(vect_get_constant_vectors): Note that the number of units
> 	is known to be constant.
OK.
jeff

ps.  Sorry about the strange ordering of acks.  I'm trying to work
through the simple stuff and come back to the larger patches.  The only
way to eat an elephant is a bite at a time...

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [076/nnn] poly_int: vectorizable_conversion
  2017-10-23 17:31 ` [076/nnn] poly_int: vectorizable_conversion Richard Sandiford
@ 2017-11-28 16:44   ` Jeff Law
  2017-11-28 18:15     ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:44 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:30 AM, Richard Sandiford wrote:
> This patch makes vectorizable_conversion cope with variable-length
> vectors.  We already require the number of elements in one vector
> to be a multiple of the number of elements in the other vector,
> so the patch uses that to choose between widening and narrowing.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-stmts.c (vectorizable_conversion): Treat the number
> 	of units as polynomial.  Choose between WIDE and NARROW based
> 	on multiple_p.
If I'm reading this right, if nunits_in < nunits_out, but the latter is
not a multiple of the former, we'll choose WIDEN, which is the opposite
of what we'd do before this patch.  Was that intentional?


jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [075/nnn] poly_int: vectorizable_simd_clone_call
  2017-10-23 17:31 ` [075/nnn] poly_int: vectorizable_simd_clone_call Richard Sandiford
@ 2017-11-28 16:45   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:45 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:30 AM, Richard Sandiford wrote:
> This patch makes vectorizable_simd_clone_call cope with variable-length
> vectors.  For now we don't support SIMD clones for variable-length
> vectors; this will be post GCC 8 material.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-stmts.c (simd_clone_subparts): New function.
> 	(vectorizable_simd_clone_call): Use it instead of TYPE_VECTOR_SUBPARTS.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [074/nnn] poly_int: vectorizable_call
  2017-10-23 17:30 ` [074/nnn] poly_int: vectorizable_call Richard Sandiford
@ 2017-11-28 16:46   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:46 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:30 AM, Richard Sandiford wrote:
> This patch makes vectorizable_call handle variable-length vectors.
> The only substantial change is to use build_index_vector for
> IFN_GOMP_SIMD_LANE; this makes no functional difference for
> fixed-length vectors.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-stmts.c (vectorizable_call): Treat the number of
> 	vectors as polynomial.  Use build_index_vector for
> 	IFN_GOMP_SIMD_LANE.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [072/nnn] poly_int: vectorizable_live_operation
  2017-10-23 17:30 ` [072/nnn] poly_int: vectorizable_live_operation Richard Sandiford
@ 2017-11-28 16:47   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:47 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:29 AM, Richard Sandiford wrote:
> This patch makes vectorizable_live_operation cope with variable-length
> vectors.  For now we just handle cases in which we can tell at compile
> time which vector contains the final result.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-loop.c (vectorizable_live_operation): Treat the number
> 	of units as polynomial.  Punt if we can't tell at compile time
> 	which vector contains the final result.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [067/nnn] poly_int: get_mask_mode
  2017-10-23 17:28 ` [067/nnn] poly_int: get_mask_mode Richard Sandiford
@ 2017-11-28 16:48   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:48 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:27 AM, Richard Sandiford wrote:
> This patch makes TARGET_GET_MASK_MODE take polynomial nunits and
> vector_size arguments.  The gcc_assert in default_get_mask_mode
> is now handled by the exact_div call in vector_element_size.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* target.def (get_mask_mode): Take the number of units and length
> 	as poly_uint64s rather than unsigned ints.
> 	* targhooks.h (default_get_mask_mode): Update accordingly.
> 	* targhooks.c (default_get_mask_mode): Likewise.
> 	* config/i386/i386.c (ix86_get_mask_mode): Likewise.
> 	* doc/tm.texi: Regenerate.
>
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [069/nnn] poly_int: vector_alignment_reachable_p
  2017-10-23 17:29 ` [069/nnn] poly_int: vector_alignment_reachable_p Richard Sandiford
@ 2017-11-28 16:48   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:48 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:28 AM, Richard Sandiford wrote:
> This patch makes vector_alignment_reachable_p cope with variable-length
> vectors.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-data-refs.c (vector_alignment_reachable_p): Treat the
> 	number of units as polynomial.

OK
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [061/nnn] poly_int: compute_data_ref_alignment
  2017-10-23 17:25 ` [061/nnn] poly_int: compute_data_ref_alignment Richard Sandiford
@ 2017-11-28 16:49   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:25 AM, Richard Sandiford wrote:
> This patch makes vect_compute_data_ref_alignment treat DR_INIT as a
> poly_int and handles cases in which the calculated misalignment might
> not be constant.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
> 	Treat drb->init as a poly_int.  Fail if its misalignment wrt
> 	vector_alignment isn't known.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [058/nnn] poly_int: get_binfo_at_offset
  2017-10-23 17:24 ` [058/nnn] poly_int: get_binfo_at_offset Richard Sandiford
@ 2017-11-28 16:50   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:50 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:24 AM, Richard Sandiford wrote:
> This patch changes the offset parameter to get_binfo_at_offset
> from HOST_WIDE_INT to poly_int64.  This function probably doesn't
> need to handle polynomial offsets in practice, but it's easy
> to do and avoids forcing the caller to check first.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree.h (get_binfo_at_offset): Take the offset as a poly_int64
> 	rather than a HOST_WIDE_INT.
> 	* tree.c (get_binfo_at_offset): Likewise.
>
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [057/nnn] poly_int: build_ref_for_offset
  2017-10-23 17:24 ` [057/nnn] poly_int: build_ref_for_offset Richard Sandiford
@ 2017-11-28 16:51   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:51 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:23 AM, Richard Sandiford wrote:
> This patch changes the offset parameter to build_ref_for_offset
> from HOST_WIDE_INT to poly_int64.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* ipa-prop.h (build_ref_for_offset): Take the offset as a poly_int64
> 	rather than a HOST_WIDE_INT.
> 	* tree-sra.c (build_ref_for_offset): Likewise.
> 
OK
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [055/nnn] poly_int: find_bswap_or_nop_load
  2017-10-23 17:23 ` [055/nnn] poly_int: find_bswap_or_nop_load Richard Sandiford
@ 2017-11-28 16:52   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:52 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:23 AM, Richard Sandiford wrote:
> This patch handles polynomial offsets in find_bswap_or_nop_load,
> which could be useful for constant-sized data at a variable offset.
> It is needed for a later patch to compile.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-ssa-math-opts.c (find_bswap_or_nop_load): Track polynomial
> 	offsets for MEM_REFs.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [053/nnn] poly_int: decode_addr_const
  2017-10-23 17:22 ` [053/nnn] poly_int: decode_addr_const Richard Sandiford
@ 2017-11-28 16:53   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:53 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:22 AM, Richard Sandiford wrote:
> This patch makes the varasm-local addr_const track polynomial offsets.
> I'm not sure how useful this is, but it was easier to convert than not.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* varasm.c (addr_const::offset): Change from HOST_WIDE_INT
> 	to poly_int64.
> 	(decode_addr_const): Update accordingly.
> 
OK
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [054/nnn] poly_int: adjust_ptr_info_misalignment
  2017-10-23 17:23 ` [054/nnn] poly_int: adjust_ptr_info_misalignment Richard Sandiford
@ 2017-11-28 16:53   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:53 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:22 AM, Richard Sandiford wrote:
> This patch makes adjust_ptr_info_misalignment take the adjustment
> as a poly_uint64 rather than an unsigned int.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-ssanames.h (adjust_ptr_info_misalignment): Take the increment
> 	as a poly_uint64 rather than an unsigned int.
> 	* tree-ssanames.c (adjust_ptr_info_misalignment): Likewise.
> 
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [050/nnn] poly_int: reload<->ira interface
  2017-10-23 17:21 ` [050/nnn] poly_int: reload<->ira interface Richard Sandiford
@ 2017-11-28 16:55   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 16:55 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:21 AM, Richard Sandiford wrote:
> This patch uses poly_int64 for:
> 
> - ira_reuse_stack_slot
> - ira_mark_new_stack_slot
> - ira_spilled_reg_stack_slot::width
> 
> all of which are part of the IRA/reload interface.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* ira-int.h (ira_spilled_reg_stack_slot::width): Change from
> 	an unsigned int to a poly_uint64.
> 	* ira.h (ira_reuse_stack_slot, ira_mark_new_stack_slot): Take the
> 	sizes as poly_uint64s rather than unsigned ints.
> 	* ira-color.c (ira_reuse_stack_slot, ira_mark_new_stack_slot):
> 	Likewise.
OK
Jeff



^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [049/nnn] poly_int: emit_inc
  2017-10-23 17:21 ` [049/nnn] poly_int: emit_inc Richard Sandiford
@ 2017-11-28 17:30   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 17:30 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:20 AM, Richard Sandiford wrote:
> This patch changes the LRA emit_inc routine so that it takes
> a poly_int64 rather than an int.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* lra-constraints.c (emit_inc): Change inc_amount from an int
> 	to a poly_int64.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [036/nnn] poly_int: get_object_alignment_2
  2017-10-23 17:14 ` [036/nnn] poly_int: get_object_alignment_2 Richard Sandiford
@ 2017-11-28 17:37   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 17:37 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:14 AM, Richard Sandiford wrote:
> This patch makes get_object_alignment_2 track polynomial offsets
> and sizes.  The real work is done by get_inner_reference, but we
> then need to handle the alignment correctly.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* builtins.c (get_object_alignment_2): Track polynomial offsets
> 	and sizes.  Update the alignment handling.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [033/nnn] poly_int: pointer_may_wrap_p
  2017-10-23 17:13 ` [033/nnn] poly_int: pointer_may_wrap_p Richard Sandiford
@ 2017-11-28 17:44   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 17:44 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:13 AM, Richard Sandiford wrote:
> This patch changes the bitpos argument to pointer_may_wrap_p from
> HOST_WIDE_INT to poly_int64.  A later patch makes the callers track
> polynomial offsets.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* fold-const.c (pointer_may_wrap_p): Take the offset as a
> 	poly_int64 rather than a HOST_WIDE_INT.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [032/nnn] poly_int: symbolic_number
  2017-10-23 17:13 ` [032/nnn] poly_int: symbolic_number Richard Sandiford
@ 2017-11-28 17:45   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 17:45 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:12 AM, Richard Sandiford wrote:
> This patch changes symbol_number::bytepos from a HOST_WIDE_INT
> to a poly_int64.  perform_symbolic_merge can cope with symbolic
> offsets as long as the difference between the two offsets is
> constant.  (This could happen for a constant-sized field that
> occurs at a variable offset, for example.)
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-ssa-math-opts.c (symbolic_number::bytepos): Change from
> 	HOST_WIDE_INT to poly_int64.
> 	(perform_symbolic_merge): Update accordingly.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread
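The "difference between the two offsets is constant" condition mentioned
above maps onto a simple poly_int idiom.  A sketch with made-up names, not
code from the patch:

  /* Illustrative only: succeed, and return the byte delta, only when the
     two positions differ by a compile-time constant.  */
  static bool
  example_offsets_mergeable_p (poly_int64 bytepos1, poly_int64 bytepos2,
			       HOST_WIDE_INT *delta_out)
  {
    poly_int64 delta = bytepos2 - bytepos1;
    return delta.is_constant (delta_out);
  }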

* Re: [028/nnn] poly_int: ipa_parm_adjustment
  2017-10-23 17:12 ` [028/nnn] poly_int: ipa_parm_adjustment Richard Sandiford
@ 2017-11-28 17:47   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 17:47 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:11 AM, Richard Sandiford wrote:
> This patch changes the type of ipa_parm_adjustment::offset from
> HOST_WIDE_INT to poly_int64 and updates uses accordingly.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* ipa-prop.h (ipa_parm_adjustment::offset): Change from
> 	HOST_WIDE_INT to poly_int64_pod.
> 	* ipa-prop.c (ipa_modify_call_arguments): Track polynomial
> 	parameter offsets.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [026/nnn] poly_int: operand_subword
  2017-10-23 17:11 ` [026/nnn] poly_int: operand_subword Richard Sandiford
@ 2017-11-28 17:51   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 17:51 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:10 AM, Richard Sandiford wrote:
> This patch makes operand_subword and operand_subword_force take
> polynomial offsets.  This is a fairly old-school interface and
> these days should only be used when splitting multiword operations
> into word operations.  It still doesn't hurt to support polynomial
> offsets and it helps make callers easier to write.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtl.h (operand_subword, operand_subword_force): Take the offset
> 	as a poly_uint64 rather than an unsigned int.
> 	* emit-rtl.c (operand_subword, operand_subword_force): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [034/nnn] poly_int: get_inner_reference_aff
  2017-10-23 17:14 ` [034/nnn] poly_int: get_inner_reference_aff Richard Sandiford
@ 2017-11-28 17:56   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 17:56 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:13 AM, Richard Sandiford wrote:
> This patch makes get_inner_reference_aff return the size as a
> poly_widest_int rather than a widest_int.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-affine.h (get_inner_reference_aff): Return the size as a
> 	poly_widest_int.
> 	* tree-affine.c (get_inner_reference_aff): Likewise.
> 	* tree-data-ref.c (dr_may_alias_p): Update accordingly.
> 	* tree-ssa-loop-im.c (mem_refs_may_alias_p): Likewise.
> 
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [046/nnn] poly_int: instantiate_virtual_regs
  2017-10-23 17:20 ` [046/nnn] poly_int: instantiate_virtual_regs Richard Sandiford
@ 2017-11-28 18:00   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 18:00 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:19 AM, Richard Sandiford wrote:
> This patch makes the instantiate virtual regs pass track offsets
> as poly_ints.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* function.c (in_arg_offset, var_offset, dynamic_offset)
> 	(out_arg_offset, cfa_offset): Change from int to poly_int64.
> 	(instantiate_new_reg): Return the new offset as a poly_int64_pod
> 	rather than a HOST_WIDE_INT.
> 	(instantiate_virtual_regs_in_rtx): Track polynomial offsets.
> 	(instantiate_virtual_regs_in_insn): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [039/nnn] poly_int: pass_store_merging::execute
  2017-10-23 17:17 ` [039/nnn] poly_int: pass_store_merging::execute Richard Sandiford
@ 2017-11-28 18:00   ` Jeff Law
  2017-12-20 12:59     ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-28 18:00 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:17 AM, Richard Sandiford wrote:
> This patch makes pass_store_merging::execute track polynomial sizes
> and offsets.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* gimple-ssa-store-merging.c (pass_store_merging::execute): Track
> 	polynomial sizes and offsets.
OK.  Though I wouldn't be surprised if this needs revamping after
Jakub's work in this space.

It wasn't clear why you moved some of the code that computes "invalid"
away from where we test it, but I don't see any problem with that
movement of code.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [092/nnn] poly_int: PUSH_ROUNDING
  2017-11-28 16:21   ` Jeff Law
@ 2017-11-28 18:01     ` Richard Sandiford
  2017-11-28 18:10       ` PUSH_ROUNDING Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-28 18:01 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 11:37 AM, Richard Sandiford wrote:
>> PUSH_ROUNDING is difficult to convert to a hook since there is still
>> a lot of conditional code based on it.  It isn't clear that a direct
>> conversion with checks for null hooks is the right thing to do.
>> 
>> Rather than untangle that, this patch converts all implementations
>> that do something to out-of-line functions that have the same
>> interface as a hook would have.  This should at least help towards
>> any future hook conversion.
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* config/cr16/cr16-protos.h (cr16_push_rounding): Declare.
>> 	* config/cr16/cr16.h (PUSH_ROUNDING): Move implementation to...
>> 	* config/cr16/cr16.c (cr16_push_rounding): ...this new function.
>> 	* config/h8300/h8300-protos.h (h8300_push_rounding): Declare.
>> 	* config/h8300/h8300.h (PUSH_ROUNDING): Move implementation to...
>> 	* config/h8300/h8300.c (h8300_push_rounding): ...this new function.
>> 	* config/i386/i386-protos.h (ix86_push_rounding): Declare.
>> 	* config/i386/i386.h (PUSH_ROUNDING): Move implementation to...
>> 	* config/i386/i386.c (ix86_push_rounding): ...this new function.
>> 	* config/m32c/m32c-protos.h (m32c_push_rounding): Take and return
>> 	a poly_int64.
>> 	* config/m32c/m32c.c (m32c_push_rounding): Likewise.
>> 	* config/m68k/m68k-protos.h (m68k_push_rounding): Declare.
>> 	* config/m68k/m68k.h (PUSH_ROUNDING): Move implementation to...
>> 	* config/m68k/m68k.c (m68k_push_rounding): ...this new function.
>> 	* config/pdp11/pdp11-protos.h (pdp11_push_rounding): Declare.
>> 	* config/pdp11/pdp11.h (PUSH_ROUNDING): Move implementation to...
>> 	* config/pdp11/pdp11.c (pdp11_push_rounding): ...this new function.
>> 	* config/stormy16/stormy16-protos.h (xstormy16_push_rounding): Declare.
>> 	* config/stormy16/stormy16.h (PUSH_ROUNDING): Move implementation to...
>> 	* config/stormy16/stormy16.c (xstormy16_push_rounding): ...this new
>> 	function.
>> 	* expr.c (emit_move_resolve_push): Treat the input and result
>> 	of PUSH_ROUNDING as a poly_int64.
>> 	(emit_move_complex_push, emit_single_push_insn_1): Likewise.
>> 	(emit_push_insn): Likewise.
>> 	* lra-eliminations.c (mark_not_eliminable): Likewise.
>> 	* recog.c (push_operand): Likewise.
>> 	* reload1.c (elimination_effects): Likewise.
>> 	* rtlanal.c (nonzero_bits1): Likewise.
>> 	* calls.c (store_one_arg): Likewise.  Require the padding to be
>> 	known at compile time.
> OK.
>
> I so wish PUSH_ROUNDING wasn't needed and that folks could at least keep
> their processors consistent (I'm looking at the coldfire designers :(.
> For a tale of woe, see BZ68467.

Ouch.  Is this also fallout from having different code for libcalls
and normal calls?  That always seemed like an accident waiting to
happen, but I don't remember seeing cases where it caused actual ABI
breakage before.

Thanks as ever for the reviews :-)

Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* PUSH_ROUNDING
  2017-11-28 18:01     ` Richard Sandiford
@ 2017-11-28 18:10       ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 18:10 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/28/2017 11:00 AM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:

>>
>> I so wish PUSH_ROUNDING wasn't needed and that folks could at least keep
>> their processors consistent (I'm looking at the coldfire designers :(.
>> For a tale of woe, see BZ68467.
> 
> Ouch.  Is this also fallout from having different code for libcalls
> and normal calls?  That always seemed like an accident waiting to
> happen, but I don't remember seeing cases where it caused actual ABI
> breakage before.
Yup.  Essentially the caller uses a libcall interface where promotions
are not occurring, but there's no way to describe that at the source
level to the implementation of the libcall and the implementation thus
expects the usual argument promotions.  At least that's how it looked
when I started poking a bit.  At that point, I had to stop as I couldn't
justify the time to dig further for an m68k issue...


> 
> Thanks as ever for the reviews :-)
You're welcome.  Still lots to do, but at least some progress whittling
it down.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [076/nnn] poly_int: vectorizable_conversion
  2017-11-28 16:44   ` Jeff Law
@ 2017-11-28 18:15     ` Richard Sandiford
  2017-12-05 17:49       ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-11-28 18:15 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 11:30 AM, Richard Sandiford wrote:
>> This patch makes vectorizable_conversion cope with variable-length
>> vectors.  We already require the number of elements in one vector
>> to be a multiple of the number of elements in the other vector,
>> so the patch uses that to choose between widening and narrowing.
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* tree-vect-stmts.c (vectorizable_conversion): Treat the number
>> 	of units as polynomial.  Choose between WIDE and NARROW based
>> 	on multiple_p.
> If I'm reading this right, if nunits_in < nunits_out, but the latter is
> not a multiple of the former, we'll choose WIDEN, which is the opposite
> of what we'd do before this patch.  Was that intentional?

That case isn't possible, so we'd assert:

  if (must_eq (nunits_out, nunits_in))
    modifier = NONE;
  else if (multiple_p (nunits_out, nunits_in))
    modifier = NARROW;
  else
    {
      gcc_checking_assert (multiple_p (nunits_in, nunits_out));
      modifier = WIDEN;
    }

We already implicitly rely on this, since we either widen one full
vector to N full vectors or narrow N full vectors to one vector.

Structurally this is enforced by all vectors having the same number of
bytes (current_vector_size) and the number of vector elements being a
power of 2 (or in the case of poly_int, a power of 2 times a runtime
invariant, but that's good enough, since the runtime invariant is the same
in both cases).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [024/nnn] poly_int: ira subreg liveness tracking
  2017-10-23 17:10 ` [024/nnn] poly_int: ira subreg liveness tracking Richard Sandiford
@ 2017-11-28 21:10   ` Jeff Law
  2017-12-05 21:54     ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-11-28 21:10 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:09 AM, Richard Sandiford wrote:
> Normally the IRA-reload interface tries to track the liveness of
> individual bytes of an allocno if the allocno is sometimes written
> to as a SUBREG.  This isn't possible for variable-sized allocnos,
> but it doesn't matter because targets with variable-sized registers
> should use LRA instead.
> 
> This patch adds a get_subreg_tracking_sizes function for deciding
> whether it is possible to model a partial read or write.  Later
> patches make it return false if anything is variable.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* ira.c (get_subreg_tracking_sizes): New function.
> 	(init_live_subregs): Take an integer size rather than a register.
> 	(build_insn_chain): Use get_subreg_tracking_sizes.  Update calls
> 	to init_live_subregs.
OK.

Note this is starting to get close to the discussion we're having with
Alan on another thread around CLOBBER_HIGH vs. using a self-set with a
low subreg, in that liveness tracking of subregs of SVE regs could
potentially use some improvements.

When I quickly looked at the subreg handling in the df infrastructure, my
first thought was that it might need some updating for SVE.  I can't
immediately recall bits for poly_int/SVE in the patches to date.  Have you
dug in there at all for the poly_int/SVE work?

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [038/nnn] poly_int: fold_comparison
  2017-10-23 17:17 ` [038/nnn] poly_int: fold_comparison Richard Sandiford
@ 2017-11-28 21:47   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 21:47 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:16 AM, Richard Sandiford wrote:
> This patch makes fold_comparison track polynomial offsets when
> folding address comparisons.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* fold-const.c (fold_comparison): Track sizes and offsets as
> 	poly_int64s rather than HOST_WIDE_INTs when folding address
> 	comparisons.
> 
OK.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [044/nnn] poly_int: push_block/emit_push_insn
  2017-10-23 17:19 ` [044/nnn] poly_int: push_block/emit_push_insn Richard Sandiford
@ 2017-11-28 22:18   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-11-28 22:18 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:19 AM, Richard Sandiford wrote:
> This patch changes the "extra" parameters to push_block and
> emit_push_insn from int to poly_int64.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expr.h (push_block, emit_push_insn): Change the "extra" parameter
> 	from HOST_WIDE_INT to poly_int64.
> 	* expr.c (push_block, emit_push_insn): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [035/nnn] poly_int: expand_debug_expr
  2017-10-23 17:14 ` [035/nnn] poly_int: expand_debug_expr Richard Sandiford
@ 2017-12-05 17:08   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:08 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:14 AM, Richard Sandiford wrote:
> This patch makes expand_debug_expr track polynomial memory offsets.
> It simplifies the handling of the case in which the reference is not
> to the first byte of the base, which seemed non-trivial enough to
> make it worth splitting out as a separate patch.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree.h (get_inner_reference): Add a version that returns the
> 	offset and size as poly_int64_pods rather than HOST_WIDE_INTs.
> 	* cfgexpand.c (expand_debug_expr): Track polynomial offsets.  Simplify
> 	the case in which bitpos is not associated with the first byte.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [041/nnn] poly_int: reload.c
  2017-10-23 17:18 ` [041/nnn] poly_int: reload.c Richard Sandiford
@ 2017-12-05 17:10   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:10 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:18 AM, Richard Sandiford wrote:
> This patch makes a few small poly_int64 changes to reload.c,
> such as in the "decomposition" structure.  In practice, any
> port with polynomial-sized modes should be using LRA rather
> than reload, but it's easier to convert reload anyway than
> to sprinkle to_constants everywhere.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* reload.h (reload::inc): Change from an int to a poly_int64_pod.
> 	* reload.c (combine_reloads, debug_reload_to_stream): Likewise.
> 	(decomposition): Change start and end from HOST_WIDE_INT
> 	to poly_int64_pod.
> 	(decompose, immune_p): Update accordingly.
> 	(find_inc_amount): Return a poly_int64 rather than an int.
> 	* reload1.c (inc_for_reload): Take the inc_amount as a poly_int64
> 	rather than an int.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [042/nnn] poly_int: reload1.c
  2017-10-23 17:18 ` [042/nnn] poly_int: reload1.c Richard Sandiford
@ 2017-12-05 17:23   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:23 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:18 AM, Richard Sandiford wrote:
> This patch makes a few small poly_int64 changes to reload1.c,
> mostly related to eliminations.  Again, there's no real expectation
> that reload will be used for targets that have polynomial-sized modes,
> but it seemed easier to convert it anyway.
And since all the routines are "free", there's nothing that really
prevents them from being called from elsewhere.  So better to go ahead
and convert 'em.

If someone were to look at a refactoring project, pulling all the static
objects into a class and using that to drive turning the free functions
into methods would help give us better isolation of this code.

One could argue we should do this across the board to cut down on the
number of globals we continue to access and make it clearer which
routines need those globals versus which are truly free-standing functions.


> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* reload1.c (elim_table): Change initial_offset, offset and
> 	previous_offset from HOST_WIDE_INT to poly_int64_pod.
> 	(offsets_at): Change the target array's element type from
> 	HOST_WIDE_INT to poly_int64_pod.
> 	(set_label_offsets, eliminate_regs_1, eliminate_regs_in_insn)
> 	(elimination_costs_in_insn, update_eliminable_offsets)
> 	(verify_initial_elim_offsets, set_offsets_for_label)
> 	(init_eliminable_invariants): Update after above changes.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [052/nnn] poly_int: bit_field_size/offset
  2017-10-23 17:22 ` [052/nnn] poly_int: bit_field_size/offset Richard Sandiford
@ 2017-12-05 17:25   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:25 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:22 AM, Richard Sandiford wrote:
> verify_expr ensured that the size and offset in gimple BIT_FIELD_REFs
> satisfied tree_fits_uhwi_p.  This patch extends that so that they can
> be poly_uint64s, and adds helper routines for accessing them when the
> verify_expr requirements apply.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree.h (bit_field_size, bit_field_offset): New functions.
> 	* hsa-gen.c (gen_hsa_addr): Use them.
> 	* tree-ssa-forwprop.c (simplify_bitfield_ref): Likewise.
> 	(simplify_vector_constructor): Likewise.
> 	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> 	* tree-cfg.c (verify_expr): Require the sizes and offsets of a
> 	BIT_FIELD_REF to be poly_uint64s rather than uhwis.
> 	* fold-const.c (fold_ternary_loc): Protect tree_to_uhwi with
> 	tree_fits_uhwi_p.
> 
OK.
jeff
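
For context, a minimal sketch of what the two new accessors could look like
in tree.h.  It assumes the tree_to_poly_uint64 helper introduced earlier in
the series and is only an illustration of the idea, not the committed
definitions:

  /* Return the size (in bits) of a BIT_FIELD_REF as a poly_uint64.
     Only valid once verify_expr has checked the operands.  */
  inline poly_uint64
  bit_field_size (const_tree t)
  {
    return tree_to_poly_uint64 (TREE_OPERAND (t, 1));
  }

  /* Likewise the starting bit offset of a BIT_FIELD_REF.  */
  inline poly_uint64
  bit_field_offset (const_tree t)
  {
    return tree_to_poly_uint64 (TREE_OPERAND (t, 2));
  }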

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [059/nnn] poly_int: tree-ssa-loop-ivopts.c:iv_use
  2017-10-23 17:25 ` [059/nnn] poly_int: tree-ssa-loop-ivopts.c:iv_use Richard Sandiford
@ 2017-12-05 17:26   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:24 AM, Richard Sandiford wrote:
> This patch makes ivopts handle polynomial address offsets
> when recording potential IV uses.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-ssa-loop-ivopts.c (iv_use::addr_offset): Change from
> 	an unsigned HOST_WIDE_INT to a poly_uint64_pod.
> 	(group_compare_offset): Update accordingly.
> 	(split_small_address_groups_p): Likewise.
> 	(record_use): Take addr_offset as a poly_uint64 rather than
> 	an unsigned HOST_WIDE_INT.
> 	(strip_offset): Return the offset as a poly_uint64 rather than
> 	an unsigned HOST_WIDE_INT.
> 	(record_group_use, split_address_groups): Track polynomial offsets.
> 	(add_iv_candidate_for_use): Likewise.
> 	(addr_offset_valid_p): Take the offset as a poly_int64 rather
> 	than a HOST_WIDE_INT.
> 	(strip_offset_1): Return the offset as a poly_int64 rather than
> 	a HOST_WIDE_INT.
OK.
jeff


^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [060/nnn] poly_int: loop versioning threshold
  2017-10-23 17:25 ` [060/nnn] poly_int: loop versioning threshold Richard Sandiford
@ 2017-12-05 17:31   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:31 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:25 AM, Richard Sandiford wrote:
> This patch splits the loop versioning threshold out from the
> cost model threshold so that the former can become a poly_uint64.
> We still use a single test to enforce both limits where possible.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vectorizer.h (_loop_vec_info): Add a versioning_threshold
> 	field.
> 	(LOOP_VINFO_VERSIONING_THRESHOLD): New macro.
> 	(vect_loop_versioning): Take the loop versioning threshold as a
> 	separate parameter.
> 	* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
> 	* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Initialize
> 	versioning_threshold.
> 	(vect_analyze_loop_2): Compute the loop versioning threshold
> 	whenever loop versioning is needed, and store it in the new
> 	field rather than combining it with the cost model threshold.
> 	(vect_transform_loop): Update call to vect_loop_versioning.
> 	Try to combine the loop versioning and cost thresholds here.
So you dropped the tests for PEELING_FOR_GAPS and PEELING_FOR_NITER in
vect_analyze_loop_2.  Was that intentional?

Otherwise it looks fine.  If the drop was intentional, then OK as-is.

jeff
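
A hedged sketch of the "single test where possible" part as it might appear
in vect_transform_loop; the macro name comes from the ChangeLog, while the
local variable th and the exact condition are assumptions rather than the
committed code:

  /* If the poly_uint64 versioning threshold is a compile-time constant,
     fold it into the cost-model threshold so that a single runtime check
     enforces both limits.  */
  unsigned HOST_WIDE_INT const_vt;
  if (LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo).is_constant (&const_vt))
    th = MAX (th, const_vt);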

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [062/nnn] poly_int: prune_runtime_alias_test_list
  2017-10-23 17:26 ` [062/nnn] poly_int: prune_runtime_alias_test_list Richard Sandiford
@ 2017-12-05 17:33   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:33 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:25 AM, Richard Sandiford wrote:
> This patch makes prune_runtime_alias_test_list take the iteration
> factor as a poly_int and tracks polynomial offsets internally
> as well.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-data-ref.h (prune_runtime_alias_test_list): Take the
> 	factor as a poly_uint64 rather than an unsigned HOST_WIDE_INT.
> 	* tree-data-ref.c (prune_runtime_alias_test_list): Likewise.
> 	Track polynomial offsets.
This is OK.  Note that both Richi and Bin have been pretty active in
this general area.  So adjustments may be needed.


Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [065/nnn] poly_int: vect_nunits_for_cost
  2017-10-23 17:27 ` [065/nnn] poly_int: vect_nunits_for_cost Richard Sandiford
@ 2017-12-05 17:35   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:35 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:27 AM, Richard Sandiford wrote:
> This patch adds a function for getting the number of elements in
> a vector for cost purposes, which is always constant.  It makes
> it possible for a later patch to change GET_MODE_NUNITS and
> TYPE_VECTOR_SUBPARTS to a poly_int.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vectorizer.h (vect_nunits_for_cost): New function.
> 	* tree-vect-loop.c (vect_model_reduction_cost): Use it.
> 	* tree-vect-slp.c (vect_analyze_slp_cost_1): Likewise.
> 	(vect_analyze_slp_cost): Likewise.
> 	* tree-vect-stmts.c (vect_model_store_cost): Likewise.
> 	(vect_model_load_cost): Likewise.
OK.
jeff
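
As a rough illustration only, one plausible shape for the new helper,
assuming the "constant for cost purposes" value is simply the compile-time
lower bound of the element count (the patch may compute it differently):

  /* Number of vector elements to assume when costing operations on
     VEC_TYPE, even if the true element count is variable.  */
  static inline unsigned int
  vect_nunits_for_cost (tree vec_type)
  {
    return constant_lower_bound (TYPE_VECTOR_SUBPARTS (vec_type));
  }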

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [066/nnn] poly_int: omp_max_vf
  2017-10-23 17:27 ` [066/nnn] poly_int: omp_max_vf Richard Sandiford
@ 2017-12-05 17:40   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:40 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:27 AM, Richard Sandiford wrote:
> This patch makes omp_max_vf return a polynomial vectorization factor.
> We then need to be able to stash a polynomial value in
> OMP_CLAUSE_SAFELEN_EXPR too:
> 
>    /* If max_vf is non-zero, then we can use only a vectorization factor
>       up to the max_vf we chose.  So stick it into the safelen clause.  */
> 
> For now the cfgloop safelen is still constant though.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* omp-general.h (omp_max_vf): Return a poly_uint64 instead of an int.
> 	* omp-general.c (omp_max_vf): Likewise.
> 	* omp-expand.c (omp_adjust_chunk_size): Update call to omp_max_vf.
> 	(expand_omp_simd): Handle polynomial safelen.
> 	* omp-low.c (omplow_simd_context): Add a default constructor.
> 	(omplow_simd_context::max_vf): Change from int to poly_uint64.
> 	(lower_rec_simd_input_clauses): Update accordingly.
> 	(lower_rec_input_clauses): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [064/nnn] poly_int: SLP max_units
  2017-10-23 17:27 ` [064/nnn] poly_int: SLP max_units Richard Sandiford
@ 2017-12-05 17:41   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:41 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:26 AM, Richard Sandiford wrote:
> This patch makes tree-vect-slp.c track the maximum number of vector
> units as a poly_uint64 rather than an unsigned int.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-slp.c (vect_record_max_nunits, vect_build_slp_tree_1)
> 	(vect_build_slp_tree_2, vect_build_slp_tree): Change max_nunits
> 	from an unsigned int * to a poly_uint64_pod *.
> 	(calculate_unrolling_factor): New function.
> 	(vect_analyze_slp_instance): Use it.  Track polynomial max_nunits.
OK.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [071/nnn] poly_int: vectorizable_induction
  2017-10-23 17:29 ` [071/nnn] poly_int: vectorizable_induction Richard Sandiford
@ 2017-12-05 17:44   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:44 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:29 AM, Richard Sandiford wrote:
> This patch makes vectorizable_induction cope with variable-length
> vectors.  For now we punt on SLP inductions, but patches after
> the main SVE submission add support for those too.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-loop.c (vectorizable_induction): Treat the number
> 	of units as polynomial.  Punt on SLP inductions.  Use an integer
> 	VEC_SERIES_EXPR for variable-length integer reductions.  Use a
> 	cast of such a series for variable-length floating-point
> 	reductions.
OK.
jeff


^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [079/nnn] poly_int: vect_no_alias_p
  2017-10-23 17:32 ` [079/nnn] poly_int: vect_no_alias_p Richard Sandiford
@ 2017-12-05 17:46   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:46 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:32 AM, Richard Sandiford wrote:
> This patch replaces the two-state vect_no_alias_p with a three-state
> vect_compile_time_alias that handles polynomial segment lengths.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-data-refs.c (vect_no_alias_p): Replace with...
> 	(vect_compile_time_alias): ...this new function.  Do the calculation
> 	on poly_ints rather than trees.
> 	(vect_prune_runtime_alias_test_list): Update call accordingly.
OK.
jeff
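
To illustrate the three-state idea with may/must comparisons (hypothetical
helper name and parameters; the real function works directly on the data
references and their segment lengths):

  /* Return -1 if two segments provably don't overlap, 1 if they provably
     do, and 0 if the answer depends on the runtime vector length.  */
  static int
  compile_time_alias_sketch (poly_int64 start_a, poly_int64 size_a,
                             poly_int64 start_b, poly_int64 size_b)
  {
    if (must_le (start_a + size_a, start_b)
        || must_le (start_b + size_b, start_a))
      return -1;
    if (must_gt (start_a + size_a, start_b)
        && must_gt (start_b + size_b, start_a))
      return 1;
    return 0;
  }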

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [080/nnn] poly_int: tree-vect-generic.c
  2017-10-23 17:32 ` [080/nnn] poly_int: tree-vect-generic.c Richard Sandiford
@ 2017-12-05 17:48   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:48 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:32 AM, Richard Sandiford wrote:
> This patch makes tree-vect-generic.c cope with variable-length vectors.
> Decomposition is only supported for constant-length vectors, since we
> should never generate unsupported variable-length operations.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-generic.c (nunits_for_known_piecewise_op): New function.
> 	(expand_vector_piecewise): Use it instead of TYPE_VECTOR_SUBPARTS.
> 	(expand_vector_addition, add_rshift, expand_vector_divmod): Likewise.
> 	(expand_vector_condition, vector_element): Likewise.
> 	(subparts_gt): New function.
> 	(get_compute_type): Use subparts_gt.
> 	(count_type_subparts): Delete.
> 	(expand_vector_operations_1): Use subparts_gt instead of
> 	count_type_subparts.
OK.
jeff
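
A plausible sketch of the new subparts_gt helper, assuming it is a thin
wrapper around a must_gt comparison of the element counts (details may
differ from the committed version):

  /* Return true if TYPE1 is known to have more subparts than TYPE2,
     treating non-vector types as single-element vectors.  */
  static bool
  subparts_gt (tree type1, tree type2)
  {
    poly_uint64 n1 = VECTOR_TYPE_P (type1) ? TYPE_VECTOR_SUBPARTS (type1) : 1;
    poly_uint64 n2 = VECTOR_TYPE_P (type2) ? TYPE_VECTOR_SUBPARTS (type2) : 1;
    return must_gt (n1, n2);
  }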

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [076/nnn] poly_int: vectorizable_conversion
  2017-11-28 18:15     ` Richard Sandiford
@ 2017-12-05 17:49       ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 17:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/28/2017 11:09 AM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On 10/23/2017 11:30 AM, Richard Sandiford wrote:
>>> This patch makes vectorizable_conversion cope with variable-length
>>> vectors.  We already require the number of elements in one vector
>>> to be a multiple of the number of elements in the other vector,
>>> so the patch uses that to choose between widening and narrowing.
>>>
>>>
>>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>> 	    Alan Hayward  <alan.hayward@arm.com>
>>> 	    David Sherwood  <david.sherwood@arm.com>
>>>
>>> gcc/
>>> 	* tree-vect-stmts.c (vectorizable_conversion): Treat the number
>>> 	of units as polynomial.  Choose between WIDE and NARROW based
>>> 	on multiple_p.
>> If I'm reading this right, if nunits_in < nunits_out, but the latter is
>> not a multiple of the former, we'll choose WIDEN, which is the opposite
>> of what we'd do before this patch.  Was that intentional?
> 
> That case isn't possible, so we'd assert:
> 
>   if (must_eq (nunits_out, nunits_in))
>     modifier = NONE;
>   else if (multiple_p (nunits_out, nunits_in))
>     modifier = NARROW;
>   else
>     {
>       gcc_checking_assert (multiple_p (nunits_in, nunits_out));
>       modifier = WIDEN;
>     }
> 
> We already implicitly rely on this, since we either widen one full
> vector to N full vectors or narrow N full vectors to one vector.
> 
> Structurally this is enforced by all vectors having the same number of
> bytes (current_vector_size) and the number of vector elements being a
> power of 2 (or in the case of poly_int, a power of 2 times a runtime
> invariant, but that's good enough, since the runtime invariant is the same
> in both cases).
OK.  Thanks for clarifying.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [024/nnn] poly_int: ira subreg liveness tracking
  2017-11-28 21:10   ` Jeff Law
@ 2017-12-05 21:54     ` Richard Sandiford
  0 siblings, 0 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-12-05 21:54 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 11:09 AM, Richard Sandiford wrote:
>> Normally the IRA-reload interface tries to track the liveness of
>> individual bytes of an allocno if the allocno is sometimes written
>> to as a SUBREG.  This isn't possible for variable-sized allocnos,
>> but it doesn't matter because targets with variable-sized registers
>> should use LRA instead.
>> 
>> This patch adds a get_subreg_tracking_sizes function for deciding
>> whether it is possible to model a partial read or write.  Later
>> patches make it return false if anything is variable.
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* ira.c (get_subreg_tracking_sizes): New function.
>> 	(init_live_subregs): Take an integer size rather than a register.
>> 	(build_insn_chain): Use get_subreg_tracking_sizes.  Update calls
>> 	to init_live_subregs.
> OK.
>
> Note this is starting to get close to the discussion around CLOBBER_HIGH
> versus using a self-set with a low subreg that we're having with Alan on
> another thread, in that liveness tracking of subregs of SVE regs could
> potentially use some improvements.
>
> When I quickly looked at the subreg handling in the df infrastructure my
> first thought was that it might need some updating for SVE.  I can't
> immediately recall bits for poly_int/SVE in the patches to date.  Have you
> dug in there at all for the poly_int/SVE work?

Yeah, although the subreg tracking in this patch is specific to reload,
I thought we had something similar for LRA.  I couldn't find anything
though, and the static type checking of poly_ints would have forced
the issue.

There is the DF_WORD_LR code, which tracks the liveness of words in a
double-word pseudo.  We didn't extend that to variable-length registers
for two reasons: (1) if we did need it, we'd want it for pseudos
that map to 3 or 4 registers, not just 2, so that LD[234] and ST[234]
are handled consistently; and (2) it's only used for DCE at the moment,
and it's rare for LD[234]/ST[234]s to be dead code.

Thanks,
Richard
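
For reference, a sketch of the get_subreg_tracking_sizes idea from the
quoted patch, as it would look once the later patches make the mode sizes
and SUBREG_BYTE polynomial.  The body is an assumption based on the
description above, not a quote of the committed code:

  /* Return true if we can track the liveness of the bytes touched by a
     partial write to SUBREG X, storing the outer size, inner size and
     start byte if so.  Fail if anything is variable.  */
  static bool
  get_subreg_tracking_sizes (rtx x, HOST_WIDE_INT *outer_size,
                             HOST_WIDE_INT *inner_size, HOST_WIDE_INT *start)
  {
    rtx reg = regno_reg_rtx[REGNO (SUBREG_REG (x))];
    return (GET_MODE_SIZE (GET_MODE (x)).is_constant (outer_size)
            && GET_MODE_SIZE (GET_MODE (reg)).is_constant (inner_size)
            && SUBREG_BYTE (x).is_constant (start));
  }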

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [037/nnn] poly_int: get_bit_range
  2017-10-23 17:16 ` [037/nnn] poly_int: get_bit_range Richard Sandiford
@ 2017-12-05 23:19   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:19 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:14 AM, Richard Sandiford wrote:
> This patch makes get_bit_range return the range and position as poly_ints.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expr.h (get_bit_range): Return the bitstart and bitend as
> 	poly_uint64s rather than unsigned HOST_WIDE_INTs.  Return the bitpos
> 	as a poly_int64 rather than a HOST_WIDE_INT.
> 	* expr.c (get_bit_range): Likewise.
> 	(expand_assignment): Update call accordingly.
> 	* fold-const.c (optimize_bit_field_compare): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [048/nnn] poly_int: cfgexpand stack variables
  2017-10-23 17:21 ` [048/nnn] poly_int: cfgexpand stack variables Richard Sandiford
@ 2017-12-05 23:22   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:22 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:20 AM, Richard Sandiford wrote:
> This patch changes the type of stack_var::size from HOST_WIDE_INT
> to poly_uint64.  The difference in signedness is because the
> field was set by:
> 
>   v->size = tree_to_uhwi (size);
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* cfgexpand.c (stack_var::size): Change from a HOST_WIDE_INT
> 	to a poly_uint64.
> 	(add_stack_var, stack_var_cmp, partition_stack_vars)
> 	(dump_stack_var_partition): Update accordingly.
> 	(alloc_stack_frame_space): Take the size as a poly_int64 rather
> 	than a HOST_WIDE_INT.
> 	(expand_stack_vars, expand_one_stack_var_1): Handle polynomial sizes.
> 	(defer_stack_allocation, estimated_stack_frame_size): Likewise.
> 	(account_stack_vars, expand_one_var): Likewise.  Return a poly_uint64
> 	rather than a HOST_WIDE_INT.
> 
OK
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [051/nnn] poly_int: emit_group_load/store
  2017-10-23 17:22 ` [051/nnn] poly_int: emit_group_load/store Richard Sandiford
@ 2017-12-05 23:26   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:21 AM, Richard Sandiford wrote:
> This patch changes the sizes passed to emit_group_load and
> emit_group_store from int to poly_int64.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expr.h (emit_group_load, emit_group_load_into_temps)
> 	(emit_group_store): Take the size as a poly_int64 rather than an int.
> 	* expr.c (emit_group_load_1, emit_group_load): Likewise.
> 	(emit_group_load_into_temps, emit_group_store): Likewise.
> 
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [100/nnn] poly_int: memrefs_conflict_p
  2017-10-23 17:41 ` [100/nnn] poly_int: memrefs_conflict_p Richard Sandiford
@ 2017-12-05 23:29   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:29 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:40 AM, Richard Sandiford wrote:
> The xsize and ysize arguments to memrefs_conflict_p are encoded such
> that:
> 
> - 0 means the size is unknown
> - >0 means the size is known
> - <0 means that the negative of the size is a worst-case size after
>   alignment
> 
> In other words, the sign effectively encodes a boolean; it isn't
> meant to be taken literally.  With poly_ints these correspond to:
> 
> - known_zero (...)
> - may_gt (..., 0)
> - may_lt (..., 0)
> 
> respectively.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* alias.c (addr_side_effect_eval): Take the size as a poly_int64
> 	rather than an int.  Use plus_constant.
> 	(memrefs_conflict_p): Take the sizes as poly_int64s rather than ints.
> 	Take the offset "c" as a poly_int64 rather than a HOST_WIDE_INT.
> 
Not sure why I was dreading this one and kept putting it off.  It really
wasn't too bad to work through.

OK.
jeff
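
A small sketch of how the encoding above translates, using hypothetical
helper names purely for illustration:

  enum ref_size_kind { SIZE_UNKNOWN, SIZE_EXACT, SIZE_WORST_CASE };

  /* Classify a memrefs_conflict_p size argument under the new encoding.  */
  static enum ref_size_kind
  classify_ref_size (poly_int64 size)
  {
    if (known_zero (size))      /* was: size == 0 */
      return SIZE_UNKNOWN;
    if (may_lt (size, 0))       /* was: size < 0; -size is a worst case */
      return SIZE_WORST_CASE;
    return SIZE_EXACT;          /* was: size > 0 */
  }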

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [086/nnn] poly_int: REGMODE_NATURAL_SIZE
  2017-10-23 17:35 ` [086/nnn] poly_int: REGMODE_NATURAL_SIZE Richard Sandiford
@ 2017-12-05 23:33   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:33 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:34 AM, Richard Sandiford wrote:
> This patch makes target-independent code that uses REGMODE_NATURAL_SIZE
> treat it as a poly_int rather than a constant.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* combine.c (can_change_dest_mode): Handle polynomial
> 	REGMODE_NATURAL_SIZE.
> 	* expmed.c (store_bit_field_1): Likewise.
> 	* expr.c (store_constructor): Likewise.
> 	* emit-rtl.c (validate_subreg): Operate on polynomial mode sizes
> 	and polynomial REGMODE_NATURAL_SIZE.
> 	(gen_lowpart_common): Likewise.
> 	* reginfo.c (record_subregs_of_mode): Likewise.
> 	* rtlanal.c (read_modify_subreg_p): Likewise.
> 
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [022/nnn] poly_int: C++ bitfield regions
  2017-10-23 17:09 ` [022/nnn] poly_int: C++ bitfield regions Richard Sandiford
@ 2017-12-05 23:39   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:39 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:08 AM, Richard Sandiford wrote:
> This patch changes C++ bitregion_start/end values from constants to
> poly_ints.  Although it's unlikely that the size needs to be polynomial
> in practice, the offset could be with future language extensions.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expmed.h (store_bit_field): Change bitregion_start and
> 	bitregion_end from unsigned HOST_WIDE_INT to poly_uint64.
> 	* expmed.c (adjust_bit_field_mem_for_reg, strict_volatile_bitfield_p)
> 	(store_bit_field_1, store_integral_bit_field, store_bit_field)
> 	(store_fixed_bit_field, store_split_bit_field): Likewise.
> 	* expr.c (store_constructor_field, store_field): Likewise.
> 	(optimize_bitfield_assignment_op): Likewise.  Make the same change
> 	to bitsize and bitpos.
> 	* machmode.h (bit_field_mode_iterator): Change m_bitregion_start
> 	and m_bitregion_end from HOST_WIDE_INT to poly_int64.  Make the
> 	same change in the constructor arguments.
> 	(get_best_mode): Change bitregion_start and bitregion_end from
> 	unsigned HOST_WIDE_INT to poly_uint64.
> 	* stor-layout.c (bit_field_mode_iterator::bit_field_mode_iterator):
> 	Change bitregion_start and bitregion_end from HOST_WIDE_INT to
> 	poly_int64.
> 	(bit_field_mode_iterator::next_mode): Update for new types
> 	of m_bitregion_start and m_bitregion_end.
> 	(get_best_mode): Change bitregion_start and bitregion_end from
> 	unsigned HOST_WIDE_INT to poly_uint64.
> 
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [020/nnn] poly_int: store_bit_field bitrange
  2017-10-23 17:08 ` [020/nnn] poly_int: store_bit_field bitrange Richard Sandiford
@ 2017-12-05 23:43   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:43 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:08 AM, Richard Sandiford wrote:
> This patch changes the bitnum and bitsize arguments to
> store_bit_field from unsigned HOST_WIDE_INTs to poly_uint64s.
> The later part of store_bit_field_1 still needs to operate
> on constant bit positions and sizes, so the patch splits
> it out into a subfunction (store_integral_bit_field).
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expmed.h (store_bit_field): Take bitsize and bitnum as
> 	poly_uint64s rather than unsigned HOST_WIDE_INTs.
> 	* expmed.c (simple_mem_bitfield_p): Likewise.  Add a parameter
> 	that returns the byte size.
> 	(store_bit_field_1): Take bitsize and bitnum as
> 	poly_uint64s rather than unsigned HOST_WIDE_INTs.  Update call
> 	to simple_mem_bitfield_p.  Split the part that can only handle
> 	constant bitsize and bitnum out into...
> 	(store_integral_bit_field): ...this new function.
> 	(store_bit_field): Take bitsize and bitnum as poly_uint64s rather
> 	than unsigned HOST_WIDE_INTs.
> 	(extract_bit_field_1): Update call to simple_mem_bitfield_p.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [021/nnn] poly_int: extract_bit_field bitrange
  2017-10-23 17:09 ` [021/nnn] poly_int: extract_bit_field bitrange Richard Sandiford
@ 2017-12-05 23:46   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:46 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:08 AM, Richard Sandiford wrote:
> Similar to the previous store_bit_field patch, but for extractions
> rather than insertions.  The patch splits out the extraction-as-subreg
> handling into a new function (extract_bit_field_as_subreg), both for
> ease of writing and because a later patch will add another caller.
> 
> The simplify_gen_subreg overload is temporary; it goes away
> in a later patch.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtl.h (simplify_gen_subreg): Add a temporary overload that
> 	accepts poly_uint64 offsets.
> 	* expmed.h (extract_bit_field): Take bitsize and bitnum as
> 	poly_uint64s rather than unsigned HOST_WIDE_INTs.
> 	* expmed.c (lowpart_bit_field_p): Likewise.
> 	(extract_bit_field_as_subreg): New function, split out from...
> 	(extract_bit_field_1): ...here.  Take bitsize and bitnum as
> 	poly_uint64s rather than unsigned HOST_WIDE_INTs.  For vector
> 	extractions, check that BITSIZE matches the size of the extracted
> 	value and that BITNUM is an exact multiple of that size.
> 	If all else fails, try forcing the value into memory if
> 	BITNUM is variable, and adjusting the address so that the
> 	offset is constant.  Split the part that can only handle constant
> 	bitsize and bitnum out into...
> 	(extract_integral_bit_field): ...this new function.
> 	(extract_bit_field): Take bitsize and bitnum as poly_uint64s
> 	rather than unsigned HOST_WIDE_INTs.
OK.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [023/nnn] poly_int: store_field & co
  2017-10-23 17:09 ` [023/nnn] poly_int: store_field & co Richard Sandiford
@ 2017-12-05 23:49   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-05 23:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:09 AM, Richard Sandiford wrote:
> This patch makes store_field and related routines use poly_ints
> for bit positions and sizes.  It keeps the existing choices
> between signed and unsigned types (there are a mixture of both).
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* expr.c (store_constructor_field): Change bitsize from a
> 	unsigned HOST_WIDE_INT to a poly_uint64 and bitpos from a
> 	HOST_WIDE_INT to a poly_int64.
> 	(store_constructor): Change size from a HOST_WIDE_INT to
> 	a poly_int64.
> 	(store_field): Likewise bitsize and bitpos.

OK
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [031/nnn] poly_int: aff_tree
  2017-10-23 17:13 ` [031/nnn] poly_int: aff_tree Richard Sandiford
@ 2017-12-06  0:04   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  0:04 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:12 AM, Richard Sandiford wrote:
> This patch changes the type of aff_tree::offset from widest_int to
> poly_widest_int and adjusts the function interfaces in the same way.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-affine.h (aff_tree::offset): Change from widest_int
> 	to poly_widest_int.
> 	(wide_int_ext_for_comb): Delete.
> 	(aff_combination_const, aff_comb_cannot_overlap_p): Take the
> 	constants as poly_widest_int rather than widest_int.
> 	(aff_combination_constant_multiple_p): Return the multiplier
> 	as a poly_widest_int.
> 	(aff_combination_zero_p, aff_combination_singleton_var_p): Handle
> 	polynomial offsets.
> 	* tree-affine.c (wide_int_ext_for_comb): Make original widest_int
> 	version static and add an overload for poly_widest_int.
> 	(aff_combination_const, aff_combination_add_cst)
> 	(wide_int_constant_multiple_p, aff_comb_cannot_overlap_p): Take
> 	the constants as poly_widest_int rather than widest_int.
> 	(tree_to_aff_combination): Generalize INTEGER_CST case to
> 	poly_int_tree_p.
> 	(aff_combination_to_tree): Track offsets as poly_widest_ints.
> 	(aff_combination_add_product, aff_combination_mult): Handle
> 	polynomial offsets.
> 	(aff_combination_constant_multiple_p): Return the multiplier
> 	as a poly_widest_int.
> 	* tree-predcom.c (determine_offset): Return the offset as a
> 	poly_widest_int.
> 	(split_data_refs_to_components, suitable_component_p): Update
> 	accordingly.
> 	(valid_initializer_p): Update call to
> 	aff_combination_constant_multiple_p.
> 	* tree-ssa-address.c (addr_to_parts): Handle polynomial offsets.
> 	* tree-ssa-loop-ivopts.c (get_address_cost_ainc): Take the step
> 	as a poly_int64 rather than a HOST_WIDE_INT.
> 	(get_address_cost): Handle polynomial offsets.
> 	(iv_elimination_compare_lt): Likewise.
> 	(rewrite_use_nonlinear_expr): Likewise.
OK.
Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [045/nnn] poly_int: REG_ARGS_SIZE
  2017-10-23 17:19 ` [045/nnn] poly_int: REG_ARGS_SIZE Richard Sandiford
@ 2017-12-06  0:10   ` Jeff Law
  2017-12-22 21:56   ` Andreas Schwab
  1 sibling, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  0:10 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:19 AM, Richard Sandiford wrote:
> This patch adds new utility functions for manipulating REG_ARGS_SIZE
> notes and allows the notes to carry polynomial as well as constant sizes.
> 
> The code was inconsistent about whether INT_MIN or HOST_WIDE_INT_MIN
> should be used to represent an unknown size.  The patch uses
> HOST_WIDE_INT_MIN throughout.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtl.h (get_args_size, add_args_size_note): New functions.
> 	(find_args_size_adjust): Return a poly_int64 rather than a
> 	HOST_WIDE_INT.
> 	(fixup_args_size_notes): Likewise.  Make the same change to the
> 	end_args_size parameter.
> 	* rtlanal.c (get_args_size, add_args_size_note): New functions.
> 	* builtins.c (expand_builtin_trap): Use add_args_size_note.
> 	* calls.c (emit_call_1): Likewise.
> 	* explow.c (adjust_stack_1): Likewise.
> 	* cfgcleanup.c (old_insns_match_p): Update use of
> 	find_args_size_adjust.
> 	* combine.c (distribute_notes): Track polynomial arg sizes.
> 	* dwarf2cfi.c (dw_trace_info): Change beg_true_args_size,
> 	end_true_args_size, beg_delay_args_size and end_delay_args_size
> 	from HOST_WIDE_INT to poly_int64.
> 	(add_cfi_args_size): Take the args_size as a poly_int64 rather
> 	than a HOST_WIDE_INT.
> 	(notice_args_size, notice_eh_throw, maybe_record_trace_start)
> 	(maybe_record_trace_start_abnormal, scan_trace, connect_traces): Track
> 	polynomial arg sizes.
> 	* emit-rtl.c (try_split): Use get_args_size.
> 	* recog.c (peep2_attempt): Likewise.
> 	* reload1.c (reload_as_needed): Likewise.
> 	* expr.c (find_args_size_adjust): Return the adjustment as a
> 	poly_int64 rather than a HOST_WIDE_INT.
> 	(fixup_args_size_notes): Change end_args_size from a HOST_WIDE_INT
> 	to a poly_int64 and change the return type in the same way.
> 	(emit_single_push_insn): Track polynomial arg sizes.
> 
OK.
jeff
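
A sketch of what the two new helpers could look like in rtlanal.c, assuming
the rtx_to_poly_int64 and poly_int64 gen_int_mode support added elsewhere in
the series; this is illustrative rather than the exact committed code:

  /* Return the value stored in a REG_ARGS_SIZE note.  */
  poly_int64
  get_args_size (const_rtx note)
  {
    gcc_checking_assert (REG_NOTE_KIND (note) == REG_ARGS_SIZE);
    return rtx_to_poly_int64 (XEXP (note, 0));
  }

  /* Attach a REG_ARGS_SIZE note of value SIZE to INSN.  */
  void
  add_args_size_note (rtx_insn *insn, poly_int64 size)
  {
    gcc_checking_assert (!find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX));
    add_reg_note (insn, REG_ARGS_SIZE, gen_int_mode (size, Pmode));
  }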

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [019/nnn] poly_int: lra frame offsets
  2017-10-23 17:08 ` [019/nnn] poly_int: lra frame offsets Richard Sandiford
@ 2017-12-06  0:16   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  0:16 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:07 AM, Richard Sandiford wrote:
> This patch makes LRA use poly_int64s rather than HOST_WIDE_INTs
> to store a frame offset (including in things like eliminations).
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* lra-int.h (lra_reg): Change offset from int to poly_int64.
> 	(lra_insn_recog_data): Change sp_offset from HOST_WIDE_INT
> 	to poly_int64.
> 	(lra_eliminate_regs_1, eliminate_regs_in_insn): Change
> 	update_sp_offset from a HOST_WIDE_INT to a poly_int64.
> 	(lra_update_reg_val_offset, lra_reg_val_equal_p): Take the
> 	offset as a poly_int64 rather than an int.
> 	* lra-assigns.c (find_hard_regno_for_1): Handle poly_int64 offsets.
> 	(setup_live_pseudos_and_spill_after_risky_transforms): Likewise.
> 	* lra-constraints.c (equiv_address_substitution): Track offsets
> 	as poly_int64s.
> 	(emit_inc): Check poly_int_rtx_p instead of CONST_INT_P.
> 	(curr_insn_transform): Handle the new form of sp_offset.
> 	* lra-eliminations.c (lra_elim_table): Change previous_offset
> 	and offset from HOST_WIDE_INT to poly_int64.
> 	(print_elim_table, update_reg_eliminate): Update accordingly.
> 	(self_elim_offsets): Change from HOST_WIDE_INT to poly_int64_pod.
> 	(get_elimination): Update accordingly.
> 	(form_sum): Check poly_int_rtx_p instead of CONST_INT_P.
> 	(lra_eliminate_regs_1, eliminate_regs_in_insn): Change
> 	update_sp_offset from a HOST_WIDE_INT to a poly_int64.  Handle
> 	poly_int64 offsets generally.
> 	(curr_sp_change): Change from HOST_WIDE_INT to poly_int64.
> 	(mark_not_eliminable, init_elimination): Update accordingly.
> 	(remove_reg_equal_offset_note): Return a bool and pass the new
> 	offset back by pointer as a poly_int64.
> 	* lra-remat.c (change_sp_offset): Take sp_offset as a poly_int64
> 	rather than a HOST_WIDE_INT.
> 	(do_remat): Track offsets as poly_int64s.
> 	* lra.c (lra_update_insn_recog_data, setup_sp_offset): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [030/nnn] poly_int: get_addr_unit_base_and_extent
  2017-10-23 17:12 ` [030/nnn] poly_int: get_addr_unit_base_and_extent Richard Sandiford
@ 2017-12-06  0:26   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  0:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:12 AM, Richard Sandiford wrote:
> This patch changes the values returned by
> get_addr_unit_base_and_extent from HOST_WIDE_INT to poly_int64.
> 
> maxsize in gimple_fold_builtin_memory_op goes from HOST_WIDE_INT
> to poly_uint64 (rather than poly_int) to match the previous use
> of tree_fits_uhwi_p.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-dfa.h (get_addr_base_and_unit_offset_1): Return the offset
> 	as a poly_int64_pod rather than a HOST_WIDE_INT.
> 	(get_addr_base_and_unit_offset): Likewise.
> 	* tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise.
> 	(get_addr_base_and_unit_offset): Likewise.
> 	* doc/match-and-simplify.texi: Change off from HOST_WIDE_INT
> 	to poly_int64 in example.
> 	* fold-const.c (fold_binary_loc): Update call to
> 	get_addr_base_and_unit_offset.
> 	* gimple-fold.c (gimple_fold_builtin_memory_op): Likewise.
> 	(maybe_canonicalize_mem_ref_addr): Likewise.
> 	(gimple_fold_stmt_to_constant_1): Likewise.
> 	* ipa-prop.c (ipa_modify_call_arguments): Likewise.
> 	* match.pd: Likewise.
> 	* omp-low.c (lower_omp_target): Likewise.
> 	* tree-sra.c (build_ref_for_offset): Likewise.
> 	(build_debug_ref_for_model): Likewise.
> 	* tree-ssa-address.c (maybe_fold_tmr): Likewise.
> 	* tree-ssa-alias.c (ao_ref_init_from_ptr_and_size): Likewise.
> 	* tree-ssa-ccp.c (optimize_memcpy): Likewise.
> 	* tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Likewise.
> 	(constant_pointer_difference): Likewise.
> 	* tree-ssa-loop-niter.c (expand_simple_operations): Likewise.
> 	* tree-ssa-phiopt.c (jump_function_from_stmt): Likewise.
> 	* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> 	* tree-ssa-sccvn.c (vn_reference_fold_indirect): Likewise.
> 	(vn_reference_maybe_forwprop_address, vn_reference_lookup_3): Likewise.
> 	(set_ssa_val_to): Likewise.
> 	* tree-ssa-strlen.c (get_addr_stridx, addr_stridxptr): Likewise.
> 	* tree.c (build_simple_mem_ref_loc): Likewise.
OK.

Note that Martin S. has some code that's ready to go into the tree that
will likely require converting some bits to poly_int and that hits some
of the same areas.  Given the tree isn't poly_int-aware right now, some
coordination between you and Martin S. may be needed, depending on which
bits go in first.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [070/nnn] poly_int: vectorizable_reduction
  2017-11-22 18:11   ` Richard Sandiford
@ 2017-12-06  0:33     ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  0:33 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/22/2017 11:09 AM, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> This patch makes vectorizable_reduction cope with variable-length vectors.
>> We can handle the simple case of an inner loop reduction for which
>> the target has native support for the epilogue operation.  For now we
>> punt on other cases, but patches after the main SVE submission allow
>> SLP and double reductions too.
> 
> Here's an updated version that applies on top of the recent removal
> of REDUC_*_EXPR.
> 
> Thanks,
> Richard
> 
> 
> 2017-11-22  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree.h (build_index_vector): Declare.
> 	* tree.c (build_index_vector): New function.
> 	* tree-vect-loop.c (get_initial_def_for_reduction): Treat the number
> 	of units as polynomial, forcibly converting it to a constant if
> 	vectorizable_reduction has already enforced the condition.
> 	(get_initial_defs_for_reduction): Likewise.
> 	(vect_create_epilog_for_reduction): Likewise.  Use build_index_vector
> 	to create a {1,2,3,...} vector.
> 	(vectorizable_reduction): Treat the number of units as polynomial.
> 	Choose vectype_in based on the largest scalar element size rather
> 	than the smallest number of units.  Enforce the restrictions
> 	relied on above.
I assume you'll work with Richi to address any conflicts with his patch
to allow the target to specify a preferred mode for final reductions
using shifts or extractions.

OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [027/nnn] poly_int: DWARF CFA offsets
  2017-10-23 17:11 ` [027/nnn] poly_int: DWARF CFA offsets Richard Sandiford
@ 2017-12-06  0:40   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  0:40 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:10 AM, Richard Sandiford wrote:
> This patch makes the DWARF code use poly_int64 rather than
> HOST_WIDE_INT for CFA offsets.  The main changes are:
> 
> - to make reg_save use a DW_CFA_expression representation when
>   the offset isn't constant and
> 
> - to record the CFA information alongside a def_cfa_expression
>   if either offset is polynomial, since it's quite difficult
>   to reconstruct the CFA information otherwise.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* gengtype.c (main): Handle poly_int64_pod.
> 	* dwarf2out.h (dw_cfi_oprnd_cfa_loc): New dw_cfi_oprnd_type.
> 	(dw_cfi_oprnd::dw_cfi_cfa_loc): New field.
> 	(dw_cfa_location::offset, dw_cfa_location::base_offset): Change
> 	from HOST_WIDE_INT to poly_int64_pod.
> 	* dwarf2cfi.c (queued_reg_save::cfa_offset): Likewise.
> 	(copy_cfa): New function.
> 	(lookup_cfa_1): Use the cached dw_cfi_cfa_loc, if it exists.
> 	(cfi_oprnd_equal_p): Handle dw_cfi_oprnd_cfa_loc.
> 	(cfa_equal_p, dwarf2out_frame_debug_adjust_cfa)
> 	(dwarf2out_frame_debug_cfa_offset, dwarf2out_frame_debug_expr)
> 	(initial_return_save): Treat offsets as poly_ints.
> 	(def_cfa_0): Likewise.  Cache the CFA in dw_cfi_cfa_loc if either
> 	offset is nonconstant.
> 	(reg_save): Take the offset as a poly_int64.  Fall back to
> 	DW_CFA_expression for nonconstant offsets.
> 	(queue_reg_save): Take the offset as a poly_int64.
> 	* dwarf2out.c (dw_cfi_oprnd2_desc): Handle DW_CFA_def_cfa_expression.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [056/nnn] poly_int: MEM_REF offsets
  2017-10-23 17:24 ` [056/nnn] poly_int: MEM_REF offsets Richard Sandiford
@ 2017-12-06  0:46   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  0:46 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:23 AM, Richard Sandiford wrote:
> This patch allows MEM_REF offsets to be polynomial, with mem_ref_offset
> now returning a poly_offset_int instead of an offset_int.  The
> non-mechanical changes to callers of mem_ref_offset were handled by
> previous patches.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* fold-const.h (mem_ref_offset): Return a poly_offset_int rather
> 	than an offset_int.
> 	* tree.c (mem_ref_offset): Likewise.
> 	* builtins.c (get_object_alignment_2): Treat MEM_REF offsets as
> 	poly_ints.
> 	* expr.c (get_inner_reference, expand_expr_real_1): Likewise.
> 	* gimple-fold.c (get_base_constructor): Likewise.
> 	* gimple-ssa-strength-reduction.c (restructure_reference): Likewise.
> 	* ipa-polymorphic-call.c
> 	(ipa_polymorphic_call_context::ipa_polymorphic_call_context): Likewise.
> 	* ipa-prop.c (compute_complex_assign_jump_func, get_ancestor_addr_info)
> 	(ipa_get_adjustment_candidate): Likewise.
> 	* match.pd: Likewise.
> 	* tree-data-ref.c (dr_analyze_innermost): Likewise.
> 	* tree-dfa.c (get_addr_base_and_unit_offset_1): Likewise.
> 	* tree-eh.c (tree_could_trap_p): Likewise.
> 	* tree-object-size.c (addr_object_size): Likewise.
> 	* tree-ssa-address.c (copy_ref_info): Likewise.
> 	* tree-ssa-alias.c (indirect_ref_may_alias_decl_p): Likewise.
> 	(indirect_refs_may_alias_p): Likewise.
> 	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> 	* tree-ssa.c (maybe_rewrite_mem_ref_base): Likewise.
> 	(non_rewritable_mem_ref_base): Likewise.
> 	* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
> 	* tree-vrp.c (search_for_addr_array): Likewise.
> 	* varasm.c (decode_addr_const): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [073/nnn] poly_int: vectorizable_load/store
  2017-10-23 17:30 ` [073/nnn] poly_int: vectorizable_load/store Richard Sandiford
@ 2017-12-06  0:51   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  0:51 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:29 AM, Richard Sandiford wrote:
> This patch makes vectorizable_load and vectorizable_store cope with
> variable-length vectors.  The reverse and permute cases will be
> excluded by the code that checks the permutation mask (although a
> patch after the main SVE submission adds support for the reversed
> case).  Here we also need to exclude VMAT_ELEMENTWISE and
> VMAT_STRIDED_SLP, which split the operation up into a constant
> number of constant-sized operations.  We also don't try to extend
> the current widening gather/scatter support to variable-length
> vectors, since SVE uses a different approach.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vect-stmts.c (get_load_store_type): Treat the number of
> 	units as polynomial.  Reject VMAT_ELEMENTWISE and VMAT_STRIDED_SLP
> 	for variable-length vectors.
> 	(vectorizable_mask_load_store): Treat the number of units as
> 	polynomial, asserting that it is constant if the condition has
> 	already been enforced.
> 	(vectorizable_store, vectorizable_load): Likewise.

OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [068/nnn] poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZES
  2017-10-23 17:28 ` [068/nnn] poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZES Richard Sandiford
@ 2017-12-06  1:52   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  1:52 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:28 AM, Richard Sandiford wrote:
> This patch changes the type of current_vector_size to poly_uint64.
> It also changes TARGET_AUTOVECTORIZE_VECTOR_SIZES so that it fills
> in a vector of possible sizes (as poly_uint64s) instead of returning
> a bitmask.  The documentation claimed that the hook didn't need to
> include the default vector size (returned by preferred_simd_mode),
> but that wasn't consistent with the omp-low.c usage.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* target.h (vector_sizes, auto_vector_sizes): New typedefs.
> 	* target.def (autovectorize_vector_sizes): Return the vector sizes
> 	by pointer, using vector_sizes rather than a bitmask.
> 	* targhooks.h (default_autovectorize_vector_sizes): Update accordingly.
> 	* targhooks.c (default_autovectorize_vector_sizes): Likewise.
> 	* config/aarch64/aarch64.c (aarch64_autovectorize_vector_sizes):
> 	Likewise.
> 	* config/arc/arc.c (arc_autovectorize_vector_sizes): Likewise.
> 	* config/arm/arm.c (arm_autovectorize_vector_sizes): Likewise.
> 	* config/i386/i386.c (ix86_autovectorize_vector_sizes): Likewise.
> 	* config/mips/mips.c (mips_autovectorize_vector_sizes): Likewise.
> 	* omp-general.c (omp_max_vf): Likewise.
> 	* omp-low.c (omp_clause_aligned_alignment): Likewise.
> 	* optabs-query.c (can_vec_mask_load_store_p): Likewise.
> 	* tree-vect-loop.c (vect_analyze_loop): Likewise.
> 	* tree-vect-slp.c (vect_slp_bb): Likewise.
> 	* doc/tm.texi: Regenerate.
> 	* tree-vectorizer.h (current_vector_size): Change from an unsigned int
> 	to a poly_uint64.
> 	* tree-vect-stmts.c (get_vectype_for_scalar_type_and_size): Take
> 	the vector size as a poly_uint64 rather than an unsigned int.
> 	(current_vector_size): Change from an unsigned int to a poly_uint64.
> 	(get_vectype_for_scalar_type): Update accordingly.
> 	* tree.h (build_truth_vector_type): Take the size and number of
> 	units as a poly_uint64 rather than an unsigned int.
> 	(build_vector_type): Add a temporary overload that takes
> 	the number of units as a poly_uint64 rather than an unsigned int.
> 	* tree.c (make_vector_type): Likewise.
> 	(build_truth_vector_type): Take the number of units as a poly_uint64
> 	rather than an unsigned int.

OK.

jeff
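
To make the new interface concrete, a sketch of how a port hook might fill
in the sizes under the new scheme.  The 16/8 values are just an example for
a port with 128-bit and 64-bit vectors, not taken from any particular
target file:

  static void
  example_autovectorize_vector_sizes (vector_sizes *sizes)
  {
    /* Push the candidate vector sizes, in bytes.  */
    sizes->safe_push (16);
    sizes->safe_push (8);
  }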

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [101/nnn] poly_int: GET_MODE_NUNITS
  2017-10-23 17:41 ` [101/nnn] poly_int: GET_MODE_NUNITS Richard Sandiford
@ 2017-12-06  2:05   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  2:05 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:41 AM, Richard Sandiford wrote:
> This patch changes GET_MODE_NUNITS from unsigned char
> to poly_uint16, although it remains a macro when compiling
> target code with NUM_POLY_INT_COEFFS == 1.
> 
> If the number of units isn't known at compile time, we use:
> 
>   (const:M (vec_duplicate:M X))
> 
> to represent a vector in which every element is equal to X.  The code
> ensures that there is only a single instance of each constant, so that
> pointer equality is enough.  (This is a requirement for the constants
> that go in const_tiny_rtx, but we might as well do it for all constants.)
> 
> Similarly we use:
> 
>   (const:M (vec_series:M A B))
> 
> for a linear series starting at A and having step B.
> 
> The to_constant call in make_vector_type goes away in a later patch.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* machmode.h (mode_nunits): Change from unsigned char to
> 	poly_uint16_pod.
> 	(ONLY_FIXED_SIZE_MODES): New macro.
> 	(pod_mode::measurement_type, scalar_int_mode::measurement_type)
> 	(scalar_float_mode::measurement_type, scalar_mode::measurement_type)
> 	(complex_mode::measurement_type, fixed_size_mode::measurement_type):
> 	New typedefs.
> 	(mode_to_nunits): Return a poly_uint16 rather than an unsigned short.
> 	(GET_MODE_NUNITS): Return a constant if ONLY_FIXED_SIZE_MODES,
> 	or if measurement_type is not polynomial.
> 	* genmodes.c (ZERO_COEFFS): New macro.
> 	(emit_mode_nunits_inline): Make mode_nunits_inline return a
> 	poly_uint16.
> 	(emit_mode_nunits): Change the type of mode_nunits to poly_uint16_pod.
> 	Use ZERO_COEFFS when emitting initializers.
> 	* data-streamer.h (bp_pack_poly_value): New function.
> 	(bp_unpack_poly_value): Likewise.
> 	* lto-streamer-in.c (lto_input_mode_table): Use bp_unpack_poly_value
> 	for GET_MODE_NUNITS.
> 	* lto-streamer-out.c (lto_write_mode_table): Use bp_pack_poly_value
> 	for GET_MODE_NUNITS.
> 	* tree.c (make_vector_type): Remove temporary shim and make
> 	the real function take the number of units as a poly_uint64
> 	rather than an int.
> 	(build_vector_type_for_mode): Handle polynomial nunits.
> 	* emit-rtl.c (gen_const_vec_duplicate_1): Likewise.
> 	(gen_const_vec_series, gen_rtx_CONST_VECTOR): Likewise.
> 	* genrecog.c (validate_pattern): Likewise.
> 	* optabs-query.c (can_mult_highpart_p): Likewise.
> 	* optabs-tree.c (expand_vec_cond_expr_p): Likewise.
> 	* optabs.c (expand_vector_broadcast, expand_binop_directly)
> 	(shift_amt_for_vec_perm_mask, expand_vec_perm, expand_vec_cond_expr)
> 	(expand_mult_highpart): Likewise.
> 	* rtlanal.c (subreg_get_info): Likewise.
> 	* simplify-rtx.c (simplify_unary_operation_1): Likewise.
> 	(simplify_const_unary_operation, simplify_binary_operation_1)
> 	(simplify_const_binary_operation, simplify_ternary_operation)
> 	(test_vector_ops_duplicate, test_vector_ops): Likewise.
> 	* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
> 	(vect_grouped_load_supported): Likewise.
> 	* tree-vect-generic.c (type_for_widest_vector_mode): Likewise.
> 	* tree-vect-loop.c (have_whole_vector_shift): Likewise.
> 
> gcc/ada/
> 	* gcc-interface/misc.c (enumerate_modes): Handle polynomial
> 	GET_MODE_NUNITS.
OK.
jeff
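
As a concrete illustration (using an SVE-style mode VNx4SI, with 4 + 4*X
int elements, purely as an example), an all-ones vector and the index
vector {0, 1, 2, ...} would be written as:

  (const:VNx4SI (vec_duplicate:VNx4SI (const_int 1)))
  (const:VNx4SI (vec_series:VNx4SI (const_int 0) (const_int 1)))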

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS
  2017-10-23 17:42 ` [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS Richard Sandiford
  2017-10-24  9:06   ` Richard Biener
@ 2017-12-06  2:31   ` Jeff Law
  1 sibling, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  2:31 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:41 AM, Richard Sandiford wrote:
> This patch changes TYPE_VECTOR_SUBPARTS to a poly_uint64.  The value is
> encoded in the 10-bit precision field and was previously always stored
> as a simple log2 value.  The challenge was to use these 10 bits to
> encode the number of elements in variable-length vectors, so that
> we didn't need to increase the size of the tree.
> 
> In practice the number of vector elements should always have the form
> N + N * X (where X is the runtime value), and as for constant-length
> vectors, N must be a power of 2 (even though X itself might not be).
> The patch therefore uses the low bit to select between constant-length
> and variable-length and uses the upper 9 bits to encode log2(N).
> Targets without variable-length vectors continue to use the old scheme.
> 
> A new valid_vector_subparts_p function tests whether a given number
> of elements can be encoded.  This is false for the vector modes that
> represent an LD3 or ST3 vector triple (which we want to treat as arrays
> of vectors rather than single vectors).
> 
> Most of the patch is mechanical; previous patches handled the changes
> that weren't entirely straightforward.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree.h (TYPE_VECTOR_SUBPARTS): Turn into a function and handle
> 	polynomial numbers of units.
> 	(SET_TYPE_VECTOR_SUBPARTS): Likewise.
> 	(valid_vector_subparts_p): New function.
> 	(build_vector_type): Remove temporary shim and take the number
> 	of units as a poly_uint64 rather than an int.
> 	(build_opaque_vector_type): Take the number of units as a
> 	poly_uint64 rather than an int.
> 	* tree.c (build_vector): Handle polynomial TYPE_VECTOR_SUBPARTS.
> 	(build_vector_from_ctor, type_hash_canon_hash): Likewise.
> 	(type_cache_hasher::equal, uniform_vector_p): Likewise.
> 	(vector_type_mode): Likewise.
> 	(build_vector_from_val): If the number of units isn't constant,
> 	use build_vec_duplicate_cst for constant operands and
> 	VEC_DUPLICATE_EXPR otherwise.
> 	(make_vector_type): Remove temporary is_constant ().
> 	(build_vector_type, build_opaque_vector_type): Take the number of
> 	units as a poly_uint64 rather than an int.
> 	* cfgexpand.c (expand_debug_expr): Handle polynomial
> 	TYPE_VECTOR_SUBPARTS.
> 	* expr.c (count_type_elements, store_constructor): Likewise.
> 	* fold-const.c (const_binop, const_unop, fold_convert_const)
> 	(operand_equal_p, fold_view_convert_expr, fold_vec_perm)
> 	(fold_ternary_loc, fold_relational_const): Likewise.
> 	(native_interpret_vector): Likewise.  Change the size from an
> 	int to an unsigned int.
> 	* gimple-fold.c (gimple_fold_stmt_to_constant_1): Handle polynomial
> 	TYPE_VECTOR_SUBPARTS.
> 	(gimple_fold_indirect_ref, gimple_build_vector): Likewise.
> 	(gimple_build_vector_from_val): Use VEC_DUPLICATE_EXPR when
> 	duplicating a non-constant operand into a variable-length vector.
> 	* match.pd: Handle polynomial TYPE_VECTOR_SUBPARTS.
> 	* omp-simd-clone.c (simd_clone_subparts): Likewise.
> 	* print-tree.c (print_node): Likewise.
> 	* stor-layout.c (layout_type): Likewise.
> 	* targhooks.c (default_builtin_vectorization_cost): Likewise.
> 	* tree-cfg.c (verify_gimple_comparison): Likewise.
> 	(verify_gimple_assign_binary): Likewise.
> 	(verify_gimple_assign_ternary): Likewise.
> 	(verify_gimple_assign_single): Likewise.
> 	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
> 	* tree-vect-data-refs.c (vect_permute_store_chain): Likewise.
> 	(vect_grouped_load_supported, vect_permute_load_chain): Likewise.
> 	(vect_shift_permute_load_chain): Likewise.
> 	* tree-vect-generic.c (nunits_for_known_piecewise_op): Likewise.
> 	(expand_vector_condition, optimize_vector_constructor): Likewise.
> 	(lower_vec_perm, get_compute_type): Likewise.
> 	* tree-vect-loop.c (vect_determine_vectorization_factor): Likewise.
> 	(get_initial_defs_for_reduction, vect_transform_loop): Likewise.
> 	* tree-vect-patterns.c (vect_recog_bool_pattern): Likewise.
> 	(vect_recog_mask_conversion_pattern): Likewise.
> 	* tree-vect-slp.c (vect_supported_load_permutation_p): Likewise.
> 	(vect_get_constant_vectors, vect_transform_slp_perm_load): Likewise.
> 	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
> 	(get_group_load_store_type, vectorizable_mask_load_store): Likewise.
> 	(vectorizable_bswap, simd_clone_subparts, vectorizable_assignment)
> 	(vectorizable_shift, vectorizable_operation, vectorizable_store)
> 	(vect_gen_perm_mask_any, vectorizable_load, vect_is_simple_cond)
> 	(vectorizable_comparison, supportable_widening_operation): Likewise.
> 	(supportable_narrowing_operation): Likewise.
> 
> gcc/ada/
> 	* gcc-interface/utils.c (gnat_types_compatible_p): Handle
> 	polynomial TYPE_VECTOR_SUBPARTS.
> 
> gcc/brig/
> 	* brigfrontend/brig-to-generic.cc (get_unsigned_int_type): Handle
> 	polynomial TYPE_VECTOR_SUBPARTS.
> 	* brigfrontend/brig-util.h (gccbrig_type_vector_subparts): Likewise.
> 
> gcc/c-family/
> 	* c-common.c (vector_types_convertible_p, c_build_vec_perm_expr)
> 	(convert_vector_to_array_for_subscript): Handle polynomial
> 	TYPE_VECTOR_SUBPARTS.
> 	(c_common_type_for_mode): Check valid_vector_subparts_p.
> 
> gcc/c/
> 	* c-typeck.c (comptypes_internal, build_binary_op): Handle polynomial
> 	TYPE_VECTOR_SUBPARTS.
> 
> gcc/cp/
> 	* call.c (build_conditional_expr_1): Handle polynomial
> 	TYPE_VECTOR_SUBPARTS.
> 	* constexpr.c (cxx_fold_indirect_ref): Likewise.
> 	* decl.c (cp_finish_decomp): Likewise.
> 	* mangle.c (write_type): Likewise.
> 	* typeck.c (structural_comptypes): Likewise.
> 	(cp_build_binary_op): Likewise.
> 	* typeck2.c (process_init_constructor_array): Likewise.
> 
> gcc/fortran/
> 	* trans-types.c (gfc_type_for_mode): Check valid_vector_subparts_p.
> 
> gcc/lto/
> 	* lto-lang.c (lto_type_for_mode): Check valid_vector_subparts_p.
> 	* lto.c (hash_canonical_type): Handle polynomial TYPE_VECTOR_SUBPARTS.
> 
> gcc/go/
> 	* go-lang.c (go_langhook_type_for_mode): Check valid_vector_subparts_p.
My recollection is that the encoding was going to change on this one,
but that shouldn't affect the bulk of this patch.

OK.  I'll trust you'll adjust the encoding per the discussion with Richi.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [063/nnn] poly_int: vectoriser vf and uf
  2017-10-23 17:26 ` [063/nnn] poly_int: vectoriser vf and uf Richard Sandiford
@ 2017-12-06  2:46   ` Jeff Law
  2018-01-03 21:23   ` [PATCH] Fix gcc.dg/vect-opt-info-1.c testcase Jakub Jelinek
  1 sibling, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  2:46 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:26 AM, Richard Sandiford wrote:
> This patch changes the type of the vectorisation factor and SLP
> unrolling factor to poly_uint64.  This in turn required some knock-on
> changes in signedness elsewhere.
> 
> Cost decisions are generally based on estimated_poly_value,
> which for VF is wrapped up as vect_vf_for_cost.
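
Concretely, "based on estimated_poly_value" means a polynomial VF is
collapsed to a single number for costing by substituting a target-supplied
estimate for the runtime indeterminate.  The sketch below is conceptual
only and does not reproduce the actual hook or wrapper signatures:

  /* Conceptual sketch: cost a VF of c0 + c1*X by plugging in an estimate
     for X, e.g. 4 + 4*X is treated as 12 if X is estimated to be 2.  */
  static HOST_WIDE_INT
  estimate_vf_for_cost (HOST_WIDE_INT c0, HOST_WIDE_INT c1,
                        HOST_WIDE_INT estimated_x)
  {
    return c0 + c1 * estimated_x;
  }
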
> 
> The patch doesn't on its own enable variable-length vectorisation.
> It just makes the minimum changes necessary for the code to build
> with the new VF and UF types.  Later patches also make the
> vectoriser cope with variable TYPE_VECTOR_SUBPARTS and variable
> GET_MODE_NUNITS, at which point the code really does handle
> variable-length vectors.
> 
> The patch also changes MAX_VECTORIZATION_FACTOR to INT_MAX,
> to avoid hard-coding a particular architectural limit.
> 
> The patch includes a new test because a development version of the patch
> accidentally used file print routines instead of dump_*, which would
> fail with -fopt-info.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-vectorizer.h (_slp_instance::unrolling_factor): Change
> 	from an unsigned int to a poly_uint64.
> 	(_loop_vec_info::slp_unrolling_factor): Likewise.
> 	(_loop_vec_info::vectorization_factor): Change from an int
> 	to a poly_uint64.
> 	(MAX_VECTORIZATION_FACTOR): Bump from 64 to INT_MAX.
> 	(vect_get_num_vectors): New function.
> 	(vect_update_max_nunits, vect_vf_for_cost): Likewise.
> 	(vect_get_num_copies): Use vect_get_num_vectors.
> 	(vect_analyze_data_ref_dependences): Change max_vf from an int *
> 	to an unsigned int *.
> 	(vect_analyze_data_refs): Change min_vf from an int * to a
> 	poly_uint64 *.
> 	(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
> 	than an unsigned HOST_WIDE_INT.
> 	* tree-vect-data-refs.c (vect_analyze_possibly_independent_ddr)
> 	(vect_analyze_data_ref_dependence): Change max_vf from an int *
> 	to an unsigned int *.
> 	(vect_analyze_data_ref_dependences): Likewise.
> 	(vect_compute_data_ref_alignment): Handle polynomial vf.
> 	(vect_enhance_data_refs_alignment): Likewise.
> 	(vect_prune_runtime_alias_test_list): Likewise.
> 	(vect_shift_permute_load_chain): Likewise.
> 	(vect_supportable_dr_alignment): Likewise.
> 	(dependence_distance_ge_vf): Take the vectorization factor as a
> 	poly_uint64 rather than an unsigned HOST_WIDE_INT.
> 	(vect_analyze_data_refs): Change min_vf from an int * to a
> 	poly_uint64 *.
> 	* tree-vect-loop-manip.c (vect_gen_scalar_loop_niters): Take
> 	vfm1 as a poly_uint64 rather than an int.  Make the same change
> 	for the returned bound_scalar.
> 	(vect_gen_vector_loop_niters): Handle polynomial vf.
> 	(vect_do_peeling): Likewise.  Update call to
> 	vect_gen_scalar_loop_niters and handle polynomial bound_scalars.
> 	(vect_gen_vector_loop_niters_mult_vf): Assert that the vf must
> 	be constant.
> 	* tree-vect-loop.c (vect_determine_vectorization_factor)
> 	(vect_update_vf_for_slp, vect_analyze_loop_2): Handle polynomial vf.
> 	(vect_get_known_peeling_cost): Likewise.
> 	(vect_estimate_min_profitable_iters, vectorizable_reduction): Likewise.
> 	(vect_worthwhile_without_simd_p, vectorizable_induction): Likewise.
> 	(vect_transform_loop): Likewise.  Use the lowest possible VF when
> 	updating the upper bounds of the loop.
> 	(vect_min_worthwhile_factor): Make static.  Return an unsigned int
> 	rather than an int.
> 	* tree-vect-slp.c (vect_attempt_slp_rearrange_stmts): Cope with
> 	polynomial unroll factors.
> 	(vect_analyze_slp_cost_1, vect_analyze_slp_instance): Likewise.
> 	(vect_make_slp_decision): Likewise.
> 	(vect_supported_load_permutation_p): Likewise, and polynomial
> 	vf too.
> 	(vect_analyze_slp_cost): Handle polynomial vf.
> 	(vect_slp_analyze_node_operations): Likewise.
> 	(vect_slp_analyze_bb_1): Likewise.
> 	(vect_transform_slp_perm_load): Take the vf as a poly_uint64 rather
> 	than an unsigned HOST_WIDE_INT.
> 	* tree-vect-stmts.c (vectorizable_simd_clone_call, vectorizable_store)
> 	(vectorizable_load): Handle polynomial vf.
> 	* tree-vectorizer.c (simduid_to_vf::vf): Change from an int to
> 	a poly_uint64.
> 	(adjust_simduid_builtins, shrink_simd_arrays): Update accordingly.
> 
> gcc/testsuite/
> 	* gcc.dg/vect-opt-info-1.c: New test.
> 

OK.

Phew.  These are getting bigger...

As I go through this I find myself wondering if as a project we would be
better off moving to a different review model.  A whole lot of this
stuff is pretty straightforward once the basic design is agreed upon --
at which point review isn't adding much.

So I find myself wondering if we ought to (in the future for large work
like this) agree on the overall design, then let the implementer run
with the mechanical stuff and explicitly ask for acks on specific things
that depart from the mechanical changes.  It'd really help work like this
move forward -- similarly if we want to do something like introducing
wrapping classes, as I recently did with all the VRP stuff.


Anyway, as a long term contributor I'd be interested in hearing your
thoughts.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [043/nnn] poly_int: frame allocations
  2017-10-23 17:19 ` [043/nnn] poly_int: frame allocations Richard Sandiford
@ 2017-12-06  3:15   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06  3:15 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:18 AM, Richard Sandiford wrote:
> This patch converts the frame allocation code (mostly in function.c)
> to use poly_int64 rather than HOST_WIDE_INT for frame offsets and
> sizes.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* function.h (frame_space): Change start and length from HOST_WIDE_INT
> 	to poly_int64.
> 	(get_frame_size): Return the size as a poly_int64 rather than a
> 	HOST_WIDE_INT.
> 	(frame_offset_overflow): Take the offset as a poly_int64 rather
> 	than a HOST_WIDE_INT.
> 	(assign_stack_local_1, assign_stack_local, assign_stack_temp_for_type)
> 	(assign_stack_temp): Likewise for the size.
> 	* function.c (get_frame_size): Return a poly_int64 rather than
> 	a HOST_WIDE_INT.
> 	(frame_offset_overflow): Take the offset as a poly_int64 rather
> 	than a HOST_WIDE_INT.
> 	(try_fit_stack_local): Take the start, length and size as poly_int64s
> 	rather than HOST_WIDE_INTs.  Return the offset as a poly_int64_pod
> 	rather than a HOST_WIDE_INT.
> 	(add_frame_space): Take the start and end as poly_int64s rather than
> 	HOST_WIDE_INTs.
> 	(assign_stack_local_1, assign_stack_local, assign_stack_temp_for_type)
> 	(assign_stack_temp): Likewise for the size.
> 	(temp_slot): Change size, base_offset and full_size from HOST_WIDE_INT
> 	to poly_int64.
> 	(find_temp_slot_from_address): Handle polynomial offsets.
> 	(combine_temp_slots): Likewise.
> 	* emit-rtl.h (rtl_data::x_frame_offset): Change from HOST_WIDE_INT
> 	to poly_int64.
> 	* cfgexpand.c (alloc_stack_frame_space): Return the offset as a
> 	poly_int64 rather than a HOST_WIDE_INT.
> 	(expand_one_stack_var_at): Take the offset as a poly_int64 rather
> 	than a HOST_WIDE_INT.
> 	(expand_stack_vars, expand_one_stack_var_1, expand_used_vars): Handle
> 	polynomial frame offsets.
> 	* config/m32r/m32r-protos.h (m32r_compute_frame_size): Take the size
> 	as a poly_int64 rather than an int.
> 	* config/m32r/m32r.c (m32r_compute_frame_size): Likewise.
> 	* config/v850/v850-protos.h (compute_frame_size): Likewise.
> 	* config/v850/v850.c (compute_frame_size): Likewise.
> 	* config/xtensa/xtensa-protos.h (compute_frame_size): Likewise.
> 	* config/xtensa/xtensa.c (compute_frame_size): Likewise.
> 	* config/pa/pa-protos.h (pa_compute_frame_size): Likewise.
> 	* config/pa/pa.c (pa_compute_frame_size): Likewise.
> 	* explow.h (get_dynamic_stack_base): Take the offset as a poly_int64
> 	rather than a HOST_WIDE_INT.
> 	* explow.c (get_dynamic_stack_base): Likewise.
> 	* final.c (final_start_function): Use the constant lower bound
> 	of the frame size for -Wframe-larger-than.
> 	* ira.c (do_reload): Adjust for new get_frame_size return type.
> 	* lra.c (lra): Likewise.
> 	* reload1.c (reload): Likewise.
> 	* config/avr/avr.c (avr_asm_function_end_prologue): Likewise.
> 	* config/pa/pa.h (EXIT_IGNORE_STACK): Likewise.
> 	* rtlanal.c (get_initial_register_offset): Return the offset as
> 	a poly_int64 rather than a HOST_WIDE_INT.
> 

OK
Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [040/nnn] poly_int: get_inner_reference & co.
  2017-10-23 17:18 ` [040/nnn] poly_int: get_inner_reference & co Richard Sandiford
@ 2017-12-06 17:26   ` Jeff Law
  2018-12-21 11:17   ` Thomas Schwinge
  1 sibling, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06 17:26 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:17 AM, Richard Sandiford wrote:
> This patch makes get_inner_reference and ptr_difference_const return the
> bit size and bit position as poly_int64s rather than HOST_WIDE_INTS.
> The non-mechanical changes were handled by previous patches.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree.h (get_inner_reference): Return the bitsize and bitpos
> 	as poly_int64_pods rather than HOST_WIDE_INT.
> 	* fold-const.h (ptr_difference_const): Return the pointer difference
> 	as a poly_int64_pod rather than a HOST_WIDE_INT.
> 	* expr.c (get_inner_reference): Return the bitsize and bitpos
> 	as poly_int64_pods rather than HOST_WIDE_INT.
> 	(expand_expr_addr_expr_1, expand_expr_real_1): Track polynomial
> 	offsets and sizes.
> 	* fold-const.c (make_bit_field_ref): Take the bitpos as a poly_int64
> 	rather than a HOST_WIDE_INT.  Update call to get_inner_reference.
> 	(optimize_bit_field_compare): Update call to get_inner_reference.
> 	(decode_field_reference): Likewise.
> 	(fold_unary_loc): Track polynomial offsets and sizes.
> 	(split_address_to_core_and_offset): Return the bitpos as a
> 	poly_int64_pod rather than a HOST_WIDE_INT.
> 	(ptr_difference_const): Likewise for the pointer difference.
> 	* asan.c (instrument_derefs): Track polynomial offsets and sizes.
> 	* config/mips/mips.c (r10k_safe_mem_expr_p): Likewise.
> 	* dbxout.c (dbxout_expand_expr): Likewise.
> 	* dwarf2out.c (loc_list_for_address_of_addr_expr_of_indirect_ref)
> 	(loc_list_from_tree_1, fortran_common): Likewise.
> 	* gimple-laddress.c (pass_laddress::execute): Likewise.
> 	* gimplify.c (gimplify_scan_omp_clauses): Likewise.
> 	* simplify-rtx.c (delegitimize_mem_from_attrs): Likewise.
> 	* tree-affine.c (tree_to_aff_combination): Likewise.
> 	(get_inner_reference_aff): Likewise.
> 	* tree-data-ref.c (split_constant_offset_1): Likewise.
> 	(dr_analyze_innermost): Likewise.
> 	* tree-scalar-evolution.c (interpret_rhs_expr): Likewise.
> 	* tree-sra.c (ipa_sra_check_caller): Likewise.
> 	* tree-ssa-math-opts.c (find_bswap_or_nop_load): Likewise.
> 	* tree-vect-data-refs.c (vect_check_gather_scatter): Likewise.
> 	* ubsan.c (maybe_instrument_pointer_overflow): Likewise.
> 	(instrument_bool_enum_load, instrument_object_size): Likewise.
> 	* gimple-ssa-strength-reduction.c (slsr_process_ref): Update call
> 	to get_inner_reference.
> 	* hsa-gen.c (gen_hsa_addr): Likewise.
> 	* sanopt.c (maybe_optimize_ubsan_ptr_ifn): Likewise.
> 	* tsan.c (instrument_expr): Likewise.
> 	* match.pd: Update call to ptr_difference_const.
> 
> gcc/ada/
> 	* gcc-interface/trans.c (Attribute_to_gnu): Track polynomial
> 	offsets and sizes.
> 	* gcc-interface/utils2.c (build_unary_op): Likewise.
> 
> gcc/cp/
> 	* constexpr.c (check_automatic_or_tls): Track polynomial
> 	offsets and sizes.
> 
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [018/nnn] poly_int: MEM_OFFSET and MEM_SIZE
  2017-10-23 17:08 ` [018/nnn] poly_int: MEM_OFFSET and MEM_SIZE Richard Sandiford
@ 2017-12-06 18:27   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06 18:27 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:07 AM, Richard Sandiford wrote:
> This patch changes the MEM_OFFSET and MEM_SIZE memory attributes
> from HOST_WIDE_INT to poly_int64.  Most of it is mechanical,
> but there is one nonobvious change in widen_memory_access.
> Previously the main while loop broke with:
> 
>       /* Similarly for the decl.  */
>       else if (DECL_P (attrs.expr)
>                && DECL_SIZE_UNIT (attrs.expr)
>                && TREE_CODE (DECL_SIZE_UNIT (attrs.expr)) == INTEGER_CST
>                && compare_tree_int (DECL_SIZE_UNIT (attrs.expr), size) >= 0
>                && (! attrs.offset_known_p || attrs.offset >= 0))
>         break;
> 
> but it seemed wrong to optimistically assume the best case
> when the offset isn't known (and thus might be negative).
> As it happens, the "! attrs.offset_known_p" condition was
> always false, because we'd already nullified attrs.expr in
> that case:
> 
>   /* If we don't know what offset we were at within the expression, then
>      we can't know if we've overstepped the bounds.  */
>   if (! attrs.offset_known_p)
>     attrs.expr = NULL_TREE;
> 
> The patch therefore drops "! attrs.offset_known_p ||" when
> converting the offset check to the may/must interface.
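
To spell out why dropping that alternative is the conservative choice,
here is a toy illustration of the may/must semantics involved, assuming
a target with two coefficients and the may_/must_ spelling used by this
series as posted; it is not the patched widen_memory_access code itself:

  /* An offset of -4 + 4*X is nonnegative for X >= 1 but negative for
     X == 0, so only the "may" form holds and the loop has to exit
     conservatively.  */
  static void
  offset_check_example ()
  {
    poly_int64 offset (-4, 4);
    bool definitely_nonnegative = must_ge (offset, 0);  /* false */
    bool possibly_nonnegative = may_ge (offset, 0);     /* true */
    gcc_assert (!definitely_nonnegative && possibly_nonnegative);
  }
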
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtl.h (mem_attrs): Add a default constructor.  Change size and
> 	offset from HOST_WIDE_INT to poly_int64.
> 	* emit-rtl.h (set_mem_offset, set_mem_size, adjust_address_1)
> 	(adjust_automodify_address_1, set_mem_attributes_minus_bitpos)
> 	(widen_memory_access): Take the sizes and offsets as poly_int64s
> 	rather than HOST_WIDE_INTs.
> 	* alias.c (ao_ref_from_mem): Handle the new form of MEM_OFFSET.
> 	(offset_overlap_p): Take poly_int64s rather than HOST_WIDE_INTs
> 	and ints.
> 	(adjust_offset_for_component_ref): Change the offset from a
> 	HOST_WIDE_INT to a poly_int64.
> 	(nonoverlapping_memrefs_p): Track polynomial offsets and sizes.
> 	* cfgcleanup.c (merge_memattrs): Update after mem_attrs changes.
> 	* dce.c (find_call_stack_args): Likewise.
> 	* dse.c (record_store): Likewise.
> 	* dwarf2out.c (tls_mem_loc_descriptor, dw_sra_loc_expr): Likewise.
> 	* print-rtl.c (rtx_writer::print_rtx): Likewise.
> 	* read-rtl-function.c (test_loading_mem): Likewise.
> 	* rtlanal.c (may_trap_p_1): Likewise.
> 	* simplify-rtx.c (delegitimize_mem_from_attrs): Likewise.
> 	* var-tracking.c (int_mem_offset, track_expr_p): Likewise.
> 	* emit-rtl.c (mem_attrs_eq_p, get_mem_align_offset): Likewise.
> 	(mem_attrs::mem_attrs): New function.
> 	(set_mem_attributes_minus_bitpos): Change bitpos from a
> 	HOST_WIDE_INT to poly_int64.
> 	(set_mem_alias_set, set_mem_addr_space, set_mem_align, set_mem_expr)
> 	(clear_mem_offset, clear_mem_size, change_address)
> 	(get_spill_slot_decl, set_mem_attrs_for_spill): Directly
> 	initialize mem_attrs.
> 	(set_mem_offset, set_mem_size, adjust_address_1)
> 	(adjust_automodify_address_1, offset_address, widen_memory_access):
> 	Likewise.  Take poly_int64s rather than HOST_WIDE_INT.
> 

OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [025/nnn] poly_int: SUBREG_BYTE
  2017-10-23 17:10 ` [025/nnn] poly_int: SUBREG_BYTE Richard Sandiford
@ 2017-12-06 18:50   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06 18:50 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:10 AM, Richard Sandiford wrote:
> This patch changes SUBREG_BYTE from an int to a poly_int.
> Since valid SUBREG_BYTEs must be contained within the mode of the
> SUBREG_REG, the required range is the same as for GET_MODE_SIZE,
> i.e. unsigned short.  The patch therefore uses poly_uint16(_pod)
> for the SUBREG_BYTE.
> 
> Using poly_uint16_pod rtx fields requires a new field code ('p').
> Since there are no other uses of 'p' besides SUBREG_BYTE, the patch
> doesn't add an XPOLY or whatever; all uses should go via SUBREG_BYTE
> instead.
> 
> The patch doesn't bother implementing 'p' support for legacy
> define_peepholes, since none of the remaining ones have subregs
> in their patterns.
> 
> As it happened, the rtl documentation used SUBREG as an example of a
> code with mixed field types, accessed via XEXP (x, 0) and XINT (x, 1).
> Since there's no direct replacement for XINT, and since people should
> never use it even if there were, the patch changes the example to use
> INT_LIST instead.
> 
> The patch also changes subreg-related helper functions so that they too
> take and return polynomial offsets.  This makes the patch quite big, but
> it's mostly mechanical.  The patch generally sticks to existing choices
> wrt signedness.
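
As an example of the mechanical pattern involved, a lowpart test on a
subreg goes from a plain integer comparison to a poly_int one.  The sketch
below is simplified rather than quoted from the patch (compare
subreg_lowpart_p), and uses the must_ spelling used by this series as
posted:

  /* Simplified sketch: with a polynomial SUBREG_BYTE, "is this the
     lowpart?" becomes a must_eq test against subreg_lowpart_offset,
     which now returns a poly_uint64.  */
  static bool
  lowpart_subreg_p_sketch (const_rtx x)
  {
    poly_uint64 offset = SUBREG_BYTE (x);
    poly_uint64 lowpart = subreg_lowpart_offset (GET_MODE (x),
                                                 GET_MODE (SUBREG_REG (x)));
    return must_eq (offset, lowpart);
  }
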
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* doc/rtl.texi: Update documentation of SUBREG_BYTE.  Document the
> 	'p' format code.  Use INT_LIST rather than SUBREG as the example of
> 	a code with an XINT and an XEXP.  Remove the implication that
> 	accessing an rtx field using XINT is expected to work.
> 	* rtl.def (SUBREG): Change format from "ei" to "ep".
> 	* rtl.h (rtunion::rt_subreg): New field.
> 	(XCSUBREG): New macro.
> 	(SUBREG_BYTE): Use it.
> 	(subreg_shape): Change offset from an unsigned int to a poly_uint16.
> 	Update constructor accordingly.
> 	(subreg_shape::operator ==): Update accordingly.
> 	(subreg_shape::unique_id): Return an unsigned HOST_WIDE_INT rather
> 	than an unsigned int.
> 	(subreg_lsb, subreg_lowpart_offset, subreg_highpart_offset): Return
> 	a poly_uint64 rather than an unsigned int.
> 	(subreg_lsb_1): Likewise.  Take the offset as a poly_uint64 rather
> 	than an unsigned int.
> 	(subreg_size_offset_from_lsb, subreg_size_lowpart_offset)
> 	(subreg_size_highpart_offset): Return a poly_uint64 rather than
> 	an unsigned int.  Take the sizes as poly_uint64s.
> 	(subreg_offset_from_lsb): Return a poly_uint64 rather than
> 	an unsigned int.  Take the shift as a poly_uint64 rather than
> 	an unsigned int.
> 	(subreg_regno_offset, subreg_offset_representable_p): Take the offset
> 	as a poly_uint64 rather than an unsigned int.
> 	(simplify_subreg_regno): Likewise.
> 	(byte_lowpart_offset): Return the memory offset as a poly_int64
> 	rather than an int.
> 	(subreg_memory_offset): Likewise.  Take the subreg offset as a
> 	poly_uint64 rather than an unsigned int.
> 	(simplify_subreg, simplify_gen_subreg, subreg_get_info)
> 	(gen_rtx_SUBREG, validate_subreg): Take the subreg offset as a
> 	poly_uint64 rather than an unsigned int.
> 	* rtl.c (rtx_format): Describe 'p' in comment.
> 	(copy_rtx, rtx_equal_p_cb, rtx_equal_p): Handle 'p'.
> 	* emit-rtl.c (validate_subreg, gen_rtx_SUBREG): Take the subreg
> 	offset as a poly_uint64 rather than an unsigned int.
> 	(byte_lowpart_offset): Return the memory offset as a poly_int64
> 	rather than an int.
> 	(subreg_memory_offset): Likewise.  Take the subreg offset as a
> 	poly_uint64 rather than an unsigned int.
> 	(subreg_size_lowpart_offset, subreg_size_highpart_offset): Take the
> 	mode sizes as poly_uint64s rather than unsigned ints.  Return a
> 	poly_uint64 rather than an unsigned int.
> 	(subreg_lowpart_p): Treat subreg offsets as poly_ints.
> 	(copy_insn_1): Handle 'p'.
> 	* rtlanal.c (set_noop_p): Treat subregs offsets as poly_uint64s.
> 	(subreg_lsb_1): Take the subreg offset as a poly_uint64 rather than
> 	an unsigned int.  Return the shift in the same way.
> 	(subreg_lsb): Return the shift as a poly_uint64 rather than an
> 	unsigned int.
> 	(subreg_size_offset_from_lsb): Take the sizes and shift as
> 	poly_uint64s rather than unsigned ints.  Return the offset as
> 	a poly_uint64.
> 	(subreg_get_info, subreg_regno_offset, subreg_offset_representable_p)
> 	(simplify_subreg_regno): Take the offset as a poly_uint64 rather than
> 	an unsigned int.
> 	* rtlhash.c (add_rtx): Handle 'p'.
> 	* genemit.c (gen_exp): Likewise.
> 	* gengenrtl.c (type_from_format, gendef): Likewise.
> 	* gensupport.c (subst_pattern_match, get_alternatives_number)
> 	(collect_insn_data, alter_predicate_for_insn, alter_constraints)
> 	(subst_dup): Likewise.
> 	* gengtype.c (adjust_field_rtx_def): Likewise.
> 	* genrecog.c (find_operand, find_matching_operand, validate_pattern)
> 	(match_pattern_2): Likewise.
> 	(rtx_test::SUBREG_FIELD): New rtx_test::kind_enum.
> 	(rtx_test::subreg_field): New function.
> 	(operator ==, safe_to_hoist_p, transition_parameter_type)
> 	(print_nonbool_test, print_test): Handle SUBREG_FIELD.
> 	* genattrtab.c (attr_rtx_1): Say that 'p' is deliberately not handled.
> 	* genpeep.c (match_rtx): Likewise.
> 	* print-rtl.c (print_poly_int): Include if GENERATOR_FILE too.
> 	(rtx_writer::print_rtx_operand): Handle 'p'.
> 	(print_value): Handle SUBREG.
> 	* read-rtl.c (apply_int_iterator): Likewise.
> 	(rtx_reader::read_rtx_operand): Handle 'p'.
> 	* alias.c (rtx_equal_for_memref_p): Likewise.
> 	* cselib.c (rtx_equal_for_cselib_1, cselib_hash_rtx): Likewise.
> 	* caller-save.c (replace_reg_with_saved_mem): Treat subreg offsets
> 	as poly_ints.
> 	* calls.c (expand_call): Likewise.
> 	* combine.c (combine_simplify_rtx, expand_field_assignment): Likewise.
> 	(make_extraction, gen_lowpart_for_combine): Likewise.
> 	* loop-invariant.c (hash_invariant_expr_1, invariant_expr_equal_p):
> 	Likewise.
> 	* cse.c (remove_invalid_subreg_refs): Take the offset as a poly_uint64
> 	rather than an unsigned int.  Treat subreg offsets as poly_ints.
> 	(exp_equiv_p): Handle 'p'.
> 	(hash_rtx_cb): Likewise.  Treat subreg offsets as poly_ints.
> 	(equiv_constant, cse_insn): Treat subreg offsets as poly_ints.
> 	* dse.c (find_shift_sequence): Likewise.
> 	* dwarf2out.c (rtl_for_decl_location): Likewise.
> 	* expmed.c (extract_low_bits): Likewise.
> 	* expr.c (emit_group_store, undefined_operand_subword_p): Likewise.
> 	(expand_expr_real_2): Likewise.
> 	* final.c (alter_subreg): Likewise.
> 	(leaf_renumber_regs_insn): Handle 'p'.
> 	* function.c (assign_parm_find_stack_rtl, assign_parm_setup_stack):
> 	Treat subreg offsets as poly_ints.
> 	* fwprop.c (forward_propagate_and_simplify): Likewise.
> 	* ifcvt.c (noce_emit_move_insn, noce_emit_cmove): Likewise.
> 	* ira.c (get_subreg_tracking_sizes): Likewise.
> 	* ira-conflicts.c (go_through_subreg): Likewise.
> 	* ira-lives.c (process_single_reg_class_operands): Likewise.
> 	* jump.c (rtx_renumbered_equal_p): Likewise.  Handle 'p'.
> 	* lower-subreg.c (simplify_subreg_concatn): Take the subreg offset
> 	as a poly_uint64 rather than an unsigned int.
> 	(simplify_gen_subreg_concatn, resolve_simple_move): Treat
> 	subreg offsets as poly_ints.
> 	* lra-constraints.c (operands_match_p): Handle 'p'.
> 	(match_reload, curr_insn_transform): Treat subreg offsets as poly_ints.
> 	* lra-spills.c (assign_mem_slot): Likewise.
> 	* postreload.c (move2add_valid_value_p): Likewise.
> 	* recog.c (general_operand, indirect_operand): Likewise.
> 	* regcprop.c (copy_value, maybe_mode_change): Likewise.
> 	(copyprop_hardreg_forward_1): Likewise.
> 	* reginfo.c (simplifiable_subregs_hasher::hash, simplifiable_subregs)
> 	(record_subregs_of_mode): Likewise.
> 	* rtlhooks.c (gen_lowpart_general, gen_lowpart_if_possible): Likewise.
> 	* reload.c (operands_match_p): Handle 'p'.
> 	(find_reloads_subreg_address): Treat subreg offsets as poly_ints.
> 	* reload1.c (alter_reg, choose_reload_regs): Likewise.
> 	(compute_reload_subreg_offset): Likewise, and return a poly_int64.
> 	* simplify-rtx.c (simplify_truncation, simplify_binary_operation_1):
> 	(test_vector_ops_duplicate): Treat subreg offsets as poly_ints.
> 	(simplify_const_poly_int_tests<N>::run): Likewise.
> 	(simplify_subreg, simplify_gen_subreg): Take the subreg offset as
> 	a poly_uint64 rather than an unsigned int.
> 	* valtrack.c (debug_lowpart_subreg): Likewise.
> 	* var-tracking.c (var_lowpart): Likewise.
> 	(loc_cmp): Handle 'p'.
> 

> Index: gcc/rtlanal.c
[ ... Going to assume these bits are right WRT the endianness bits ...]

> -      unsigned int lower_word_part = lower_bytes & -UNITS_PER_WORD;
> -      unsigned int upper_word_part = upper_bytes & -UNITS_PER_WORD;
> +      /* When bytes and words have oppposite endianness, we must be able
Nit.  Oppposite

OK with nit fixed.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [029/nnn] poly_int: get_ref_base_and_extent
  2017-10-23 17:12 ` [029/nnn] poly_int: get_ref_base_and_extent Richard Sandiford
@ 2017-12-06 20:03   ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-06 20:03 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:11 AM, Richard Sandiford wrote:
> This patch changes the types of the bit offsets and sizes returned
> by get_ref_base_and_extent to poly_int64.
> 
> There are some callers that can't sensibly operate on polynomial
> offsets or handle cases where the offset and size aren't known
> exactly.  This includes the IPA devirtualisation code (since
> there's no defined way of having vtables at variable offsets)
> and some parts of the DWARF code.  The patch therefore adds
> a helper function get_ref_base_and_extent_hwi that either returns
> exact HOST_WIDE_INT bit positions and sizes or returns a null
> base to indicate failure.
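
A sketch of roughly what such a helper looks like, under assumptions about
the exact signature and about which extra checks it performs (the real
function lives in tree-dfa.c):

  /* Illustrative only: succeed when the base is known and the bit
     position and size are compile-time constants that also match the
     maximum size; otherwise return a null base.  */
  static tree
  ref_base_and_extent_hwi_sketch (tree exp, HOST_WIDE_INT *bitpos,
                                  HOST_WIDE_INT *bitsize, bool *reverse)
  {
    poly_int64 ppos, psize, pmax;
    tree base = get_ref_base_and_extent (exp, &ppos, &psize, &pmax, reverse);
    if (!base
        || !ppos.is_constant (bitpos)
        || !psize.is_constant (bitsize)
        || may_ne (psize, pmax))
      return NULL_TREE;
    return base;
  }
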
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* tree-dfa.h (get_ref_base_and_extent): Return the base, size and
> 	max_size as poly_int64_pods rather than HOST_WIDE_INTs.
> 	(get_ref_base_and_extent_hwi): Declare.
> 	* tree-dfa.c (get_ref_base_and_extent): Return the base, size and
> 	max_size as poly_int64_pods rather than HOST_WIDE_INTs.
> 	(get_ref_base_and_extent_hwi): New function.
> 	* cfgexpand.c (expand_debug_expr): Update call to
> 	get_ref_base_and_extent.
> 	* dwarf2out.c (add_var_loc_to_decl): Likewise.
> 	* gimple-fold.c (get_base_constructor): Return the offset as a
> 	poly_int64_pod rather than a HOST_WIDE_INT.
> 	(fold_const_aggregate_ref_1): Track polynomial sizes and offsets.
> 	* ipa-polymorphic-call.c
> 	(ipa_polymorphic_call_context::set_by_invariant)
> 	(extr_type_from_vtbl_ptr_store): Track polynomial offsets.
> 	(ipa_polymorphic_call_context::ipa_polymorphic_call_context)
> 	(check_stmt_for_type_change): Use get_ref_base_and_extent_hwi
> 	rather than get_ref_base_and_extent.
> 	(ipa_polymorphic_call_context::get_dynamic_type): Likewise.
> 	* ipa-prop.c (ipa_load_from_parm_agg, compute_complex_assign_jump_func)
> 	(get_ancestor_addr_info, determine_locally_known_aggregate_parts):
> 	Likewise.
> 	(ipa_get_adjustment_candidate): Update call to get_ref_base_and_extent.
> 	* tree-sra.c (create_access, get_access_for_expr): Likewise.
> 	* tree-ssa-alias.c (ao_ref_base, aliasing_component_refs_p)
> 	(stmt_kills_ref_p): Likewise.
> 	* tree-ssa-dce.c (mark_aliased_reaching_defs_necessary_1): Likewise.
> 	* tree-ssa-scopedtables.c (avail_expr_hash, equal_mem_array_ref_p):
> 	Likewise.
> 	* tree-ssa-sccvn.c (vn_reference_lookup_3): Likewise.
> 	Use get_ref_base_and_extent_hwi rather than get_ref_base_and_extent
> 	when calling native_encode_expr.
> 	* tree-ssa-structalias.c (get_constraint_for_component_ref): Update
> 	call to get_ref_base_and_extent.
> 	(do_structure_copy): Use get_ref_base_and_extent_hwi rather than
> 	get_ref_base_and_extent.
> 	* var-tracking.c (track_expr_p): Likewise.
> 

I initially missed some of the additional checks performed by
get_ref_base_and_extent_hwi and thought we had a problem with the
transition to use that routine in various places.  But eventually I saw
the light.



OK.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-11-14  0:42     ` Richard Sandiford
@ 2017-12-06 20:11       ` Jeff Law
  2017-12-07 14:46         ` Richard Biener
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-12-06 20:11 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 11/13/2017 05:04 PM, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>> This patch adds a new "poly_int" class to represent polynomial integers
>>> of the form:
>>>
>>>   C0 + C1*X1 + C2*X2 ... + Cn*Xn
>>>
>>> It also adds poly_int-based typedefs for offsets and sizes of various
>>> precisions.  In these typedefs, the Ci coefficients are compile-time
>>> constants and the Xi indeterminates are run-time invariants.  The number
>>> of coefficients is controlled by the target and is initially 1 for all
>>> ports.
>>>
>>> Most routines can handle general coefficient counts, but for now a few
>>> are specific to one or two coefficients.  Support for other coefficient
>>> counts can be added when needed.
>>>
>>> The patch also adds a new macro, IN_TARGET_CODE, that can be
>>> set to indicate that a TU contains target-specific rather than
>>> target-independent code.  When this macro is set and the number of
>>> coefficients is 1, the poly-int.h classes define a conversion operator
>>> to a constant.  This allows most existing target code to work without
>>> modification.  The main exceptions are:
>>>
>>> - values passed through ..., which need an explicit conversion to a
>>>   constant
>>>
>>> - ?: expression in which one arm ends up being a polynomial and the
>>>   other remains a constant.  In these cases it would be valid to convert
>>>   the constant to a polynomial and the polynomial to a constant, so a
>>>   cast is needed to break the ambiguity.
>>>
>>> The patch also adds a new target hook to return the estimated
>>> value of a polynomial for costing purposes.
>>>
>>> The patch also adds operator<< on wide_ints (it was already defined
>>> for offset_int and widest_int).  I think this was originally excluded
>>> because >> is ambiguous for wide_int, but << is useful for converting
>>> bytes to bits, etc., so is worth defining on its own.  The patch also
>>> adds operator% and operator/ for offset_int and widest_int, since those
>>> types are always signed.  These changes allow the poly_int interface to
>>> be more predictable.
>>>
>>> I'd originally tried adding the tests as selftests, but that ended up
>>> bloating cc1 by at least a third.  It also took a while to build them
>>> at -O2.  The patch therefore uses plugin tests instead, where we can
>>> force the tests to be built at -O0.  They still run in negligible time
>>> when built that way.
>>
>> Changes in v2:
>>
>> - Drop the controversial known_zero etc. wrapper functions.
>> - Fix the operator<<= bug that Martin found.
>> - Switch from "t" to "type" in SFINAE classes (requested by Martin).
>>
>> Not changed in v2:
>>
>> - Default constructors are still empty.  I agree it makes sense to use
>>   "= default" when we switch to C++11, but it would be dangerous for
>>   that to make "poly_int64 x;" less defined than it is now.
> 
> After talking about this a bit more internally, it was obvious that
> the choice of "must" and "may" for the predicate names was a common
> sticking point.  The idea was to match the names of alias predicates,
> but given my track record with names ("too_empty_p" being a recently
> questioned example :-)), I'd be happy to rename them to something else.
> Some alternatives we came up with were:
I didn't find the must vs may naming problematical as I was going
through the changes.  What I did find much more difficult was
determining if the behavior was correct when we used a "may" predicate.
It really relies a good deal on knowing the surrounding code.

In places where I knew the code reasonably well I could tell without much
surrounding context.  In other places I had to look at the code and
deduce proper behavior in the "may" cases -- and often I resorted to
spot checking and relying on your reputation & testing to DTRT.


> 
> - known_eq / maybe_eq / known_lt / maybe_lt etc.
> 
>   Some functions already use "known" and "maybe", so this would arguably
>   be more consistent than using "must" and "may".
> 
> - always_eq / sometimes_eq / always_lt / sometimes_lt
> 
>   Similar to the previous one in intent.  It's just a question of which
>   wording is clearer.
> 
> - forall_eq / exists_eq / forall_lt / exists_lt etc.
> 
>   Matches the usual logic quantifiers.  This seems quite appealing,
>   as long as it's obvious that in:
> 
>     forall_eq (v0, v1)
> 
>   v0 and v1 themselves are already bound: if vi == ai + bi*X then
>   what we really saying is:
> 
>     forall X, a0 + b0*X == a1 + b1*X 
> 
> Which of those sounds best?  Any other suggestions?
I can live with any of them.  I tend to prefer one of the first two, but
it's not a major concern for me.  So if you or others have a clear
preference, go with it.


jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [047/nnn] poly_int: argument sizes
  2017-10-23 17:20 ` [047/nnn] poly_int: argument sizes Richard Sandiford
@ 2017-12-06 20:57   ` Jeff Law
  2017-12-20 11:37     ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-12-06 20:57 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 11:20 AM, Richard Sandiford wrote:
> This patch changes various bits of state related to argument sizes so
> that they have type poly_int64 rather than HOST_WIDE_INT.  This includes:
> 
> - incoming_args::pops_args and incoming_args::size
> - rtl_data::outgoing_args_size
> - pending_stack_adjust
> - stack_pointer_delta
> - stack_usage::pushed_stack_size
> - args_size::constant
> 
> It also changes TARGET_RETURN_POPS_ARGS so that the size of the
> arguments passed in and the size returned by the hook are both
> poly_int64s.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* target.def (return_pops_args): Treat both the input and output
> 	sizes as poly_int64s rather than HOST_WIDE_INTS.
> 	* targhooks.h (default_return_pops_args): Update accordingly.
> 	* targhooks.c (default_return_pops_args): Likewise.
> 	* doc/tm.texi: Regenerate.
> 	* emit-rtl.h (incoming_args): Change pops_args, size and
> 	outgoing_args_size from int to poly_int64_pod.
> 	* function.h (expr_status): Change x_pending_stack_adjust and
> 	x_stack_pointer_delta from int to poly_int64.
> 	(args_size::constant): Change from HOST_WIDE_INT to poly_int64.
> 	(ARGS_SIZE_RTX): Update accordingly.
> 	* calls.c (highest_outgoing_arg_in_use): Change from int to
> 	unsigned int.
> 	(stack_usage_watermark, stored_args_watermark): New variables.
> 	(stack_region_maybe_used_p, mark_stack_region_used): New functions.
> 	(emit_call_1): Change the stack_size and rounded_stack_size
> 	parameters from HOST_WIDE_INT to poly_int64.  Track n_popped
> 	as a poly_int64.
> 	(save_fixed_argument_area): Check stack_usage_watermark.
> 	(initialize_argument_information): Change old_pending_adj from
> 	a HOST_WIDE_INT * to a poly_int64_pod *.
> 	(compute_argument_block_size): Return the size as a poly_int64
> 	rather than an int.
> 	(finalize_must_preallocate): Track polynomial argument sizes.
> 	(compute_argument_addresses): Likewise.
> 	(internal_arg_pointer_based_exp): Track polynomial offsets.
> 	(mem_overlaps_already_clobbered_arg_p): Rename to...
> 	(mem_might_overlap_already_clobbered_arg_p): ...this and take the
> 	size as a poly_uint64 rather than an unsigned HOST_WIDE_INT.
> 	Check stored_args_watermark.
> 	(load_register_parameters): Update accordingly.
> 	(check_sibcall_argument_overlap_1): Likewise.
> 	(combine_pending_stack_adjustment_and_call): Take the unadjusted
> 	args size as a poly_int64 rather than an int.  Return a bool
> 	indicating whether the optimization was possible and return
> 	the new adjustment by reference.
> 	(check_sibcall_argument_overlap): Track polynomial argument sizes.
> 	Update stored_args_watermark.
> 	(can_implement_as_sibling_call_p): Handle polynomial argument sizes.
> 	(expand_call): Likewise.  Maintain stack_usage_watermark and
> 	stored_args_watermark.  Update calls to
> 	combine_pending_stack_adjustment_and_call.
> 	(emit_library_call_value_1): Handle polynomial argument sizes.
> 	Call stack_region_maybe_used_p and mark_stack_region_used.
> 	Maintain stack_usage_watermark.
> 	(store_one_arg): Likewise.  Update call to
> 	mem_overlaps_already_clobbered_arg_p.
> 	* config/arm/arm.c (arm_output_function_prologue): Add a cast to
> 	HOST_WIDE_INT.
> 	* config/avr/avr.c (avr_outgoing_args_size): Likewise.
> 	* config/microblaze/microblaze.c (microblaze_function_prologue):
> 	Likewise.
> 	* config/cr16/cr16.c (cr16_return_pops_args): Update for new
> 	TARGET_RETURN_POPS_ARGS interface.
> 	(cr16_compute_frame, cr16_initial_elimination_offset): Add casts
> 	to HOST_WIDE_INT.
> 	* config/ft32/ft32.c (ft32_compute_frame): Likewise.
> 	* config/i386/i386.c (ix86_return_pops_args): Update for new
> 	TARGET_RETURN_POPS_ARGS interface.
> 	(ix86_expand_split_stack_prologue): Add a cast to HOST_WIDE_INT.
> 	* config/moxie/moxie.c (moxie_compute_frame): Likewise.
> 	* config/m68k/m68k.c (m68k_return_pops_args): Update for new
> 	TARGET_RETURN_POPS_ARGS interface.
> 	* config/vax/vax.c (vax_return_pops_args): Likewise.
> 	* config/pa/pa.h (STACK_POINTER_OFFSET): Add a cast to poly_int64.
> 	(EXIT_IGNORE_STACK): Update reference to crtl->outgoing_args_size.
> 	* config/arm/arm.h (CALLER_INTERWORKING_SLOT_SIZE): Likewise.
> 	* config/powerpcspe/aix.h (STACK_DYNAMIC_OFFSET): Likewise.
> 	* config/powerpcspe/darwin.h (STACK_DYNAMIC_OFFSET): Likewise.
> 	* config/powerpcspe/powerpcspe.h (STACK_DYNAMIC_OFFSET): Likewise.
> 	* config/rs6000/aix.h (STACK_DYNAMIC_OFFSET): Likewise.
> 	* config/rs6000/darwin.h (STACK_DYNAMIC_OFFSET): Likewise.
> 	* config/rs6000/rs6000.h (STACK_DYNAMIC_OFFSET): Likewise.
> 	* dojump.h (saved_pending_stack_adjust): Change x_pending_stack_adjust
> 	and x_stack_pointer_delta from int to poly_int64.
> 	* dojump.c (do_pending_stack_adjust): Update accordingly.
> 	* explow.c (allocate_dynamic_stack_space): Handle polynomial
> 	stack_pointer_deltas.
> 	* function.c (STACK_DYNAMIC_OFFSET): Add a cast to poly_int64.
> 	(pad_to_arg_alignment): Track polynomial offsets.
> 	(assign_parm_find_stack_rtl): Likewise.
> 	(assign_parms, locate_and_pad_parm): Handle polynomial argument sizes.
> 	* toplev.c (output_stack_usage): Update reference to
> 	current_function_pushed_stack_size.
I haven't been able to convince myself that the changes to the
stack_usage_map are correct, particularly in the case where the
UPPER_BOUND is not constant.  But I'm willing to let it go given the
only target that could potentially be affected would be SVE and I'd
expect that if you'd gotten it wrong it would have showed up in your
testing.

I'm also slightly worried about how we handle ARGS_GROW_DOWNWARD targets
(pa, stormy16).  I couldn't convince myself those changes were correct
either.  Again, I'm willing to fall back on your extensive
experience and testing here.

Of all the patches I've looked at to date, this one worries me the most
(which is why it's next to last according to my records).  The potential
for a goof in the argument setup, padding, or stack save area, and the
consequences of such a goof, worry me.

I'm going to conditionally ack this -- my condition is that you
re-review the calls.c/function.c changes as well.

Jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-12-06 20:11       ` Jeff Law
@ 2017-12-07 14:46         ` Richard Biener
  2017-12-07 15:08           ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Biener @ 2017-12-07 14:46 UTC (permalink / raw)
  To: Jeff Law; +Cc: GCC Patches, Richard Sandiford

On Wed, Dec 6, 2017 at 9:11 PM, Jeff Law <law@redhat.com> wrote:
> On 11/13/2017 05:04 PM, Richard Sandiford wrote:
>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>>> This patch adds a new "poly_int" class to represent polynomial integers
>>>> of the form:
>>>>
>>>>   C0 + C1*X1 + C2*X2 ... + Cn*Xn
>>>>
>>>> It also adds poly_int-based typedefs for offsets and sizes of various
>>>> precisions.  In these typedefs, the Ci coefficients are compile-time
>>>> constants and the Xi indeterminates are run-time invariants.  The number
>>>> of coefficients is controlled by the target and is initially 1 for all
>>>> ports.
>>>>
>>>> Most routines can handle general coefficient counts, but for now a few
>>>> are specific to one or two coefficients.  Support for other coefficient
>>>> counts can be added when needed.
>>>>
>>>> The patch also adds a new macro, IN_TARGET_CODE, that can be
>>>> set to indicate that a TU contains target-specific rather than
>>>> target-independent code.  When this macro is set and the number of
>>>> coefficients is 1, the poly-int.h classes define a conversion operator
>>>> to a constant.  This allows most existing target code to work without
>>>> modification.  The main exceptions are:
>>>>
>>>> - values passed through ..., which need an explicit conversion to a
>>>>   constant
>>>>
>>>> - ?: expression in which one arm ends up being a polynomial and the
>>>>   other remains a constant.  In these cases it would be valid to convert
>>>>   the constant to a polynomial and the polynomial to a constant, so a
>>>>   cast is needed to break the ambiguity.
>>>>
>>>> The patch also adds a new target hook to return the estimated
>>>> value of a polynomial for costing purposes.
>>>>
>>>> The patch also adds operator<< on wide_ints (it was already defined
>>>> for offset_int and widest_int).  I think this was originally excluded
>>>> because >> is ambiguous for wide_int, but << is useful for converting
>>>> bytes to bits, etc., so is worth defining on its own.  The patch also
>>>> adds operator% and operator/ for offset_int and widest_int, since those
>>>> types are always signed.  These changes allow the poly_int interface to
>>>> be more predictable.
>>>>
>>>> I'd originally tried adding the tests as selftests, but that ended up
>>>> bloating cc1 by at least a third.  It also took a while to build them
>>>> at -O2.  The patch therefore uses plugin tests instead, where we can
>>>> force the tests to be built at -O0.  They still run in negligible time
>>>> when built that way.
>>>
>>> Changes in v2:
>>>
>>> - Drop the controversial known_zero etc. wrapper functions.
>>> - Fix the operator<<= bug that Martin found.
>>> - Switch from "t" to "type" in SFINAE classes (requested by Martin).
>>>
>>> Not changed in v2:
>>>
>>> - Default constructors are still empty.  I agree it makes sense to use
>>>   "= default" when we switch to C++11, but it would be dangerous for
>>>   that to make "poly_int64 x;" less defined than it is now.
>>
>> After talking about this a bit more internally, it was obvious that
>> the choice of "must" and "may" for the predicate names was a common
>> sticking point.  The idea was to match the names of alias predicates,
>> but given my track record with names ("too_empty_p" being a recently
>> questioned example :-)), I'd be happy to rename them to something else.
>> Some alternatives we came up with were:
> I didn't find the must vs may naming problematical as I was going
> through the changes.  What I did find much more difficult was
> determining if the behavior was correct when we used a "may" predicate.
> It really relies a good deal on knowing the surrounding code.
>
> In places where I knew the code reasonably well I could tell without much
> surrounding context.  In other places I had to look at the code and
> deduce proper behavior in the "may" cases -- and often I resorted to
> spot checking and relying on your reputation & testing to DTRT.
>
>
>>
>> - known_eq / maybe_eq / known_lt / maybe_lt etc.
>>
>>   Some functions already use "known" and "maybe", so this would arguably
>>   be more consistent than using "must" and "may".
>>
>> - always_eq / sometimes_eq / always_lt / sometimes_lt
>>
>>   Similar to the previous one in intent.  It's just a question of which
>>   wording is clearer.
>>
>> - forall_eq / exists_eq / forall_lt / exists_lt etc.
>>
>>   Matches the usual logic quantifiers.  This seems quite appealing,
>>   as long as it's obvious that in:
>>
>>     forall_eq (v0, v1)
>>
>>   v0 and v1 themselves are already bound: if vi == ai + bi*X then
>>   what we really saying is:
>>
>>     forall X, a0 + b0*X == a1 + b1*X
>>
>> Which of those sounds best?  Any other suggestions?
> I can live with any of them.  I tend to prefer one of the first two, but
> it's not a major concern for me.  So if you or others have a clear
> preference, go with it.

Whatever you do, use a consistent naming, which I guess means
using known_eq / maybe_eq?

Otherwise ok.

Richard.

>
> jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-12-07 14:46         ` Richard Biener
@ 2017-12-07 15:08           ` Jeff Law
  2017-12-07 22:39             ` Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-12-07 15:08 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford

On 12/07/2017 07:46 AM, Richard Biener wrote:
> On Wed, Dec 6, 2017 at 9:11 PM, Jeff Law <law@redhat.com> wrote:
>> On 11/13/2017 05:04 PM, Richard Sandiford wrote:
>>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>>>> This patch adds a new "poly_int" class to represent polynomial integers
>>>>> of the form:
>>>>>
>>>>>   C0 + C1*X1 + C2*X2 ... + Cn*Xn
>>>>>
>>>>> It also adds poly_int-based typedefs for offsets and sizes of various
>>>>> precisions.  In these typedefs, the Ci coefficients are compile-time
>>>>> constants and the Xi indeterminates are run-time invariants.  The number
>>>>> of coefficients is controlled by the target and is initially 1 for all
>>>>> ports.
>>>>>
>>>>> Most routines can handle general coefficient counts, but for now a few
>>>>> are specific to one or two coefficients.  Support for other coefficient
>>>>> counts can be added when needed.
>>>>>
>>>>> The patch also adds a new macro, IN_TARGET_CODE, that can be
>>>>> set to indicate that a TU contains target-specific rather than
>>>>> target-independent code.  When this macro is set and the number of
>>>>> coefficients is 1, the poly-int.h classes define a conversion operator
>>>>> to a constant.  This allows most existing target code to work without
>>>>> modification.  The main exceptions are:
>>>>>
>>>>> - values passed through ..., which need an explicit conversion to a
>>>>>   constant
>>>>>
>>>>> - ?: expression in which one arm ends up being a polynomial and the
>>>>>   other remains a constant.  In these cases it would be valid to convert
>>>>>   the constant to a polynomial and the polynomial to a constant, so a
>>>>>   cast is needed to break the ambiguity.
>>>>>
>>>>> The patch also adds a new target hook to return the estimated
>>>>> value of a polynomial for costing purposes.
>>>>>
>>>>> The patch also adds operator<< on wide_ints (it was already defined
>>>>> for offset_int and widest_int).  I think this was originally excluded
>>>>> because >> is ambiguous for wide_int, but << is useful for converting
>>>>> bytes to bits, etc., so is worth defining on its own.  The patch also
>>>>> adds operator% and operator/ for offset_int and widest_int, since those
>>>>> types are always signed.  These changes allow the poly_int interface to
>>>>> be more predictable.
>>>>>
>>>>> I'd originally tried adding the tests as selftests, but that ended up
>>>>> bloating cc1 by at least a third.  It also took a while to build them
>>>>> at -O2.  The patch therefore uses plugin tests instead, where we can
>>>>> force the tests to be built at -O0.  They still run in negligible time
>>>>> when built that way.
>>>>
>>>> Changes in v2:
>>>>
>>>> - Drop the controversial known_zero etc. wrapper functions.
>>>> - Fix the operator<<= bug that Martin found.
>>>> - Switch from "t" to "type" in SFINAE classes (requested by Martin).
>>>>
>>>> Not changed in v2:
>>>>
>>>> - Default constructors are still empty.  I agree it makes sense to use
>>>>   "= default" when we switch to C++11, but it would be dangerous for
>>>>   that to make "poly_int64 x;" less defined than it is now.
>>>
>>> After talking about this a bit more internally, it was obvious that
>>> the choice of "must" and "may" for the predicate names was a common
>>> sticking point.  The idea was to match the names of alias predicates,
>>> but given my track record with names ("too_empty_p" being a recently
>>> questioned example :-)), I'd be happy to rename them to something else.
>>> Some alternatives we came up with were:
>> I didn't find the must vs may naming problematical as I was going
>> through the changes.  What I did find much more difficult was
>> determining if the behavior was correct when we used a "may" predicate.
>> It really relies a good deal on knowing the surrounding code.
>>
>> In places where I knew the code reasonably well I could tell without much
>> surrounding context.  In other places I had to look at the code and
>> deduce proper behavior in the "may" cases -- and often I resorted to
>> spot checking and relying on your reputation & testing to DTRT.
>>
>>
>>>
>>> - known_eq / maybe_eq / known_lt / maybe_lt etc.
>>>
>>>   Some functions already use "known" and "maybe", so this would arguably
>>>   be more consistent than using "must" and "may".
>>>
>>> - always_eq / sometimes_eq / always_lt / sometimes_lt
>>>
>>>   Similar to the previous one in intent.  It's just a question of which
>>>   wording is clearer.
>>>
>>> - forall_eq / exists_eq / forall_lt / exists_lt etc.
>>>
>>>   Matches the usual logic quantifiers.  This seems quite appealing,
>>>   as long as it's obvious that in:
>>>
>>>     forall_eq (v0, v1)
>>>
>>>   v0 and v1 themselves are already bound: if vi == ai + bi*X then
>>>   what we really saying is:
>>>
>>>     forall X, a0 + b0*X == a1 + b1*X
>>>
>>> Which of those sounds best?  Any other suggestions?
>> I can live with any of them.  I tend to prefer one of the first two, but
>> it's not a major concern for me.  So if you or others have a clear
>> preference, go with it.
> 
> Whatever you do, use a consistent naming, which I guess means
> using known_eq / maybe_eq?
> 
> Otherwise ok.
So I think that's the final ack on this series.  Richard S. can you
confirm?  I fully expect the trunk has moved some and the patches will
need adjustments -- consider adjustments which work in a manner similar
to the patches to date pre-approved.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-12-07 15:08           ` Jeff Law
@ 2017-12-07 22:39             ` Richard Sandiford
  2017-12-07 22:48               ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-12-07 22:39 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Biener, GCC Patches

Jeff Law <law@redhat.com> writes:
> On 12/07/2017 07:46 AM, Richard Biener wrote:
>> On Wed, Dec 6, 2017 at 9:11 PM, Jeff Law <law@redhat.com> wrote:
>>> On 11/13/2017 05:04 PM, Richard Sandiford wrote:
>>>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>>>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>>>>>> This patch adds a new "poly_int" class to represent polynomial integers
>>>>>> of the form:
>>>>>>
>>>>>>   C0 + C1*X1 + C2*X2 ... + Cn*Xn
>>>>>>
>>>>>> It also adds poly_int-based typedefs for offsets and sizes of various
>>>>>> precisions.  In these typedefs, the Ci coefficients are compile-time
>>>>>> constants and the Xi indeterminates are run-time invariants.  The number
>>>>>> of coefficients is controlled by the target and is initially 1 for all
>>>>>> ports.
>>>>>>
>>>>>> Most routines can handle general coefficient counts, but for now a few
>>>>>> are specific to one or two coefficients.  Support for other coefficient
>>>>>> counts can be added when needed.
>>>>>>
>>>>>> The patch also adds a new macro, IN_TARGET_CODE, that can be
>>>>>> set to indicate that a TU contains target-specific rather than
>>>>>> target-independent code.  When this macro is set and the number of
>>>>>> coefficients is 1, the poly-int.h classes define a conversion operator
>>>>>> to a constant.  This allows most existing target code to work without
>>>>>> modification.  The main exceptions are:
>>>>>>
>>>>>> - values passed through ..., which need an explicit conversion to a
>>>>>>   constant
>>>>>>
>>>>>> - ?: expression in which one arm ends up being a polynomial and the
>>>>>>   other remains a constant.  In these cases it would be valid to convert
>>>>>>   the constant to a polynomial and the polynomial to a constant, so a
>>>>>>   cast is needed to break the ambiguity.
>>>>>>
>>>>>> The patch also adds a new target hook to return the estimated
>>>>>> value of a polynomial for costing purposes.
>>>>>>
>>>>>> The patch also adds operator<< on wide_ints (it was already defined
>>>>>> for offset_int and widest_int).  I think this was originally excluded
>>>>>> because >> is ambiguous for wide_int, but << is useful for converting
>>>>>> bytes to bits, etc., so is worth defining on its own.  The patch also
>>>>>> adds operator% and operator/ for offset_int and widest_int, since those
>>>>>> types are always signed.  These changes allow the poly_int interface to
>>>>>> be more predictable.
>>>>>>
>>>>>> I'd originally tried adding the tests as selftests, but that ended up
>>>>>> bloating cc1 by at least a third.  It also took a while to build them
>>>>>> at -O2.  The patch therefore uses plugin tests instead, where we can
>>>>>> force the tests to be built at -O0.  They still run in negligible time
>>>>>> when built that way.
>>>>>
>>>>> Changes in v2:
>>>>>
>>>>> - Drop the controversial known_zero etc. wrapper functions.
>>>>> - Fix the operator<<= bug that Martin found.
>>>>> - Switch from "t" to "type" in SFINAE classes (requested by Martin).
>>>>>
>>>>> Not changed in v2:
>>>>>
>>>>> - Default constructors are still empty.  I agree it makes sense to use
>>>>>   "= default" when we switch to C++11, but it would be dangerous for
>>>>>   that to make "poly_int64 x;" less defined than it is now.
>>>>
>>>> After talking about this a bit more internally, it was obvious that
>>>> the choice of "must" and "may" for the predicate names was a common
>>>> sticking point.  The idea was to match the names of alias predicates,
>>>> but given my track record with names ("too_empty_p" being a recently
>>>> questioned example :-)), I'd be happy to rename them to something else.
>>>> Some alternatives we came up with were:
>>> I didn't find the must vs may naming problematical as I was going
>>> through the changes.  What I did find much more difficult was
>>> determining if the behavior was correct when we used a "may" predicate.
>>> It really relies a good deal on knowing the surrounding code.
>>>
>>> In places where I knew the code reasonably well I could tell without much
>>> surrounding context.  In other places I had to look at the code and
>>> deduce proper behavior in the "may" cases -- and often I resorted to
>>> spot checking and relying on your reputation & testing to DTRT.
>>>
>>>
>>>>
>>>> - known_eq / maybe_eq / known_lt / maybe_lt etc.
>>>>
>>>>   Some functions already use "known" and "maybe", so this would arguably
>>>>   be more consistent than using "must" and "may".
>>>>
>>>> - always_eq / sometimes_eq / always_lt / sometimes_lt
>>>>
>>>>   Similar to the previous one in intent.  It's just a question of which
>>>>   wording is clearer.
>>>>
>>>> - forall_eq / exists_eq / forall_lt / exists_lt etc.
>>>>
>>>>   Matches the usual logic quantifiers.  This seems quite appealing,
>>>>   as long as it's obvious that in:
>>>>
>>>>     forall_eq (v0, v1)
>>>>
>>>>   v0 and v1 themselves are already bound: if vi == ai + bi*X then
>>>>   what we're really saying is:
>>>>
>>>>     forall X, a0 + b0*X == a1 + b1*X
>>>>
>>>> Which of those sounds best?  Any other suggestions?
>>> I can live with any of them.  I tend to prefer one of the first two, but
>>> it's not a major concern for me.  So if you or others have a clear
>>> preference, go with it.
>> 
>> Whatever you do, use consistent naming, which I guess means
>> using known_eq / maybe_eq?
>> 
>> Otherwise ok.
> So I think that's the final ack on this series.

Thanks to both of you, really appreciate it!

> Richard S. can you confirm?  I fully expect the trunk has moved some
> and the patches will need adjustments -- consider adjustments which
> work in a manner similar to the patches to date pre-approved.

Yeah, that's now all of the poly_int patches.  I still owe you replies
to some of them -- I'll get to that as soon as I can.

I'll make the name changes and propagate through the series and then
commit this first patch.  I was thinking that for the rest it would
make sense to commit them individually, with individual testing of
each patch, so that it's easier to bisect.  I'll try to make sure
I don't repeat the merge mistake in the machine-mode series.
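
To make the renaming concrete, here's a rough usage sketch (values made up,
and assuming a target with two coefficients so that the two-argument
poly_int64 constructor used in the selftests applies):

  poly_int64 a (4, 4);    /* 4 + 4*X, for a nonnegative runtime X */
  poly_int64 b (16, 0);   /* plain compile-time constant 16 */

  known_lt (a, b);   /* false: X == 3 gives 16, which is not < 16 */
  maybe_lt (a, b);   /* true: holds for X <= 2 */
  known_eq (a, a);   /* true: holds for every X */
  maybe_eq (a, b);   /* true: holds when X == 3 */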

I think it'd also make sense to divide the commits up into groups rather
than do them all at once, since it's easier to do the individual testing
that way.  Does that sound OK?

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-12-07 22:39             ` Richard Sandiford
@ 2017-12-07 22:48               ` Jeff Law
  2017-12-15  3:40                 ` Martin Sebor
  0 siblings, 1 reply; 302+ messages in thread
From: Jeff Law @ 2017-12-07 22:48 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, richard.sandiford

On 12/07/2017 03:38 PM, Richard Sandiford wrote:

>> So I think that's the final ack on this series.
> 
> Thanks to both of you, really appreciate it!
Sorry it took so long.

> 
>> Richard S. can you confirm?  I fully expect the trunk has moved some
>> and the patches will need adjustments -- consider adjustments which
>> work in a manner similar to the patches to date pre-approved.
> 
> Yeah, that's now all of the poly_int patches.  I still owe you replies
> to some of them -- I'll get to that as soon as I can.
NP.  I don't think any of the questions were all that significant.
Those which were, I think you already responded to.

> 
> I'll make the name changes and propagate through the series and then
> commit this first patch.  I was thinking that for the rest it would
> make sense to commit them individually, with individual testing of
> each patch, so that it's easier to bisect.  I'll try to make sure
> I don't repeat the merge mistake in the machine-mode series.
> 
> I think it'd also make sense to divide the commits up into groups rather
> than do them all at once, since it's easier to do the individual testing
> that way.  Does that sound OK?
Your call on the best way to stage in.

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [002/nnn] poly_int: IN_TARGET_CODE
  2017-11-17  3:35   ` Jeff Law
@ 2017-12-15  1:08     ` Richard Sandiford
  2017-12-15 15:22       ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-12-15  1:08 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 10:58 AM, Richard Sandiford wrote:
>> This patch makes each target-specific TU define an IN_TARGET_CODE macro,
>> which is used to decide whether poly_int<1, C> should convert to C.
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* genattrtab.c (write_header): Define IN_TARGET_CODE to 1 in the
>> 	target C file.
>> 	* genautomata.c (main): Likewise.
>> 	* genconditions.c (write_header): Likewise.
>> 	* genemit.c (main): Likewise.
>> 	* genextract.c (print_header): Likewise.
>> 	* genopinit.c (main): Likewise.
>> 	* genoutput.c (output_prologue): Likewise.
>> 	* genpeep.c (main): Likewise.
>> 	* genpreds.c (write_insn_preds_c): Likewise.
>> 	* genrecog.c (writer_header): Likewise.
>> 	* config/aarch64/aarch64-builtins.c (IN_TARGET_CODE): Define.
>> 	* config/aarch64/aarch64-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/aarch64/aarch64.c (IN_TARGET_CODE): Likewise.
>> 	* config/aarch64/cortex-a57-fma-steering.c (IN_TARGET_CODE): Likewise.
>> 	* config/aarch64/driver-aarch64.c (IN_TARGET_CODE): Likewise.
>> 	* config/alpha/alpha.c (IN_TARGET_CODE): Likewise.
>> 	* config/alpha/driver-alpha.c (IN_TARGET_CODE): Likewise.
>> 	* config/arc/arc-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/arc/arc.c (IN_TARGET_CODE): Likewise.
>> 	* config/arc/driver-arc.c (IN_TARGET_CODE): Likewise.
>> 	* config/arm/aarch-common.c (IN_TARGET_CODE): Likewise.
>> 	* config/arm/arm-builtins.c (IN_TARGET_CODE): Likewise.
>> 	* config/arm/arm-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/arm/arm.c (IN_TARGET_CODE): Likewise.
>> 	* config/arm/driver-arm.c (IN_TARGET_CODE): Likewise.
>> 	* config/avr/avr-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/avr/avr-devices.c (IN_TARGET_CODE): Likewise.
>> 	* config/avr/avr-log.c (IN_TARGET_CODE): Likewise.
>> 	* config/avr/avr.c (IN_TARGET_CODE): Likewise.
>> 	* config/avr/driver-avr.c (IN_TARGET_CODE): Likewise.
>> 	* config/avr/gen-avr-mmcu-specs.c (IN_TARGET_CODE): Likewise.
>> 	* config/bfin/bfin.c (IN_TARGET_CODE): Likewise.
>> 	* config/c6x/c6x.c (IN_TARGET_CODE): Likewise.
>> 	* config/cr16/cr16.c (IN_TARGET_CODE): Likewise.
>> 	* config/cris/cris.c (IN_TARGET_CODE): Likewise.
>> 	* config/darwin.c (IN_TARGET_CODE): Likewise.
>> 	* config/epiphany/epiphany.c (IN_TARGET_CODE): Likewise.
>> 	* config/epiphany/mode-switch-use.c (IN_TARGET_CODE): Likewise.
>> 	* config/epiphany/resolve-sw-modes.c (IN_TARGET_CODE): Likewise.
>> 	* config/fr30/fr30.c (IN_TARGET_CODE): Likewise.
>> 	* config/frv/frv.c (IN_TARGET_CODE): Likewise.
>> 	* config/ft32/ft32.c (IN_TARGET_CODE): Likewise.
>> 	* config/h8300/h8300.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/djgpp.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/driver-i386.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/driver-mingw32.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/host-cygwin.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/host-i386-darwin.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/host-mingw32.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/i386-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/i386.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/intelmic-mkoffload.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/msformat-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/winnt-cxx.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/winnt-stubs.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/winnt.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/x86-tune-sched-atom.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/x86-tune-sched-bd.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/x86-tune-sched-core.c (IN_TARGET_CODE): Likewise.
>> 	* config/i386/x86-tune-sched.c (IN_TARGET_CODE): Likewise.
>> 	* config/ia64/ia64-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/ia64/ia64.c (IN_TARGET_CODE): Likewise.
>> 	* config/iq2000/iq2000.c (IN_TARGET_CODE): Likewise.
>> 	* config/lm32/lm32.c (IN_TARGET_CODE): Likewise.
>> 	* config/m32c/m32c-pragma.c (IN_TARGET_CODE): Likewise.
>> 	* config/m32c/m32c.c (IN_TARGET_CODE): Likewise.
>> 	* config/m32r/m32r.c (IN_TARGET_CODE): Likewise.
>> 	* config/m68k/m68k.c (IN_TARGET_CODE): Likewise.
>> 	* config/mcore/mcore.c (IN_TARGET_CODE): Likewise.
>> 	* config/microblaze/microblaze-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/microblaze/microblaze.c (IN_TARGET_CODE): Likewise.
>> 	* config/mips/driver-native.c (IN_TARGET_CODE): Likewise.
>> 	* config/mips/frame-header-opt.c (IN_TARGET_CODE): Likewise.
>> 	* config/mips/mips.c (IN_TARGET_CODE): Likewise.
>> 	* config/mmix/mmix.c (IN_TARGET_CODE): Likewise.
>> 	* config/mn10300/mn10300.c (IN_TARGET_CODE): Likewise.
>> 	* config/moxie/moxie.c (IN_TARGET_CODE): Likewise.
>> 	* config/msp430/driver-msp430.c (IN_TARGET_CODE): Likewise.
>> 	* config/msp430/msp430-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/msp430/msp430.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32-cost.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32-fp-as-gp.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32-intrinsic.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32-isr.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32-md-auxiliary.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32-memory-manipulation.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32-pipelines-auxiliary.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32-predicates.c (IN_TARGET_CODE): Likewise.
>> 	* config/nds32/nds32.c (IN_TARGET_CODE): Likewise.
>> 	* config/nios2/nios2.c (IN_TARGET_CODE): Likewise.
>> 	* config/nvptx/mkoffload.c (IN_TARGET_CODE): Likewise.
>> 	* config/nvptx/nvptx.c (IN_TARGET_CODE): Likewise.
>> 	* config/pa/pa.c (IN_TARGET_CODE): Likewise.
>> 	* config/pdp11/pdp11.c (IN_TARGET_CODE): Likewise.
>> 	* config/powerpcspe/driver-powerpcspe.c (IN_TARGET_CODE): Likewise.
>> 	* config/powerpcspe/host-darwin.c (IN_TARGET_CODE): Likewise.
>> 	* config/powerpcspe/host-ppc64-darwin.c (IN_TARGET_CODE): Likewise.
>> 	* config/powerpcspe/powerpcspe-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/powerpcspe/powerpcspe-linux.c (IN_TARGET_CODE): Likewise.
>> 	* config/powerpcspe/powerpcspe.c (IN_TARGET_CODE): Likewise.
>> 	* config/riscv/riscv-builtins.c (IN_TARGET_CODE): Likewise.
>> 	* config/riscv/riscv-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/riscv/riscv.c (IN_TARGET_CODE): Likewise.
>> 	* config/rl78/rl78-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/rl78/rl78.c (IN_TARGET_CODE): Likewise.
>> 	* config/rs6000/driver-rs6000.c (IN_TARGET_CODE): Likewise.
>> 	* config/rs6000/host-darwin.c (IN_TARGET_CODE): Likewise.
>> 	* config/rs6000/host-ppc64-darwin.c (IN_TARGET_CODE): Likewise.
>> 	* config/rs6000/rs6000-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/rs6000/rs6000-linux.c (IN_TARGET_CODE): Likewise.
>> 	* config/rs6000/rs6000-p8swap.c (IN_TARGET_CODE): Likewise.
>> 	* config/rs6000/rs6000-string.c (IN_TARGET_CODE): Likewise.
>> 	* config/rs6000/rs6000.c (IN_TARGET_CODE): Likewise.
>> 	* config/rx/rx.c (IN_TARGET_CODE): Likewise.
>> 	* config/s390/driver-native.c (IN_TARGET_CODE): Likewise.
>> 	* config/s390/s390-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/s390/s390.c (IN_TARGET_CODE): Likewise.
>> 	* config/sh/sh-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/sh/sh-mem.cc (IN_TARGET_CODE): Likewise.
>> 	* config/sh/sh.c (IN_TARGET_CODE): Likewise.
>> 	* config/sh/sh_optimize_sett_clrt.cc (IN_TARGET_CODE): Likewise.
>> 	* config/sh/sh_treg_combine.cc (IN_TARGET_CODE): Likewise.
>> 	* config/sparc/driver-sparc.c (IN_TARGET_CODE): Likewise.
>> 	* config/sparc/sparc-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/sparc/sparc.c (IN_TARGET_CODE): Likewise.
>> 	* config/spu/spu-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/spu/spu.c (IN_TARGET_CODE): Likewise.
>> 	* config/stormy16/stormy16.c (IN_TARGET_CODE): Likewise.
>> 	* config/tilegx/mul-tables.c (IN_TARGET_CODE): Likewise.
>> 	* config/tilegx/tilegx-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/tilegx/tilegx.c (IN_TARGET_CODE): Likewise.
>> 	* config/tilepro/mul-tables.c (IN_TARGET_CODE): Likewise.
>> 	* config/tilepro/tilepro-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/tilepro/tilepro.c (IN_TARGET_CODE): Likewise.
>> 	* config/v850/v850-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/v850/v850.c (IN_TARGET_CODE): Likewise.
>> 	* config/vax/vax.c (IN_TARGET_CODE): Likewise.
>> 	* config/visium/visium.c (IN_TARGET_CODE): Likewise.
>> 	* config/vms/vms-c.c (IN_TARGET_CODE): Likewise.
>> 	* config/vms/vms-f.c (IN_TARGET_CODE): Likewise.
>> 	* config/vms/vms.c (IN_TARGET_CODE): Likewise.
>> 	* config/xtensa/xtensa.c (IN_TARGET_CODE): Likewise.
> ISTM this needs documenting somewhere.
>
> OK with a suitable doc patch.

OK.  I couldn't find anywhere that was an obvious fit, so in the end
I went for the "Anatomy of a target back end" section.  (The effect of
IN_TARGET_CODE on poly-int is already documented in poly-int.texi.)
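
In case it helps to see what the macro gates, here's a small self-contained
analogue of the conditional conversion (just a sketch, not the real
poly-int.h): with one coefficient and "target code" set, the wrapper
converts implicitly to its scalar type, otherwise the coefficient has to be
accessed explicitly.

  #include <type_traits>
  #include <cstdio>

  /* Stand-in for poly_int: converts implicitly to C only in the
     single-coefficient, target-code case.  */
  template<unsigned int N, typename C, bool in_target_code>
  struct poly_like
  {
    C coeffs[N];

    template<bool ok = (N == 1 && in_target_code),
             typename std::enable_if<ok, int>::type = 0>
    operator C () const { return coeffs[0]; }
  };

  int main ()
  {
    poly_like<1, long, true> target_value = { { 16 } };
    long bytes = target_value;            /* implicit conversion allowed */
    std::printf ("%ld\n", bytes);

    poly_like<1, long, false> generic_value = { { 16 } };
    long other = generic_value.coeffs[0]; /* generic code must be explicit */
    std::printf ("%ld\n", other);
    return 0;
  }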

Does this look OK?

Thanks,
Richard


2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/sourcebuild.texi: Document IN_TARGET_CODE.
	[...]

Index: gcc/doc/sourcebuild.texi
===================================================================
--- gcc/doc/sourcebuild.texi	2017-12-05 14:24:53.348999076 +0000
+++ gcc/doc/sourcebuild.texi	2017-12-15 01:04:55.748638203 +0000
@@ -822,6 +822,17 @@ manual needs to be installed as info for
 chapter of this manual.
 @end itemize
 
+GCC uses the macro @code{IN_TARGET_CODE} to distinguish between
+machine-specific @file{.c} and @file{.cc} files and
+machine-independent @file{.c} and @file{.cc} files.  Machine-specific
+files should use the directive:
+
+@example
+#define IN_TARGET_CODE 1
+@end example
+
+before including @code{config.h}.
+
 If the back end is added to the official GCC source repository, the
 following are also necessary:
 
[...]

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [005/nnn] poly_int: rtx constants
  2017-11-17  4:17   ` Jeff Law
@ 2017-12-15  1:25     ` Richard Sandiford
  2017-12-19  4:52       ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-12-15  1:25 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 11:00 AM, Richard Sandiford wrote:
>> This patch adds an rtl representation of poly_int values.
>> There were three possible ways of doing this:
>> 
>> (1) Add a new rtl code for the poly_ints themselves and store the
>>     coefficients as trailing wide_ints.  This would give constants like:
>> 
>>       (const_poly_int [c0 c1 ... cn])
>> 
>>     The runtime value would be:
>> 
>>       c0 + c1 * x1 + ... + cn * xn
>> 
>> (2) Like (1), but use rtxes for the coefficients.  This would give
>>     constants like:
>> 
>>       (const_poly_int [(const_int c0)
>>                        (const_int c1)
>>                        ...
>>                        (const_int cn)])
>> 
>>     although the coefficients could be const_wide_ints instead
>>     of const_ints where appropriate.
>> 
>> (3) Add a new rtl code for the polynomial indeterminates,
>>     then use them in const wrappers.  A constant like c0 + c1 * x1
>>     would then look like:
>> 
>>       (const:M (plus:M (mult:M (const_param:M x1)
>>                                (const_int c1))
>>                        (const_int c0)))
>> 
>> There didn't seem to be that much to choose between them.  The main
>> advantage of (1) is that it's a more efficient representation and
>> that we can refer to the coefficients directly as wide_int_storage.
> Well, and #1 feels more like how we handle CONST_INT :-)
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* doc/rtl.texi (const_poly_int): Document.
>> 	* gengenrtl.c (excluded_rtx): Return true for CONST_POLY_INT.
>> 	* rtl.h (const_poly_int_def): New struct.
>> 	(rtx_def::u): Add a cpi field.
>> 	(CASE_CONST_UNIQUE, CASE_CONST_ANY): Add CONST_POLY_INT.
>> 	(CONST_POLY_INT_P, CONST_POLY_INT_COEFFS): New macros.
>> 	(wi::rtx_to_poly_wide_ref): New typedef
>> 	(const_poly_int_value, wi::to_poly_wide, rtx_to_poly_int64)
>> 	(poly_int_rtx_p): New functions.
>> 	(trunc_int_for_mode): Declare a poly_int64 version.
>> 	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
>> 	(immed_wide_int_const): Take a poly_wide_int_ref rather than
>> 	a wide_int_ref.
>> 	(strip_offset): Declare.
>> 	(strip_offset_and_add): New function.
>> 	* rtl.def (CONST_POLY_INT): New rtx code.
>> 	* rtl.c (rtx_size): Handle CONST_POLY_INT.
>> 	(shared_const_p): Use poly_int_rtx_p.
>> 	* emit-rtl.h (gen_int_mode): Take a poly_int64 instead of a
>> 	HOST_WIDE_INT.
>> 	(gen_int_shift_amount): Likewise.
>> 	* emit-rtl.c (const_poly_int_hasher): New class.
>> 	(const_poly_int_htab): New variable.
>> 	(init_emit_once): Initialize it when NUM_POLY_INT_COEFFS > 1.
>> 	(const_poly_int_hasher::hash): New function.
>> 	(const_poly_int_hasher::equal): Likewise.
>> 	(gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT.
>> 	(immed_wide_int_const): Rename to...
>> 	(immed_wide_int_const_1): ...this and make static.
>> 	(immed_wide_int_const): New function, taking a poly_wide_int_ref
>> 	instead of a wide_int_ref.
>> 	(gen_int_shift_amount): Take a poly_int64 instead of a HOST_WIDE_INT.
>> 	(gen_lowpart_common): Handle CONST_POLY_INT.
>> 	* cse.c (hash_rtx_cb, equiv_constant): Likewise.
>> 	* cselib.c (cselib_hash_rtx): Likewise.
>> 	* dwarf2out.c (const_ok_for_output_1): Likewise.
>> 	* expr.c (convert_modes): Likewise.
>> 	* print-rtl.c (rtx_writer::print_rtx, print_value): Likewise.
>> 	* rtlhash.c (add_rtx): Likewise.
>> 	* explow.c (trunc_int_for_mode): Add a poly_int64 version.
>> 	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
>> 	Handle existing CONST_POLY_INT rtxes.
>> 	* expmed.h (expand_shift): Take a poly_int64 instead of a
>> 	HOST_WIDE_INT.
>> 	* expmed.c (expand_shift): Likewise.
>> 	* rtlanal.c (strip_offset): New function.
>> 	(commutative_operand_precedence): Give CONST_POLY_INT the same
>> 	precedence as CONST_DOUBLE and put CONST_WIDE_INT between that
>> 	and CONST_INT.
>> 	* rtl-tests.c (const_poly_int_tests): New struct.
>> 	(rtl_tests_c_tests): Use it.
>> 	* simplify-rtx.c (simplify_const_unary_operation): Handle
>> 	CONST_POLY_INT.
>> 	(simplify_const_binary_operation): Likewise.
>> 	(simplify_binary_operation_1): Fold additions of symbolic constants
>> 	and CONST_POLY_INTs.
>> 	(simplify_subreg): Handle extensions and truncations of
>> 	CONST_POLY_INTs.
>> 	(simplify_const_poly_int_tests): New struct.
>> 	(simplify_rtx_c_tests): Use it.
>> 	* wide-int.h (storage_ref): Add default constructor.
>> 	(wide_int_ref_storage): Likewise.
>> 	(trailing_wide_ints): Use GTY((user)).
>> 	(trailing_wide_ints::operator[]): Add a const version.
>> 	(trailing_wide_ints::get_precision): New function.
>> 	(trailing_wide_ints::extra_size): Likewise.
> Do we need to define anything WRT structure sharing in rtl.texi for a
> CONST_POLY_INT?

Good catch.  Fixed in the patch below.

>> Index: gcc/rtl.c
>> ===================================================================
>> --- gcc/rtl.c	2017-10-23 16:52:20.579835373 +0100
>> +++ gcc/rtl.c	2017-10-23 17:00:54.443002147 +0100
>> @@ -257,9 +261,10 @@ shared_const_p (const_rtx orig)
>>  
>>    /* CONST can be shared if it contains a SYMBOL_REF.  If it contains
>>       a LABEL_REF, it isn't sharable.  */
>> +  poly_int64 offset;
>>    return (GET_CODE (XEXP (orig, 0)) == PLUS
>>  	  && GET_CODE (XEXP (XEXP (orig, 0), 0)) == SYMBOL_REF
>> -	  && CONST_INT_P (XEXP (XEXP (orig, 0), 1)));
>> +	  && poly_int_rtx_p (XEXP (XEXP (orig, 0), 1), &offset));
> Did this just change structure sharing for CONST_WIDE_INT?

No, we'd only use CONST_WIDE_INT for things that don't fit in
poly_int64.

>> +  /* Create a new rtx.  There's a choice to be made here between installing
>> +     the actual mode of the rtx or leaving it as VOIDmode (for consistency
>> +     with CONST_INT).  In practice the handling of the codes is different
>> +     enough that we get no benefit from using VOIDmode, and various places
>> +     assume that VOIDmode implies CONST_INT.  Using the real mode seems like
>> +     the right long-term direction anyway.  */
> Certainly my preference is to get the mode in there.  I see modeless
> CONST_INTs as a long standing wart and I'm not keen to repeat it.

Yeah.  I still regularly hit problems related to modeless CONST_INTs
today (including in the gen_int_shift_amount patch).
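
For what it's worth, the difference also shows up directly in dumps.
With made-up coefficients, the two forms look something like:

  (const_int 16)                  ;; no mode stored on the constant
  (const_poly_int:DI [16, 16])    ;; the mode is part of the rtx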

>> Index: gcc/wide-int.h
>> ===================================================================
>> --- gcc/wide-int.h	2017-10-23 17:00:20.923835582 +0100
>> +++ gcc/wide-int.h	2017-10-23 17:00:54.445999420 +0100
>> @@ -613,6 +613,7 @@ #define SHIFT_FUNCTION \
>>       access.  */
>>    struct storage_ref
>>    {
>> +    storage_ref () {}
>>      storage_ref (const HOST_WIDE_INT *, unsigned int, unsigned int);
>>  
>>      const HOST_WIDE_INT *val;
>> @@ -944,6 +945,8 @@ struct wide_int_ref_storage : public wi:
>>    HOST_WIDE_INT scratch[2];
>>  
>>  public:
>> +  wide_int_ref_storage () {}
>> +
>>    wide_int_ref_storage (const wi::storage_ref &);
>>  
>>    template <typename T>
> So doesn't this play into the whole question about initialization of
>> these objects?  So I'll defer on this hunk until we settle that
> question, but the rest is OK.

Any more thoughts on this?  In the end the 001 patch went in with
the empty constructors.  Like I say, I'm happy to switch to C++11
"= default;" once we require C++11, but I think having well-defined
implicit construction would make switching to "= default" harder
in future.
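
To spell out the C++ background (a self-contained sketch with hypothetical
types, nothing to do with the real poly_int): a user-provided empty
constructor and "= default" behave the same for plain declarations, and the
most visible difference is in value-initialization contexts.

  struct with_empty_ctor { long coeff; with_empty_ctor () {} };
  struct with_defaulted  { long coeff; with_defaulted () = default; };

  int main ()
  {
    with_empty_ctor a = with_empty_ctor (); /* a.coeff stays indeterminate */
    with_defaulted  b = with_defaulted ();  /* b.coeff is zero-initialized */
    with_empty_ctor c;                      /* indeterminate...            */
    with_defaulted  d;                      /* ...and indeterminate here too */
    (void) a; (void) b; (void) c; (void) d;
    return 0;
  }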

Thanks,
Richard


2017-11-15  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/rtl.texi (const_poly_int): Document.  Also document the
	rtl sharing behavior.
	* gengenrtl.c (excluded_rtx): Return true for CONST_POLY_INT.
	* rtl.h (const_poly_int_def): New struct.
	(rtx_def::u): Add a cpi field.
	(CASE_CONST_UNIQUE, CASE_CONST_ANY): Add CONST_POLY_INT.
	(CONST_POLY_INT_P, CONST_POLY_INT_COEFFS): New macros.
	(wi::rtx_to_poly_wide_ref): New typedef
	(const_poly_int_value, wi::to_poly_wide, rtx_to_poly_int64)
	(poly_int_rtx_p): New functions.
	(trunc_int_for_mode): Declare a poly_int64 version.
	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
	(immed_wide_int_const): Take a poly_wide_int_ref rather than
	a wide_int_ref.
	(strip_offset): Declare.
	(strip_offset_and_add): New function.
	* rtl.def (CONST_POLY_INT): New rtx code.
	* rtl.c (rtx_size): Handle CONST_POLY_INT.
	(shared_const_p): Use poly_int_rtx_p.
	* emit-rtl.h (gen_int_mode): Take a poly_int64 instead of a
	HOST_WIDE_INT.
	(gen_int_shift_amount): Likewise.
	* emit-rtl.c (const_poly_int_hasher): New class.
	(const_poly_int_htab): New variable.
	(init_emit_once): Initialize it when NUM_POLY_INT_COEFFS > 1.
	(const_poly_int_hasher::hash): New function.
	(const_poly_int_hasher::equal): Likewise.
	(gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT.
	(immed_wide_int_const): Rename to...
	(immed_wide_int_const_1): ...this and make static.
	(immed_wide_int_const): New function, taking a poly_wide_int_ref
	instead of a wide_int_ref.
	(gen_int_shift_amount): Take a poly_int64 instead of a HOST_WIDE_INT.
	(gen_lowpart_common): Handle CONST_POLY_INT.
	* cse.c (hash_rtx_cb, equiv_constant): Likewise.
	* cselib.c (cselib_hash_rtx): Likewise.
	* dwarf2out.c (const_ok_for_output_1): Likewise.
	* expr.c (convert_modes): Likewise.
	* print-rtl.c (rtx_writer::print_rtx, print_value): Likewise.
	* rtlhash.c (add_rtx): Likewise.
	* explow.c (trunc_int_for_mode): Add a poly_int64 version.
	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
	Handle existing CONST_POLY_INT rtxes.
	* expmed.h (expand_shift): Take a poly_int64 instead of a
	HOST_WIDE_INT.
	* expmed.c (expand_shift): Likewise.
	* rtlanal.c (strip_offset): New function.
	(commutative_operand_precedence): Give CONST_POLY_INT the same
	precedence as CONST_DOUBLE and put CONST_WIDE_INT between that
	and CONST_INT.
	* rtl-tests.c (const_poly_int_tests): New struct.
	(rtl_tests_c_tests): Use it.
	* simplify-rtx.c (simplify_const_unary_operation): Handle
	CONST_POLY_INT.
	(simplify_const_binary_operation): Likewise.
	(simplify_binary_operation_1): Fold additions of symbolic constants
	and CONST_POLY_INTs.
	(simplify_subreg): Handle extensions and truncations of
	CONST_POLY_INTs.
	(simplify_const_poly_int_tests): New struct.
	(simplify_rtx_c_tests): Use it.
	* wide-int.h (storage_ref): Add default constructor.
	(wide_int_ref_storage): Likewise.
	(trailing_wide_ints): Use GTY((user)).
	(trailing_wide_ints::operator[]): Add a const version.
	(trailing_wide_ints::get_precision): New function.
	(trailing_wide_ints::extra_size): Likewise.

Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	2017-12-15 01:16:50.894351263 +0000
+++ gcc/doc/rtl.texi	2017-12-15 01:16:51.235339239 +0000
@@ -1633,6 +1633,15 @@ is accessed with the macro @code{CONST_F
 data is accessed with @code{CONST_FIXED_VALUE_HIGH}; the low part is
 accessed with @code{CONST_FIXED_VALUE_LOW}.
 
+@findex const_poly_int
+@item (const_poly_int:@var{m} [@var{c0} @var{c1} @dots{}])
+Represents a @code{poly_int}-style polynomial integer with coefficients
+@var{c0}, @var{c1}, @dots{}.  The coefficients are @code{wide_int}-based
+integers rather than rtxes.  @code{CONST_POLY_INT_COEFFS} gives the
+values of individual coefficients (which is mostly only useful in
+low-level routines) and @code{const_poly_int_value} gives the full
+@code{poly_int} value.
+
 @findex const_vector
 @item (const_vector:@var{m} [@var{x0} @var{x1} @dots{}])
 Represents a vector constant.  The square brackets stand for the vector
@@ -4236,6 +4245,11 @@ referring to it.
 @item
 All @code{const_int} expressions with equal values are shared.
 
+@cindex @code{const_poly_int}, RTL sharing
+@item
+All @code{const_poly_int} expressions with equal modes and values
+are shared.
+
 @cindex @code{pc}, RTL sharing
 @item
 There is only one @code{pc} expression.
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/gengenrtl.c	2017-12-15 01:16:51.240339063 +0000
@@ -157,6 +157,7 @@ excluded_rtx (int idx)
   return (strcmp (defs[idx].enumname, "VAR_LOCATION") == 0
 	  || strcmp (defs[idx].enumname, "CONST_DOUBLE") == 0
 	  || strcmp (defs[idx].enumname, "CONST_WIDE_INT") == 0
+	  || strcmp (defs[idx].enumname, "CONST_POLY_INT") == 0
 	  || strcmp (defs[idx].enumname, "CONST_FIXED") == 0);
 }
 
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-12-15 01:16:50.894351263 +0000
+++ gcc/rtl.h	2017-12-15 01:16:51.241339028 +0000
@@ -280,6 +280,10 @@ #define CWI_GET_NUM_ELEM(RTX)					\
 #define CWI_PUT_NUM_ELEM(RTX, NUM)					\
   (RTL_FLAG_CHECK1("CWI_PUT_NUM_ELEM", (RTX), CONST_WIDE_INT)->u2.num_elem = (NUM))
 
+struct GTY((variable_size)) const_poly_int_def {
+  trailing_wide_ints<NUM_POLY_INT_COEFFS> coeffs;
+};
+
 /* RTL expression ("rtx").  */
 
 /* The GTY "desc" and "tag" options below are a kludge: we need a desc
@@ -424,6 +428,7 @@ struct GTY((desc("0"), tag("0"),
     struct real_value rv;
     struct fixed_value fv;
     struct hwivec_def hwiv;
+    struct const_poly_int_def cpi;
   } GTY ((special ("rtx_def"), desc ("GET_CODE (&%0)"))) u;
 };
 
@@ -734,6 +739,7 @@ #define CASE_CONST_SCALAR_INT \
 #define CASE_CONST_UNIQUE \
    case CONST_INT: \
    case CONST_WIDE_INT: \
+   case CONST_POLY_INT: \
    case CONST_DOUBLE: \
    case CONST_FIXED
 
@@ -741,6 +747,7 @@ #define CASE_CONST_UNIQUE \
 #define CASE_CONST_ANY \
    case CONST_INT: \
    case CONST_WIDE_INT: \
+   case CONST_POLY_INT: \
    case CONST_DOUBLE: \
    case CONST_FIXED: \
    case CONST_VECTOR
@@ -773,6 +780,11 @@ #define CONST_INT_P(X) (GET_CODE (X) ==
 /* Predicate yielding nonzero iff X is an rtx for a constant integer.  */
 #define CONST_WIDE_INT_P(X) (GET_CODE (X) == CONST_WIDE_INT)
 
+/* Predicate yielding nonzero iff X is an rtx for a polynomial constant
+   integer.  */
+#define CONST_POLY_INT_P(X) \
+  (NUM_POLY_INT_COEFFS > 1 && GET_CODE (X) == CONST_POLY_INT)
+
 /* Predicate yielding nonzero iff X is an rtx for a constant fixed-point.  */
 #define CONST_FIXED_P(X) (GET_CODE (X) == CONST_FIXED)
 
@@ -1914,6 +1926,12 @@ #define CONST_WIDE_INT_VEC(RTX) HWIVEC_C
 #define CONST_WIDE_INT_NUNITS(RTX) CWI_GET_NUM_ELEM (RTX)
 #define CONST_WIDE_INT_ELT(RTX, N) CWI_ELT (RTX, N)
 
+/* For a CONST_POLY_INT, CONST_POLY_INT_COEFFS gives access to the
+   individual coefficients, in the form of a trailing_wide_ints structure.  */
+#define CONST_POLY_INT_COEFFS(RTX) \
+  (RTL_FLAG_CHECK1("CONST_POLY_INT_COEFFS", (RTX), \
+		   CONST_POLY_INT)->u.cpi.coeffs)
+
 /* For a CONST_DOUBLE:
 #if TARGET_SUPPORTS_WIDE_INT == 0
    For a VOIDmode, there are two integers CONST_DOUBLE_LOW is the
@@ -2227,6 +2245,84 @@ wi::max_value (machine_mode mode, signop
   return max_value (GET_MODE_PRECISION (as_a <scalar_mode> (mode)), sgn);
 }
 
+namespace wi
+{
+  typedef poly_int<NUM_POLY_INT_COEFFS,
+		   generic_wide_int <wide_int_ref_storage <false, false> > >
+    rtx_to_poly_wide_ref;
+  rtx_to_poly_wide_ref to_poly_wide (const_rtx, machine_mode);
+}
+
+/* Return the value of a CONST_POLY_INT in its native precision.  */
+
+inline wi::rtx_to_poly_wide_ref
+const_poly_int_value (const_rtx x)
+{
+  poly_int<NUM_POLY_INT_COEFFS, WIDE_INT_REF_FOR (wide_int)> res;
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    res.coeffs[i] = CONST_POLY_INT_COEFFS (x)[i];
+  return res;
+}
+
+/* Return true if X is a scalar integer or a CONST_POLY_INT.  The value
+   can then be extracted using wi::to_poly_wide.  */
+
+inline bool
+poly_int_rtx_p (const_rtx x)
+{
+  return CONST_SCALAR_INT_P (x) || CONST_POLY_INT_P (x);
+}
+
+/* Access X (which satisfies poly_int_rtx_p) as a poly_wide_int.
+   MODE is the mode of X.  */
+
+inline wi::rtx_to_poly_wide_ref
+wi::to_poly_wide (const_rtx x, machine_mode mode)
+{
+  if (CONST_POLY_INT_P (x))
+    return const_poly_int_value (x);
+  return rtx_mode_t (const_cast<rtx> (x), mode);
+}
+
+/* Return the value of X as a poly_int64.  */
+
+inline poly_int64
+rtx_to_poly_int64 (const_rtx x)
+{
+  if (CONST_POLY_INT_P (x))
+    {
+      poly_int64 res;
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	res.coeffs[i] = CONST_POLY_INT_COEFFS (x)[i].to_shwi ();
+      return res;
+    }
+  return INTVAL (x);
+}
+
+/* Return true if arbitrary value X is an integer constant that can
+   be represented as a poly_int64.  Store the value in *RES if so,
+   otherwise leave it unmodified.  */
+
+inline bool
+poly_int_rtx_p (const_rtx x, poly_int64_pod *res)
+{
+  if (CONST_INT_P (x))
+    {
+      *res = INTVAL (x);
+      return true;
+    }
+  if (CONST_POLY_INT_P (x))
+    {
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	if (!wi::fits_shwi_p (CONST_POLY_INT_COEFFS (x)[i]))
+	  return false;
+      for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	res->coeffs[i] = CONST_POLY_INT_COEFFS (x)[i].to_shwi ();
+      return true;
+    }
+  return false;
+}
+
 extern void init_rtlanal (void);
 extern int rtx_cost (rtx, machine_mode, enum rtx_code, int, bool);
 extern int address_cost (rtx, machine_mode, addr_space_t, bool);
@@ -2764,7 +2860,8 @@ #define EXTRACT_ARGS_IN_RANGE(SIZE, POS,
 
 /* In explow.c */
 extern HOST_WIDE_INT trunc_int_for_mode	(HOST_WIDE_INT, machine_mode);
-extern rtx plus_constant (machine_mode, rtx, HOST_WIDE_INT, bool = false);
+extern poly_int64 trunc_int_for_mode (poly_int64, machine_mode);
+extern rtx plus_constant (machine_mode, rtx, poly_int64, bool = false);
 extern HOST_WIDE_INT get_stack_check_protect (void);
 
 /* In rtl.c */
@@ -3075,13 +3172,11 @@ extern void end_sequence (void);
 extern double_int rtx_to_double_int (const_rtx);
 #endif
 extern void cwi_output_hex (FILE *, const_rtx);
-#ifndef GENERATOR_FILE
-extern rtx immed_wide_int_const (const wide_int_ref &, machine_mode);
-#endif
 #if TARGET_SUPPORTS_WIDE_INT == 0
 extern rtx immed_double_const (HOST_WIDE_INT, HOST_WIDE_INT,
 			       machine_mode);
 #endif
+extern rtx immed_wide_int_const (const poly_wide_int_ref &, machine_mode);
 
 /* In varasm.c  */
 extern rtx force_const_mem (machine_mode, rtx);
@@ -3269,6 +3364,7 @@ extern HOST_WIDE_INT get_integer_term (c
 extern rtx get_related_value (const_rtx);
 extern bool offset_within_block_p (const_rtx, HOST_WIDE_INT);
 extern void split_const (rtx, rtx *, rtx *);
+extern rtx strip_offset (rtx, poly_int64_pod *);
 extern bool unsigned_reg_p (rtx);
 extern int reg_mentioned_p (const_rtx, const_rtx);
 extern int count_occurrences (const_rtx, const_rtx, int);
@@ -4203,6 +4299,21 @@ load_extend_op (machine_mode mode)
   return UNKNOWN;
 }
 
+/* If X is a PLUS of a base and a constant offset, add the constant to *OFFSET
+   and return the base.  Return X otherwise.  */
+
+inline rtx
+strip_offset_and_add (rtx x, poly_int64_pod *offset)
+{
+  if (GET_CODE (x) == PLUS)
+    {
+      poly_int64 suboffset;
+      x = strip_offset (x, &suboffset);
+      *offset += suboffset;
+    }
+  return x;
+}
+
 /* gtype-desc.c.  */
 extern void gt_ggc_mx (rtx &);
 extern void gt_pch_nx (rtx &);
Index: gcc/rtl.def
===================================================================
--- gcc/rtl.def	2017-12-15 01:16:50.894351263 +0000
+++ gcc/rtl.def	2017-12-15 01:16:51.240339063 +0000
@@ -348,6 +348,9 @@ DEF_RTL_EXPR(CONST_INT, "const_int", "w"
 /* numeric integer constant */
 DEF_RTL_EXPR(CONST_WIDE_INT, "const_wide_int", "", RTX_CONST_OBJ)
 
+/* An rtx representation of a poly_wide_int.  */
+DEF_RTL_EXPR(CONST_POLY_INT, "const_poly_int", "", RTX_CONST_OBJ)
+
 /* fixed-point constant */
 DEF_RTL_EXPR(CONST_FIXED, "const_fixed", "www", RTX_CONST_OBJ)
 
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/rtl.c	2017-12-15 01:16:51.240339063 +0000
@@ -189,6 +189,10 @@ rtx_size (const_rtx x)
 	    + sizeof (struct hwivec_def)
 	    + ((CONST_WIDE_INT_NUNITS (x) - 1)
 	       * sizeof (HOST_WIDE_INT)));
+  if (CONST_POLY_INT_P (x))
+    return (RTX_HDR_SIZE
+	    + sizeof (struct const_poly_int_def)
+	    + CONST_POLY_INT_COEFFS (x).extra_size ());
   if (GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_HAS_BLOCK_INFO_P (x))
     return RTX_HDR_SIZE + sizeof (struct block_symbol);
   return RTX_CODE_SIZE (GET_CODE (x));
@@ -257,9 +261,10 @@ shared_const_p (const_rtx orig)
 
   /* CONST can be shared if it contains a SYMBOL_REF.  If it contains
      a LABEL_REF, it isn't sharable.  */
+  poly_int64 offset;
   return (GET_CODE (XEXP (orig, 0)) == PLUS
 	  && GET_CODE (XEXP (XEXP (orig, 0), 0)) == SYMBOL_REF
-	  && CONST_INT_P (XEXP (XEXP (orig, 0), 1)));
+	  && poly_int_rtx_p (XEXP (XEXP (orig, 0), 1), &offset));
 }
 
 
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-12-15 01:16:50.894351263 +0000
+++ gcc/emit-rtl.h	2017-12-15 01:16:51.238339134 +0000
@@ -362,14 +362,14 @@ extern rtvec gen_rtvec (int, ...);
 extern rtx copy_insn_1 (rtx);
 extern rtx copy_insn (rtx);
 extern rtx_insn *copy_delay_slot_insn (rtx_insn *);
-extern rtx gen_int_mode (HOST_WIDE_INT, machine_mode);
+extern rtx gen_int_mode (poly_int64, machine_mode);
 extern rtx_insn *emit_copy_of_insn_after (rtx_insn *, rtx_insn *);
 extern void set_reg_attrs_from_value (rtx, rtx);
 extern void set_reg_attrs_for_parm (rtx, rtx);
 extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
 extern void adjust_reg_mode (rtx, machine_mode);
 extern int mem_expr_equal_p (const_tree, const_tree);
-extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
+extern rtx gen_int_shift_amount (machine_mode, poly_int64);
 
 extern bool need_atomic_barrier_p (enum memmodel, bool);
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/emit-rtl.c	2017-12-15 01:16:51.238339134 +0000
@@ -148,6 +148,16 @@ struct const_wide_int_hasher : ggc_cache
 
 static GTY ((cache)) hash_table<const_wide_int_hasher> *const_wide_int_htab;
 
+struct const_poly_int_hasher : ggc_cache_ptr_hash<rtx_def>
+{
+  typedef std::pair<machine_mode, poly_wide_int_ref> compare_type;
+
+  static hashval_t hash (rtx x);
+  static bool equal (rtx x, const compare_type &y);
+};
+
+static GTY ((cache)) hash_table<const_poly_int_hasher> *const_poly_int_htab;
+
 /* A hash table storing register attribute structures.  */
 struct reg_attr_hasher : ggc_cache_ptr_hash<reg_attrs>
 {
@@ -257,6 +267,31 @@ const_wide_int_hasher::equal (rtx x, rtx
 }
 #endif
 
+/* Returns a hash code for CONST_POLY_INT X.  */
+
+hashval_t
+const_poly_int_hasher::hash (rtx x)
+{
+  inchash::hash h;
+  h.add_int (GET_MODE (x));
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    h.add_wide_int (CONST_POLY_INT_COEFFS (x)[i]);
+  return h.end ();
+}
+
+/* Returns nonzero if CONST_POLY_INT X is an rtx representation of Y.  */
+
+bool
+const_poly_int_hasher::equal (rtx x, const compare_type &y)
+{
+  if (GET_MODE (x) != y.first)
+    return false;
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    if (CONST_POLY_INT_COEFFS (x)[i] != y.second.coeffs[i])
+      return false;
+  return true;
+}
+
 /* Returns a hash code for X (which is really a CONST_DOUBLE).  */
 hashval_t
 const_double_hasher::hash (rtx x)
@@ -520,9 +555,13 @@ gen_rtx_CONST_INT (machine_mode mode ATT
 }
 
 rtx
-gen_int_mode (HOST_WIDE_INT c, machine_mode mode)
+gen_int_mode (poly_int64 c, machine_mode mode)
 {
-  return GEN_INT (trunc_int_for_mode (c, mode));
+  c = trunc_int_for_mode (c, mode);
+  if (c.is_constant ())
+    return GEN_INT (c.coeffs[0]);
+  unsigned int prec = GET_MODE_PRECISION (as_a <scalar_mode> (mode));
+  return immed_wide_int_const (poly_wide_int::from (c, prec, SIGNED), mode);
 }
 
 /* CONST_DOUBLEs might be created from pairs of integers, or from
@@ -626,8 +665,8 @@ lookup_const_wide_int (rtx wint)
    a CONST_DOUBLE (if !TARGET_SUPPORTS_WIDE_INT) or a CONST_WIDE_INT
    (if TARGET_SUPPORTS_WIDE_INT).  */
 
-rtx
-immed_wide_int_const (const wide_int_ref &v, machine_mode mode)
+static rtx
+immed_wide_int_const_1 (const wide_int_ref &v, machine_mode mode)
 {
   unsigned int len = v.get_len ();
   /* Not scalar_int_mode because we also allow pointer bound modes.  */
@@ -714,6 +753,53 @@ immed_double_const (HOST_WIDE_INT i0, HO
 }
 #endif
 
+/* Return an rtx representation of C in mode MODE.  */
+
+rtx
+immed_wide_int_const (const poly_wide_int_ref &c, machine_mode mode)
+{
+  if (c.is_constant ())
+    return immed_wide_int_const_1 (c.coeffs[0], mode);
+
+  /* Not scalar_int_mode because we also allow pointer bound modes.  */
+  unsigned int prec = GET_MODE_PRECISION (as_a <scalar_mode> (mode));
+
+  /* Allow truncation but not extension since we do not know if the
+     number is signed or unsigned.  */
+  gcc_assert (prec <= c.coeffs[0].get_precision ());
+  poly_wide_int newc = poly_wide_int::from (c, prec, SIGNED);
+
+  /* See whether we already have an rtx for this constant.  */
+  inchash::hash h;
+  h.add_int (mode);
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    h.add_wide_int (newc.coeffs[i]);
+  const_poly_int_hasher::compare_type typed_value (mode, newc);
+  rtx *slot = const_poly_int_htab->find_slot_with_hash (typed_value,
+							h.end (), INSERT);
+  rtx x = *slot;
+  if (x)
+    return x;
+
+  /* Create a new rtx.  There's a choice to be made here between installing
+     the actual mode of the rtx or leaving it as VOIDmode (for consistency
+     with CONST_INT).  In practice the handling of the codes is different
+     enough that we get no benefit from using VOIDmode, and various places
+     assume that VOIDmode implies CONST_INT.  Using the real mode seems like
+     the right long-term direction anyway.  */
+  typedef trailing_wide_ints<NUM_POLY_INT_COEFFS> twi;
+  size_t extra_size = twi::extra_size (prec);
+  x = rtx_alloc_v (CONST_POLY_INT,
+		   sizeof (struct const_poly_int_def) + extra_size);
+  PUT_MODE (x, mode);
+  CONST_POLY_INT_COEFFS (x).set_precision (prec);
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    CONST_POLY_INT_COEFFS (x)[i] = newc.coeffs[i];
+
+  *slot = x;
+  return x;
+}
+
 rtx
 gen_rtx_REG (machine_mode mode, unsigned int regno)
 {
@@ -1517,7 +1603,8 @@ gen_lowpart_common (machine_mode mode, r
     }
   else if (GET_CODE (x) == SUBREG || REG_P (x)
 	   || GET_CODE (x) == CONCAT || const_vec_p (x)
-	   || CONST_DOUBLE_AS_FLOAT_P (x) || CONST_SCALAR_INT_P (x))
+	   || CONST_DOUBLE_AS_FLOAT_P (x) || CONST_SCALAR_INT_P (x)
+	   || CONST_POLY_INT_P (x))
     return lowpart_subreg (mode, x, innermode);
 
   /* Otherwise, we can't do this.  */
@@ -6124,6 +6211,9 @@ init_emit_once (void)
 #endif
   const_double_htab = hash_table<const_double_hasher>::create_ggc (37);
 
+  if (NUM_POLY_INT_COEFFS > 1)
+    const_poly_int_htab = hash_table<const_poly_int_hasher>::create_ggc (37);
+
   const_fixed_htab = hash_table<const_fixed_hasher>::create_ggc (37);
 
   reg_attrs_htab = hash_table<reg_attr_hasher>::create_ggc (37);
@@ -6517,7 +6607,7 @@ need_atomic_barrier_p (enum memmodel mod
    by VALUE bits.  */
 
 rtx
-gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
+gen_int_shift_amount (machine_mode mode, poly_int64 value)
 {
   /* ??? Using the inner mode should be wide enough for all useful
      cases (e.g. QImode usually has 8 shiftable bits, while a QImode
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/cse.c	2017-12-15 01:16:51.234339275 +0000
@@ -2323,6 +2323,15 @@ hash_rtx_cb (const_rtx x, machine_mode m
 	hash += CONST_WIDE_INT_ELT (x, i);
       return hash;
 
+    case CONST_POLY_INT:
+      {
+	inchash::hash h;
+	h.add_int (hash);
+	for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	  h.add_wide_int (CONST_POLY_INT_COEFFS (x)[i]);
+	return h.end ();
+      }
+
     case CONST_DOUBLE:
       /* This is like the general case, except that it only counts
 	 the integers representing the constant.  */
@@ -3781,6 +3790,8 @@ equiv_constant (rtx x)
       /* See if we previously assigned a constant value to this SUBREG.  */
       if ((new_rtx = lookup_as_function (x, CONST_INT)) != 0
 	  || (new_rtx = lookup_as_function (x, CONST_WIDE_INT)) != 0
+	  || (NUM_POLY_INT_COEFFS > 1
+	      && (new_rtx = lookup_as_function (x, CONST_POLY_INT)) != 0)
           || (new_rtx = lookup_as_function (x, CONST_DOUBLE)) != 0
           || (new_rtx = lookup_as_function (x, CONST_FIXED)) != 0)
         return new_rtx;
Index: gcc/cselib.c
===================================================================
--- gcc/cselib.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/cselib.c	2017-12-15 01:16:51.234339275 +0000
@@ -1128,6 +1128,15 @@ cselib_hash_rtx (rtx x, int create, mach
 	hash += CONST_WIDE_INT_ELT (x, i);
       return hash;
 
+    case CONST_POLY_INT:
+      {
+	inchash::hash h;
+	h.add_int (hash);
+	for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	  h.add_wide_int (CONST_POLY_INT_COEFFS (x)[i]);
+	return h.end ();
+      }
+
     case CONST_DOUBLE:
       /* This is like the general case, except that it only counts
 	 the integers representing the constant.  */
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/dwarf2out.c	2017-12-15 01:16:51.237339169 +0000
@@ -13781,6 +13781,16 @@ const_ok_for_output_1 (rtx rtl)
       return false;
     }
 
+  if (CONST_POLY_INT_P (rtl))
+    return false;
+
+  if (targetm.const_not_ok_for_debug_p (rtl))
+    {
+      expansion_failed (NULL_TREE, rtl,
+			"Expression rejected for debug by the backend.\n");
+      return false;
+    }
+
   /* FIXME: Refer to PR60655. It is possible for simplification
      of rtl expressions in var tracking to produce such expressions.
      We should really identify / validate expressions
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/expr.c	2017-12-15 01:16:51.239339098 +0000
@@ -692,6 +692,7 @@ convert_modes (machine_mode mode, machin
       && is_int_mode (oldmode, &int_oldmode)
       && GET_MODE_PRECISION (int_mode) <= GET_MODE_PRECISION (int_oldmode)
       && ((MEM_P (x) && !MEM_VOLATILE_P (x) && direct_load[(int) int_mode])
+	  || CONST_POLY_INT_P (x)
           || (REG_P (x)
               && (!HARD_REGISTER_P (x)
 		  || targetm.hard_regno_mode_ok (REGNO (x), int_mode))
Index: gcc/print-rtl.c
===================================================================
--- gcc/print-rtl.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/print-rtl.c	2017-12-15 01:16:51.240339063 +0000
@@ -908,6 +908,17 @@ rtx_writer::print_rtx (const_rtx in_rtx)
       fprintf (m_outfile, " ");
       cwi_output_hex (m_outfile, in_rtx);
       break;
+
+    case CONST_POLY_INT:
+      fprintf (m_outfile, " [");
+      print_dec (CONST_POLY_INT_COEFFS (in_rtx)[0], m_outfile, SIGNED);
+      for (unsigned int i = 1; i < NUM_POLY_INT_COEFFS; ++i)
+	{
+	  fprintf (m_outfile, ", ");
+	  print_dec (CONST_POLY_INT_COEFFS (in_rtx)[i], m_outfile, SIGNED);
+	}
+      fprintf (m_outfile, "]");
+      break;
 #endif
 
     case CODE_LABEL:
@@ -1595,6 +1606,17 @@ print_value (pretty_printer *pp, const_r
       }
       break;
 
+    case CONST_POLY_INT:
+      pp_left_bracket (pp);
+      pp_wide_int (pp, CONST_POLY_INT_COEFFS (x)[0], SIGNED);
+      for (unsigned int i = 1; i < NUM_POLY_INT_COEFFS; ++i)
+	{
+	  pp_string (pp, ", ");
+	  pp_wide_int (pp, CONST_POLY_INT_COEFFS (x)[i], SIGNED);
+	}
+      pp_right_bracket (pp);
+      break;
+
     case CONST_DOUBLE:
       if (FLOAT_MODE_P (GET_MODE (x)))
 	{
Index: gcc/rtlhash.c
===================================================================
--- gcc/rtlhash.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/rtlhash.c	2017-12-15 01:16:51.241339028 +0000
@@ -55,6 +55,10 @@ add_rtx (const_rtx x, hash &hstate)
       for (i = 0; i < CONST_WIDE_INT_NUNITS (x); i++)
 	hstate.add_object (CONST_WIDE_INT_ELT (x, i));
       return;
+    case CONST_POLY_INT:
+      for (i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+	hstate.add_wide_int (CONST_POLY_INT_COEFFS (x)[i]);
+      break;
     case SYMBOL_REF:
       if (XSTR (x, 0))
 	hstate.add (XSTR (x, 0), strlen (XSTR (x, 0)) + 1);
Index: gcc/explow.c
===================================================================
--- gcc/explow.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/explow.c	2017-12-15 01:16:51.238339134 +0000
@@ -77,13 +77,23 @@ trunc_int_for_mode (HOST_WIDE_INT c, mac
   return c;
 }
 
+/* Likewise for polynomial values, using the sign-extended representation
+   for each individual coefficient.  */
+
+poly_int64
+trunc_int_for_mode (poly_int64 x, machine_mode mode)
+{
+  for (unsigned int i = 0; i < NUM_POLY_INT_COEFFS; ++i)
+    x.coeffs[i] = trunc_int_for_mode (x.coeffs[i], mode);
+  return x;
+}
+
 /* Return an rtx for the sum of X and the integer C, given that X has
    mode MODE.  INPLACE is true if X can be modified inplace or false
    if it must be treated as immutable.  */
 
 rtx
-plus_constant (machine_mode mode, rtx x, HOST_WIDE_INT c,
-	       bool inplace)
+plus_constant (machine_mode mode, rtx x, poly_int64 c, bool inplace)
 {
   RTX_CODE code;
   rtx y;
@@ -92,7 +102,7 @@ plus_constant (machine_mode mode, rtx x,
 
   gcc_assert (GET_MODE (x) == VOIDmode || GET_MODE (x) == mode);
 
-  if (c == 0)
+  if (known_eq (c, 0))
     return x;
 
  restart:
@@ -180,10 +190,12 @@ plus_constant (machine_mode mode, rtx x,
       break;
 
     default:
+      if (CONST_POLY_INT_P (x))
+	return immed_wide_int_const (const_poly_int_value (x) + c, mode);
       break;
     }
 
-  if (c != 0)
+  if (maybe_ne (c, 0))
     x = gen_rtx_PLUS (mode, x, gen_int_mode (c, mode));
 
   if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF)
Index: gcc/expmed.h
===================================================================
--- gcc/expmed.h	2017-12-15 01:16:50.894351263 +0000
+++ gcc/expmed.h	2017-12-15 01:16:51.239339098 +0000
@@ -712,8 +712,8 @@ extern unsigned HOST_WIDE_INT choose_mul
 #ifdef TREE_CODE
 extern rtx expand_variable_shift (enum tree_code, machine_mode,
 				  rtx, tree, rtx, int);
-extern rtx expand_shift (enum tree_code, machine_mode, rtx, int, rtx,
-			     int);
+extern rtx expand_shift (enum tree_code, machine_mode, rtx, poly_int64, rtx,
+			 int);
 extern rtx expand_divmod (int, enum tree_code, machine_mode, rtx, rtx,
 			  rtx, int);
 #endif
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/expmed.c	2017-12-15 01:16:51.239339098 +0000
@@ -2541,7 +2541,7 @@ expand_shift_1 (enum tree_code code, mac
 
 rtx
 expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
-	      int amount, rtx target, int unsignedp)
+	      poly_int64 amount, rtx target, int unsignedp)
 {
   return expand_shift_1 (code, mode, shifted,
 			 gen_int_shift_amount (mode, amount),
Index: gcc/rtlanal.c
===================================================================
--- gcc/rtlanal.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/rtlanal.c	2017-12-15 01:16:51.241339028 +0000
@@ -915,6 +915,28 @@ split_const (rtx x, rtx *base_out, rtx *
   *base_out = x;
   *offset_out = const0_rtx;
 }
+
+/* Express integer value X as some value Y plus a polynomial offset,
+   where Y is either const0_rtx, X or something within X (as opposed
+   to a new rtx).  Return the Y and store the offset in *OFFSET_OUT.  */
+
+rtx
+strip_offset (rtx x, poly_int64_pod *offset_out)
+{
+  rtx base = const0_rtx;
+  rtx test = x;
+  if (GET_CODE (test) == CONST)
+    test = XEXP (test, 0);
+  if (GET_CODE (test) == PLUS)
+    {
+      base = XEXP (test, 0);
+      test = XEXP (test, 1);
+    }
+  if (poly_int_rtx_p (test, offset_out))
+    return base;
+  *offset_out = 0;
+  return x;
+}
 \f
 /* Return the number of places FIND appears within X.  If COUNT_DEST is
    zero, we do not count occurrences inside the destination of a SET.  */
@@ -3406,13 +3428,15 @@ commutative_operand_precedence (rtx op)
 
   /* Constants always become the second operand.  Prefer "nice" constants.  */
   if (code == CONST_INT)
-    return -8;
+    return -10;
   if (code == CONST_WIDE_INT)
-    return -7;
+    return -9;
+  if (code == CONST_POLY_INT)
+    return -8;
   if (code == CONST_DOUBLE)
-    return -7;
+    return -8;
   if (code == CONST_FIXED)
-    return -7;
+    return -8;
   op = avoid_constant_pool_reference (op);
   code = GET_CODE (op);
 
@@ -3420,13 +3444,15 @@ commutative_operand_precedence (rtx op)
     {
     case RTX_CONST_OBJ:
       if (code == CONST_INT)
-        return -6;
+	return -7;
       if (code == CONST_WIDE_INT)
-        return -6;
+	return -6;
+      if (code == CONST_POLY_INT)
+	return -5;
       if (code == CONST_DOUBLE)
-        return -5;
+	return -5;
       if (code == CONST_FIXED)
-        return -5;
+	return -5;
       return -4;
 
     case RTX_EXTRA:
Index: gcc/rtl-tests.c
===================================================================
--- gcc/rtl-tests.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/rtl-tests.c	2017-12-15 01:16:51.240339063 +0000
@@ -228,6 +228,62 @@ test_uncond_jump ()
 		      jump_insn);
 }
 
+template<unsigned int N>
+struct const_poly_int_tests
+{
+  static void run ();
+};
+
+template<>
+struct const_poly_int_tests<1>
+{
+  static void run () {}
+};
+
+/* Test various CONST_POLY_INT properties.  */
+
+template<unsigned int N>
+void
+const_poly_int_tests<N>::run ()
+{
+  rtx x1 = gen_int_mode (poly_int64 (1, 1), QImode);
+  rtx x255 = gen_int_mode (poly_int64 (1, 255), QImode);
+
+  /* Test that constants are unique.  */
+  ASSERT_EQ (x1, gen_int_mode (poly_int64 (1, 1), QImode));
+  ASSERT_NE (x1, gen_int_mode (poly_int64 (1, 1), HImode));
+  ASSERT_NE (x1, x255);
+
+  /* Test const_poly_int_value.  */
+  ASSERT_KNOWN_EQ (const_poly_int_value (x1), poly_int64 (1, 1));
+  ASSERT_KNOWN_EQ (const_poly_int_value (x255), poly_int64 (1, -1));
+
+  /* Test rtx_to_poly_int64.  */
+  ASSERT_KNOWN_EQ (rtx_to_poly_int64 (x1), poly_int64 (1, 1));
+  ASSERT_KNOWN_EQ (rtx_to_poly_int64 (x255), poly_int64 (1, -1));
+  ASSERT_MAYBE_NE (rtx_to_poly_int64 (x255), poly_int64 (1, 255));
+
+  /* Test plus_constant of a symbol.  */
+  rtx symbol = gen_rtx_SYMBOL_REF (Pmode, "foo");
+  rtx offset1 = gen_int_mode (poly_int64 (9, 11), Pmode);
+  rtx sum1 = gen_rtx_CONST (Pmode, gen_rtx_PLUS (Pmode, symbol, offset1));
+  ASSERT_RTX_EQ (plus_constant (Pmode, symbol, poly_int64 (9, 11)), sum1);
+
+  /* Test plus_constant of a CONST.  */
+  rtx offset2 = gen_int_mode (poly_int64 (12, 20), Pmode);
+  rtx sum2 = gen_rtx_CONST (Pmode, gen_rtx_PLUS (Pmode, symbol, offset2));
+  ASSERT_RTX_EQ (plus_constant (Pmode, sum1, poly_int64 (3, 9)), sum2);
+
+  /* Test a cancelling plus_constant.  */
+  ASSERT_EQ (plus_constant (Pmode, sum2, poly_int64 (-12, -20)), symbol);
+
+  /* Test plus_constant on integer constants.  */
+  ASSERT_EQ (plus_constant (QImode, const1_rtx, poly_int64 (4, -2)),
+	     gen_int_mode (poly_int64 (5, -2), QImode));
+  ASSERT_EQ (plus_constant (QImode, x1, poly_int64 (4, -2)),
+	     gen_int_mode (poly_int64 (5, -1), QImode));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -238,6 +294,7 @@ rtl_tests_c_tests ()
   test_dumping_rtx_reuse ();
   test_single_set ();
   test_uncond_jump ();
+  const_poly_int_tests<NUM_POLY_INT_COEFFS>::run ();
 
   /* Purge state.  */
   set_first_insn (NULL);
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-12-15 01:16:50.894351263 +0000
+++ gcc/simplify-rtx.c	2017-12-15 01:16:51.242338992 +0000
@@ -2038,6 +2038,26 @@ simplify_const_unary_operation (enum rtx
 	}
     }
 
+  /* Handle polynomial integers.  */
+  else if (CONST_POLY_INT_P (op))
+    {
+      poly_wide_int result;
+      switch (code)
+	{
+	case NEG:
+	  result = -const_poly_int_value (op);
+	  break;
+
+	case NOT:
+	  result = ~const_poly_int_value (op);
+	  break;
+
+	default:
+	  return NULL_RTX;
+	}
+      return immed_wide_int_const (result, mode);
+    }
+
   return NULL_RTX;
 }
 \f
@@ -2218,6 +2238,7 @@ simplify_binary_operation_1 (enum rtx_co
   rtx tem, reversed, opleft, opright, elt0, elt1;
   HOST_WIDE_INT val;
   scalar_int_mode int_mode, inner_mode;
+  poly_int64 offset;
 
   /* Even if we can't compute a constant result,
      there are some cases worth simplifying.  */
@@ -2530,6 +2551,12 @@ simplify_binary_operation_1 (enum rtx_co
 	    return simplify_gen_binary (MINUS, mode, tem, XEXP (op0, 0));
 	}
 
+      if ((GET_CODE (op0) == CONST
+	   || GET_CODE (op0) == SYMBOL_REF
+	   || GET_CODE (op0) == LABEL_REF)
+	  && poly_int_rtx_p (op1, &offset))
+	return plus_constant (mode, op0, trunc_int_for_mode (-offset, mode));
+
       /* Don't let a relocatable value get a negative coeff.  */
       if (CONST_INT_P (op1) && GET_MODE (op0) != VOIDmode)
 	return simplify_gen_binary (PLUS, mode,
@@ -4327,6 +4354,57 @@ simplify_const_binary_operation (enum rt
       return immed_wide_int_const (result, int_mode);
     }
 
+  /* Handle polynomial integers.  */
+  if (NUM_POLY_INT_COEFFS > 1
+      && is_a <scalar_int_mode> (mode, &int_mode)
+      && poly_int_rtx_p (op0)
+      && poly_int_rtx_p (op1))
+    {
+      poly_wide_int result;
+      switch (code)
+	{
+	case PLUS:
+	  result = wi::to_poly_wide (op0, mode) + wi::to_poly_wide (op1, mode);
+	  break;
+
+	case MINUS:
+	  result = wi::to_poly_wide (op0, mode) - wi::to_poly_wide (op1, mode);
+	  break;
+
+	case MULT:
+	  if (CONST_SCALAR_INT_P (op1))
+	    result = wi::to_poly_wide (op0, mode) * rtx_mode_t (op1, mode);
+	  else
+	    return NULL_RTX;
+	  break;
+
+	case ASHIFT:
+	  if (CONST_SCALAR_INT_P (op1))
+	    {
+	      wide_int shift = rtx_mode_t (op1, mode);
+	      if (SHIFT_COUNT_TRUNCATED)
+		shift = wi::umod_trunc (shift, GET_MODE_PRECISION (int_mode));
+	      else if (wi::geu_p (shift, GET_MODE_PRECISION (int_mode)))
+		return NULL_RTX;
+	      result = wi::to_poly_wide (op0, mode) << shift;
+	    }
+	  else
+	    return NULL_RTX;
+	  break;
+
+	case IOR:
+	  if (!CONST_SCALAR_INT_P (op1)
+	      || !can_ior_p (wi::to_poly_wide (op0, mode),
+			     rtx_mode_t (op1, mode), &result))
+	    return NULL_RTX;
+	  break;
+
+	default:
+	  return NULL_RTX;
+	}
+      return immed_wide_int_const (result, int_mode);
+    }
+
   return NULL_RTX;
 }
 
@@ -6370,13 +6448,27 @@ simplify_subreg (machine_mode outermode,
   scalar_int_mode int_outermode, int_innermode;
   if (is_a <scalar_int_mode> (outermode, &int_outermode)
       && is_a <scalar_int_mode> (innermode, &int_innermode)
-      && (GET_MODE_PRECISION (int_outermode)
-	  < GET_MODE_PRECISION (int_innermode))
       && byte == subreg_lowpart_offset (int_outermode, int_innermode))
     {
-      rtx tem = simplify_truncation (int_outermode, op, int_innermode);
-      if (tem)
-	return tem;
+      /* Handle polynomial integers.  The upper bits of a paradoxical
+	 subreg are undefined, so this is safe regardless of whether
+	 we're truncating or extending.  */
+      if (CONST_POLY_INT_P (op))
+	{
+	  poly_wide_int val
+	    = poly_wide_int::from (const_poly_int_value (op),
+				   GET_MODE_PRECISION (int_outermode),
+				   SIGNED);
+	  return immed_wide_int_const (val, int_outermode);
+	}
+
+      if (GET_MODE_PRECISION (int_outermode)
+	  < GET_MODE_PRECISION (int_innermode))
+	{
+	  rtx tem = simplify_truncation (int_outermode, op, int_innermode);
+	  if (tem)
+	    return tem;
+	}
     }
 
   return NULL_RTX;
@@ -6685,12 +6777,60 @@ test_vector_ops ()
     }
 }
 
+template<unsigned int N>
+struct simplify_const_poly_int_tests
+{
+  static void run ();
+};
+
+template<>
+struct simplify_const_poly_int_tests<1>
+{
+  static void run () {}
+};
+
+/* Test various CONST_POLY_INT properties.  */
+
+template<unsigned int N>
+void
+simplify_const_poly_int_tests<N>::run ()
+{
+  rtx x1 = gen_int_mode (poly_int64 (1, 1), QImode);
+  rtx x2 = gen_int_mode (poly_int64 (-80, 127), QImode);
+  rtx x3 = gen_int_mode (poly_int64 (-79, -128), QImode);
+  rtx x4 = gen_int_mode (poly_int64 (5, 4), QImode);
+  rtx x5 = gen_int_mode (poly_int64 (30, 24), QImode);
+  rtx x6 = gen_int_mode (poly_int64 (20, 16), QImode);
+  rtx x7 = gen_int_mode (poly_int64 (7, 4), QImode);
+  rtx x8 = gen_int_mode (poly_int64 (30, 24), HImode);
+  rtx x9 = gen_int_mode (poly_int64 (-30, -24), HImode);
+  rtx x10 = gen_int_mode (poly_int64 (-31, -24), HImode);
+  rtx two = GEN_INT (2);
+  rtx six = GEN_INT (6);
+  HOST_WIDE_INT offset = subreg_lowpart_offset (QImode, HImode);
+
+  /* These tests only try limited operation combinations.  Fuller arithmetic
+     testing is done directly on poly_ints.  */
+  ASSERT_EQ (simplify_unary_operation (NEG, HImode, x8, HImode), x9);
+  ASSERT_EQ (simplify_unary_operation (NOT, HImode, x8, HImode), x10);
+  ASSERT_EQ (simplify_unary_operation (TRUNCATE, QImode, x8, HImode), x5);
+  ASSERT_EQ (simplify_binary_operation (PLUS, QImode, x1, x2), x3);
+  ASSERT_EQ (simplify_binary_operation (MINUS, QImode, x3, x1), x2);
+  ASSERT_EQ (simplify_binary_operation (MULT, QImode, x4, six), x5);
+  ASSERT_EQ (simplify_binary_operation (MULT, QImode, six, x4), x5);
+  ASSERT_EQ (simplify_binary_operation (ASHIFT, QImode, x4, two), x6);
+  ASSERT_EQ (simplify_binary_operation (IOR, QImode, x4, two), x7);
+  ASSERT_EQ (simplify_subreg (HImode, x5, QImode, 0), x8);
+  ASSERT_EQ (simplify_subreg (QImode, x8, HImode, offset), x5);
+}
+
 /* Run all of the selftests within this file.  */
 
 void
 simplify_rtx_c_tests ()
 {
   test_vector_ops ();
+  simplify_const_poly_int_tests<NUM_POLY_INT_COEFFS>::run ();
 }
 
 } // namespace selftest
Index: gcc/wide-int.h
===================================================================
--- gcc/wide-int.h	2017-12-15 01:16:50.894351263 +0000
+++ gcc/wide-int.h	2017-12-15 01:16:51.242338992 +0000
@@ -613,6 +613,7 @@ #define SHIFT_FUNCTION \
      access.  */
   struct storage_ref
   {
+    storage_ref () {}
     storage_ref (const HOST_WIDE_INT *, unsigned int, unsigned int);
 
     const HOST_WIDE_INT *val;
@@ -944,6 +945,8 @@ struct wide_int_ref_storage : public wi:
   HOST_WIDE_INT scratch[2];
 
 public:
+  wide_int_ref_storage () {}
+
   wide_int_ref_storage (const wi::storage_ref &);
 
   template <typename T>
@@ -1323,7 +1326,7 @@ typedef generic_wide_int <trailing_wide_
    bytes beyond the sizeof need to be allocated.  Use set_precision
    to initialize the structure.  */
 template <int N>
-class GTY(()) trailing_wide_ints
+class GTY((user)) trailing_wide_ints
 {
 private:
   /* The shared precision of each number.  */
@@ -1340,9 +1343,14 @@ class GTY(()) trailing_wide_ints
   HOST_WIDE_INT m_val[1];
 
 public:
+  typedef WIDE_INT_REF_FOR (trailing_wide_int_storage) const_reference;
+
   void set_precision (unsigned int);
+  unsigned int get_precision () const { return m_precision; }
   trailing_wide_int operator [] (unsigned int);
+  const_reference operator [] (unsigned int) const;
   static size_t extra_size (unsigned int);
+  size_t extra_size () const { return extra_size (m_precision); }
 };
 
 inline trailing_wide_int_storage::
@@ -1414,6 +1422,14 @@ trailing_wide_ints <N>::operator [] (uns
 				    &m_val[index * m_max_len]);
 }
 
+template <int N>
+inline typename trailing_wide_ints <N>::const_reference
+trailing_wide_ints <N>::operator [] (unsigned int index) const
+{
+  return wi::storage_ref (&m_val[index * m_max_len],
+			  m_len[index], m_precision);
+}
+
 /* Return how many extra bytes need to be added to the end of the structure
    in order to handle N wide_ints of precision PRECISION.  */
 template <int N>

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-12-07 22:48               ` Jeff Law
@ 2017-12-15  3:40                 ` Martin Sebor
  2017-12-15  9:08                   ` Richard Biener
  0 siblings, 1 reply; 302+ messages in thread
From: Martin Sebor @ 2017-12-15  3:40 UTC (permalink / raw)
  To: Jeff Law, Richard Biener, GCC Patches, richard.sandiford

On 12/07/2017 03:48 PM, Jeff Law wrote:
> On 12/07/2017 03:38 PM, Richard Sandiford wrote:
>
>>> So I think that's the final ack on this series.
>>
>> Thanks to both of you, really appreciate it!
> Sorry it took so long.
>
>>
>>> Richard S. can you confirm?  I fully expect the trunk has moved some
>>> and the patches will need adjustments -- consider adjustments which
>>> work in a manner similar to the patches to date pre-approved.
>>
>> Yeah, that's now all of the poly_int patches.  I still owe you replies
>> to some of them -- I'll get to that as soon as I can.
> NP.  I don't think any of the questions were all that significant.
> Those which were I think you already responded to.

I am disappointed that the no-op ctor issue hasn't been adequately
addressed.  No numbers were presented as to the difference it makes
to have the ctor do the expected thing (i.e., initialize the object).
In my view, the choice seems arbitrarily in favor of a hypothetical
performance improvement at -O0 without regard to the impact on
correctness.  We have recently seen the adverse effects of similar
choices in other areas: the hash table insertion[*] and the related
offset_int initialization.

Martin

[*] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82977

PS To be clear, the numbers I asked for were those showing
the difference between a no-op ctor and one that initializes
the object to some determinate state, whatever that is.  IIUC
the numbers in the following post show the aggregate slowdown
for many or most of the changes in the series, not just
the ctor.  If the numbers were significant, I suggested
a solution to explicitly request a no-op ctor to make
the default safe and eliminate the overhead where it mattered.

https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01028.html

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-12-15  3:40                 ` Martin Sebor
@ 2017-12-15  9:08                   ` Richard Biener
  2017-12-15 15:19                     ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Biener @ 2017-12-15  9:08 UTC (permalink / raw)
  To: Martin Sebor; +Cc: Jeff Law, GCC Patches, Richard Sandiford

On Fri, Dec 15, 2017 at 4:40 AM, Martin Sebor <msebor@gmail.com> wrote:
> On 12/07/2017 03:48 PM, Jeff Law wrote:
>>
>> On 12/07/2017 03:38 PM, Richard Sandiford wrote:
>>
>>>> So I think that's the final ack on this series.
>>>
>>>
>>> Thanks to both of you, really appreciate it!
>>
>> Sorry it took so long.
>>
>>>
>>>> Richard S. can you confirm?  I fully expect the trunk has moved some
>>>> and the patches will need adjustments -- consider adjustments which
>>>> work in a manner similar to the patches to date pre-approved.
>>>
>>>
>>> Yeah, that's now all of the poly_int patches.  I still owe you replies
>>> to some of them -- I'll get to that as soon as I can.
>>
>> NP.  I don't think any of the questions were all that significant.
>> Those which were I think you already responded to.
>
>
> I am disappointed that the no-op ctor issue hasn't been adequately
> addressed.  No numbers were presented as to the difference it makes
> to have the ctor do the expected thing (i.e., initialize the object).
> In my view, the choice seems arbitrarily in favor of a hypothetical
> performance improvement at -O0 without regard to the impact on
> correctness.  We have recently seen the adverse effects of similar
> choices in other areas: the hash table insertion[*] and the related
> offset_int initialization.

As we're coming from a C code base, not initializing stuff is what I expect.
I'm really surprised to see a lot of default initialization done in places
where it only hurts compile time (of GCC at least, where we need to
optimize that away).

Richard.

> Martin
>
> [*] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82977
>
> PS To be clear, the numbers I asked for were those showing
> the difference between a no-op ctor and one that initializes
> the object to some determinate state, whatever that is.  IIUC
> the numbers in the following post show the aggregate slowdown
> for many or most of the changes in the series, not just
> the ctor.  If the numbers were significant, I suggested
> a solution to explicitly request a no-op ctor to make
> the default safe and eliminate the overhead where it mattered.
>
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01028.html

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [001/nnn] poly_int: add poly-int.h
  2017-12-15  9:08                   ` Richard Biener
@ 2017-12-15 15:19                     ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-15 15:19 UTC (permalink / raw)
  To: Richard Biener, Martin Sebor; +Cc: GCC Patches, Richard Sandiford

On 12/15/2017 02:08 AM, Richard Biener wrote:
> On Fri, Dec 15, 2017 at 4:40 AM, Martin Sebor <msebor@gmail.com> wrote:
>> On 12/07/2017 03:48 PM, Jeff Law wrote:
>>>
>>> On 12/07/2017 03:38 PM, Richard Sandiford wrote:
>>>
>>>>> So I think that's the final ack on this series.
>>>>
>>>>
>>>> Thanks to both of you, really appreciate it!
>>>
>>> Sorry it took so long.
>>>
>>>>
>>>>> Richard S. can you confirm?  I fully expect the trunk has moved some
>>>>> and the patches will need adjustments -- consider adjustments which
>>>>> work in a manner similar to the patches to date pre-approved.
>>>>
>>>>
>>>> Yeah, that's now all of the poly_int patches.  I still owe you replies
>>>> to some of them -- I'll get to that as soon as I can.
>>>
>>> NP.  I don't think any of the questions were all that significant.
>>> Those which were I think you already responded to.
>>
>>
>> I am disappointed that the no-op ctor issue hasn't been adequately
>> addressed.  No numbers were presented as to the difference it makes
>> to have the ctor do the expected thing (i.e., initialize the object).
>> In my view, the choice seems arbitrarily in favor of a hypothetical
>> performance improvement at -O0 without regard to the impact on
>> correctness.  We have recently seen the adverse effects of similar
>> choices in other areas: the hash table insertion[*] and the related
>> offset_int initialization.
> 
> As we're coming from a C code base, not initializing stuff is what I expect.
> I'm really surprised to see a lot of default initialization done in places
> where it only hurts compile time (of GCC at least, where we need to
> optimize that away).
I suspect a lot of the default initializations were done when Kaveh and
others were working to get us -Wuninitialized clean -- which happened
when uninitialized warnings were still done in RTL (flow.c).

I've long wished we had marked the initializations which were done
solely to make -Wuninitialized happy because it would be a good way to
measure progress on our analysis & optimization passes' ability to
prove the paths weren't executable.

WRT the nop ctor issue, I had a slight leaning towards initializing
them, but I certainly could argue either side.  I think the long term
goal really should be to move to C++11 where it can be done right.
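
For readers following the thread, here is a minimal sketch -- not GCC
code, and the type name is invented -- of the distinction being
discussed.  With a user-provided empty constructor, default- and
value-construction both leave the coefficients indeterminate; with a
C++11 defaulted constructor, callers could still opt into
zero-initialization where they want it:

  /* Hypothetical illustration only.  */
  template<unsigned int N, typename C>
  struct poly_sketch
  {
    /* The series uses an empty-body ("no-op") constructor like this:
       cheap at -O0, but reading an object before assigning to it is
       undefined.  */
    poly_sketch () {}

    /* With C++11 one could instead write:

         poly_sketch () = default;

       so that "poly_sketch<2, int> p {};" zero-initializes the
       coefficients while plain "poly_sketch<2, int> p;" remains a
       no-op.  */

    C coeffs[N];
  };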

jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [002/nnn] poly_int: IN_TARGET_CODE
  2017-12-15  1:08     ` Richard Sandiford
@ 2017-12-15 15:22       ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-15 15:22 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 12/14/2017 06:08 PM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On 10/23/2017 10:58 AM, Richard Sandiford wrote:
>>> This patch makes each target-specific TU define an IN_TARGET_CODE macro,
>>> which is used to decide whether poly_int<1, C> should convert to C.
>>>
>>>
>>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>> 	    Alan Hayward  <alan.hayward@arm.com>
>>> 	    David Sherwood  <david.sherwood@arm.com>
>>>
>>> gcc/
>>> 	* genattrtab.c (write_header): Define IN_TARGET_CODE to 1 in the
>>> 	target C file.
>>> 	* genautomata.c (main): Likewise.
>>> 	* genconditions.c (write_header): Likewise.
>>> 	* genemit.c (main): Likewise.
>>> 	* genextract.c (print_header): Likewise.
>>> 	* genopinit.c (main): Likewise.
>>> 	* genoutput.c (output_prologue): Likewise.
>>> 	* genpeep.c (main): Likewise.
>>> 	* genpreds.c (write_insn_preds_c): Likewise.
>>> 	* genrecog.c (writer_header): Likewise.
>>> 	* config/aarch64/aarch64-builtins.c (IN_TARGET_CODE): Define.
>>> 	* config/aarch64/aarch64-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/aarch64/aarch64.c (IN_TARGET_CODE): Likewise.
>>> 	* config/aarch64/cortex-a57-fma-steering.c (IN_TARGET_CODE): Likewise.
>>> 	* config/aarch64/driver-aarch64.c (IN_TARGET_CODE): Likewise.
>>> 	* config/alpha/alpha.c (IN_TARGET_CODE): Likewise.
>>> 	* config/alpha/driver-alpha.c (IN_TARGET_CODE): Likewise.
>>> 	* config/arc/arc-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/arc/arc.c (IN_TARGET_CODE): Likewise.
>>> 	* config/arc/driver-arc.c (IN_TARGET_CODE): Likewise.
>>> 	* config/arm/aarch-common.c (IN_TARGET_CODE): Likewise.
>>> 	* config/arm/arm-builtins.c (IN_TARGET_CODE): Likewise.
>>> 	* config/arm/arm-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/arm/arm.c (IN_TARGET_CODE): Likewise.
>>> 	* config/arm/driver-arm.c (IN_TARGET_CODE): Likewise.
>>> 	* config/avr/avr-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/avr/avr-devices.c (IN_TARGET_CODE): Likewise.
>>> 	* config/avr/avr-log.c (IN_TARGET_CODE): Likewise.
>>> 	* config/avr/avr.c (IN_TARGET_CODE): Likewise.
>>> 	* config/avr/driver-avr.c (IN_TARGET_CODE): Likewise.
>>> 	* config/avr/gen-avr-mmcu-specs.c (IN_TARGET_CODE): Likewise.
>>> 	* config/bfin/bfin.c (IN_TARGET_CODE): Likewise.
>>> 	* config/c6x/c6x.c (IN_TARGET_CODE): Likewise.
>>> 	* config/cr16/cr16.c (IN_TARGET_CODE): Likewise.
>>> 	* config/cris/cris.c (IN_TARGET_CODE): Likewise.
>>> 	* config/darwin.c (IN_TARGET_CODE): Likewise.
>>> 	* config/epiphany/epiphany.c (IN_TARGET_CODE): Likewise.
>>> 	* config/epiphany/mode-switch-use.c (IN_TARGET_CODE): Likewise.
>>> 	* config/epiphany/resolve-sw-modes.c (IN_TARGET_CODE): Likewise.
>>> 	* config/fr30/fr30.c (IN_TARGET_CODE): Likewise.
>>> 	* config/frv/frv.c (IN_TARGET_CODE): Likewise.
>>> 	* config/ft32/ft32.c (IN_TARGET_CODE): Likewise.
>>> 	* config/h8300/h8300.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/djgpp.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/driver-i386.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/driver-mingw32.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/host-cygwin.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/host-i386-darwin.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/host-mingw32.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/i386-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/i386.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/intelmic-mkoffload.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/msformat-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/winnt-cxx.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/winnt-stubs.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/winnt.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/x86-tune-sched-atom.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/x86-tune-sched-bd.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/x86-tune-sched-core.c (IN_TARGET_CODE): Likewise.
>>> 	* config/i386/x86-tune-sched.c (IN_TARGET_CODE): Likewise.
>>> 	* config/ia64/ia64-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/ia64/ia64.c (IN_TARGET_CODE): Likewise.
>>> 	* config/iq2000/iq2000.c (IN_TARGET_CODE): Likewise.
>>> 	* config/lm32/lm32.c (IN_TARGET_CODE): Likewise.
>>> 	* config/m32c/m32c-pragma.c (IN_TARGET_CODE): Likewise.
>>> 	* config/m32c/m32c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/m32r/m32r.c (IN_TARGET_CODE): Likewise.
>>> 	* config/m68k/m68k.c (IN_TARGET_CODE): Likewise.
>>> 	* config/mcore/mcore.c (IN_TARGET_CODE): Likewise.
>>> 	* config/microblaze/microblaze-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/microblaze/microblaze.c (IN_TARGET_CODE): Likewise.
>>> 	* config/mips/driver-native.c (IN_TARGET_CODE): Likewise.
>>> 	* config/mips/frame-header-opt.c (IN_TARGET_CODE): Likewise.
>>> 	* config/mips/mips.c (IN_TARGET_CODE): Likewise.
>>> 	* config/mmix/mmix.c (IN_TARGET_CODE): Likewise.
>>> 	* config/mn10300/mn10300.c (IN_TARGET_CODE): Likewise.
>>> 	* config/moxie/moxie.c (IN_TARGET_CODE): Likewise.
>>> 	* config/msp430/driver-msp430.c (IN_TARGET_CODE): Likewise.
>>> 	* config/msp430/msp430-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/msp430/msp430.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32-cost.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32-fp-as-gp.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32-intrinsic.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32-isr.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32-md-auxiliary.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32-memory-manipulation.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32-pipelines-auxiliary.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32-predicates.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nds32/nds32.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nios2/nios2.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nvptx/mkoffload.c (IN_TARGET_CODE): Likewise.
>>> 	* config/nvptx/nvptx.c (IN_TARGET_CODE): Likewise.
>>> 	* config/pa/pa.c (IN_TARGET_CODE): Likewise.
>>> 	* config/pdp11/pdp11.c (IN_TARGET_CODE): Likewise.
>>> 	* config/powerpcspe/driver-powerpcspe.c (IN_TARGET_CODE): Likewise.
>>> 	* config/powerpcspe/host-darwin.c (IN_TARGET_CODE): Likewise.
>>> 	* config/powerpcspe/host-ppc64-darwin.c (IN_TARGET_CODE): Likewise.
>>> 	* config/powerpcspe/powerpcspe-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/powerpcspe/powerpcspe-linux.c (IN_TARGET_CODE): Likewise.
>>> 	* config/powerpcspe/powerpcspe.c (IN_TARGET_CODE): Likewise.
>>> 	* config/riscv/riscv-builtins.c (IN_TARGET_CODE): Likewise.
>>> 	* config/riscv/riscv-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/riscv/riscv.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rl78/rl78-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rl78/rl78.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rs6000/driver-rs6000.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rs6000/host-darwin.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rs6000/host-ppc64-darwin.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rs6000/rs6000-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rs6000/rs6000-linux.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rs6000/rs6000-p8swap.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rs6000/rs6000-string.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rs6000/rs6000.c (IN_TARGET_CODE): Likewise.
>>> 	* config/rx/rx.c (IN_TARGET_CODE): Likewise.
>>> 	* config/s390/driver-native.c (IN_TARGET_CODE): Likewise.
>>> 	* config/s390/s390-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/s390/s390.c (IN_TARGET_CODE): Likewise.
>>> 	* config/sh/sh-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/sh/sh-mem.cc (IN_TARGET_CODE): Likewise.
>>> 	* config/sh/sh.c (IN_TARGET_CODE): Likewise.
>>> 	* config/sh/sh_optimize_sett_clrt.cc (IN_TARGET_CODE): Likewise.
>>> 	* config/sh/sh_treg_combine.cc (IN_TARGET_CODE): Likewise.
>>> 	* config/sparc/driver-sparc.c (IN_TARGET_CODE): Likewise.
>>> 	* config/sparc/sparc-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/sparc/sparc.c (IN_TARGET_CODE): Likewise.
>>> 	* config/spu/spu-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/spu/spu.c (IN_TARGET_CODE): Likewise.
>>> 	* config/stormy16/stormy16.c (IN_TARGET_CODE): Likewise.
>>> 	* config/tilegx/mul-tables.c (IN_TARGET_CODE): Likewise.
>>> 	* config/tilegx/tilegx-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/tilegx/tilegx.c (IN_TARGET_CODE): Likewise.
>>> 	* config/tilepro/mul-tables.c (IN_TARGET_CODE): Likewise.
>>> 	* config/tilepro/tilepro-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/tilepro/tilepro.c (IN_TARGET_CODE): Likewise.
>>> 	* config/v850/v850-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/v850/v850.c (IN_TARGET_CODE): Likewise.
>>> 	* config/vax/vax.c (IN_TARGET_CODE): Likewise.
>>> 	* config/visium/visium.c (IN_TARGET_CODE): Likewise.
>>> 	* config/vms/vms-c.c (IN_TARGET_CODE): Likewise.
>>> 	* config/vms/vms-f.c (IN_TARGET_CODE): Likewise.
>>> 	* config/vms/vms.c (IN_TARGET_CODE): Likewise.
>>> 	* config/xtensa/xtensa.c (IN_TARGET_CODE): Likewise.
>> ISTM this needs documenting somewhere.
>>
>> OK with a suitable doc patch.
> 
> OK.  I couldn't find anywhere that was an obvious fit, so in the end
> I went for the "Anatomy of a target back end".  (The effect of
> IN_TARGET_CODE on poly-int is already documented in poly-int.texi.)
> 
> Does this look OK?
> 
> Thanks,
> Richard
> 
> 
> 2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* doc/sourcebuild.texi: Document IN_TARGET_CODE.
> 	[...]
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [005/nnn] poly_int: rtx constants
  2017-12-15  1:25     ` Richard Sandiford
@ 2017-12-19  4:52       ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2017-12-19  4:52 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 12/14/2017 06:25 PM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On 10/23/2017 11:00 AM, Richard Sandiford wrote:
>>> This patch adds an rtl representation of poly_int values.
>>> There were three possible ways of doing this:
>>>
>>> (1) Add a new rtl code for the poly_ints themselves and store the
>>>     coefficients as trailing wide_ints.  This would give constants like:
>>>
>>>       (const_poly_int [c0 c1 ... cn])
>>>
>>>     The runtime value would be:
>>>
>>>       c0 + c1 * x1 + ... + cn * xn
>>>
>>> (2) Like (1), but use rtxes for the coefficients.  This would give
>>>     constants like:
>>>
>>>       (const_poly_int [(const_int c0)
>>>                        (const_int c1)
>>>                        ...
>>>                        (const_int cn)])
>>>
>>>     although the coefficients could be const_wide_ints instead
>>>     of const_ints where appropriate.
>>>
>>> (3) Add a new rtl code for the polynomial indeterminates,
>>>     then use them in const wrappers.  A constant like c0 + c1 * x1
>>>     would then look like:
>>>
>>>       (const:M (plus:M (mult:M (const_param:M x1)
>>>                                (const_int c1))
>>>                        (const_int c0)))
>>>
>>> There didn't seem to be that much to choose between them.  The main
>>> advantage of (1) is that it's a more efficient representation and
>>> that we can refer to the coefficients directly as wide_int_storage.
>> Well, and #1 feels more like how we handle CONST_INT :-)
>>>
>>>
>>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>> 	    Alan Hayward  <alan.hayward@arm.com>
>>> 	    David Sherwood  <david.sherwood@arm.com>
>>>
>>> gcc/
>>> 	* doc/rtl.texi (const_poly_int): Document.
>>> 	* gengenrtl.c (excluded_rtx): Return true for CONST_POLY_INT.
>>> 	* rtl.h (const_poly_int_def): New struct.
>>> 	(rtx_def::u): Add a cpi field.
>>> 	(CASE_CONST_UNIQUE, CASE_CONST_ANY): Add CONST_POLY_INT.
>>> 	(CONST_POLY_INT_P, CONST_POLY_INT_COEFFS): New macros.
>>> 	(wi::rtx_to_poly_wide_ref): New typedef
>>> 	(const_poly_int_value, wi::to_poly_wide, rtx_to_poly_int64)
>>> 	(poly_int_rtx_p): New functions.
>>> 	(trunc_int_for_mode): Declare a poly_int64 version.
>>> 	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
>>> 	(immed_wide_int_const): Take a poly_wide_int_ref rather than
>>> 	a wide_int_ref.
>>> 	(strip_offset): Declare.
>>> 	(strip_offset_and_add): New function.
>>> 	* rtl.def (CONST_POLY_INT): New rtx code.
>>> 	* rtl.c (rtx_size): Handle CONST_POLY_INT.
>>> 	(shared_const_p): Use poly_int_rtx_p.
>>> 	* emit-rtl.h (gen_int_mode): Take a poly_int64 instead of a
>>> 	HOST_WIDE_INT.
>>> 	(gen_int_shift_amount): Likewise.
>>> 	* emit-rtl.c (const_poly_int_hasher): New class.
>>> 	(const_poly_int_htab): New variable.
>>> 	(init_emit_once): Initialize it when NUM_POLY_INT_COEFFS > 1.
>>> 	(const_poly_int_hasher::hash): New function.
>>> 	(const_poly_int_hasher::equal): Likewise.
>>> 	(gen_int_mode): Take a poly_int64 instead of a HOST_WIDE_INT.
>>> 	(immed_wide_int_const): Rename to...
>>> 	(immed_wide_int_const_1): ...this and make static.
>>> 	(immed_wide_int_const): New function, taking a poly_wide_int_ref
>>> 	instead of a wide_int_ref.
>>> 	(gen_int_shift_amount): Take a poly_int64 instead of a HOST_WIDE_INT.
>>> 	(gen_lowpart_common): Handle CONST_POLY_INT.
>>> 	* cse.c (hash_rtx_cb, equiv_constant): Likewise.
>>> 	* cselib.c (cselib_hash_rtx): Likewise.
>>> 	* dwarf2out.c (const_ok_for_output_1): Likewise.
>>> 	* expr.c (convert_modes): Likewise.
>>> 	* print-rtl.c (rtx_writer::print_rtx, print_value): Likewise.
>>> 	* rtlhash.c (add_rtx): Likewise.
>>> 	* explow.c (trunc_int_for_mode): Add a poly_int64 version.
>>> 	(plus_constant): Take a poly_int64 instead of a HOST_WIDE_INT.
>>> 	Handle existing CONST_POLY_INT rtxes.
>>> 	* expmed.h (expand_shift): Take a poly_int64 instead of a
>>> 	HOST_WIDE_INT.
>>> 	* expmed.c (expand_shift): Likewise.
>>> 	* rtlanal.c (strip_offset): New function.
>>> 	(commutative_operand_precedence): Give CONST_POLY_INT the same
>>> 	precedence as CONST_DOUBLE and put CONST_WIDE_INT between that
>>> 	and CONST_INT.
>>> 	* rtl-tests.c (const_poly_int_tests): New struct.
>>> 	(rtl_tests_c_tests): Use it.
>>> 	* simplify-rtx.c (simplify_const_unary_operation): Handle
>>> 	CONST_POLY_INT.
>>> 	(simplify_const_binary_operation): Likewise.
>>> 	(simplify_binary_operation_1): Fold additions of symbolic constants
>>> 	and CONST_POLY_INTs.
>>> 	(simplify_subreg): Handle extensions and truncations of
>>> 	CONST_POLY_INTs.
>>> 	(simplify_const_poly_int_tests): New struct.
>>> 	(simplify_rtx_c_tests): Use it.
>>> 	* wide-int.h (storage_ref): Add default constructor.
>>> 	(wide_int_ref_storage): Likewise.
>>> 	(trailing_wide_ints): Use GTY((user)).
>>> 	(trailing_wide_ints::operator[]): Add a const version.
>>> 	(trailing_wide_ints::get_precision): New function.
>>> 	(trailing_wide_ints::extra_size): Likewise.
>> Do we need to define anything WRT structure sharing in rtl.texi for a
>> CONST_POLY_INT?
> 
> Good catch.  Fixed in the patch below.
> 
>>> Index: gcc/rtl.c
>>> ===================================================================
>>> --- gcc/rtl.c	2017-10-23 16:52:20.579835373 +0100
>>> +++ gcc/rtl.c	2017-10-23 17:00:54.443002147 +0100
>>> @@ -257,9 +261,10 @@ shared_const_p (const_rtx orig)
>>>  
>>>    /* CONST can be shared if it contains a SYMBOL_REF.  If it contains
>>>       a LABEL_REF, it isn't sharable.  */
>>> +  poly_int64 offset;
>>>    return (GET_CODE (XEXP (orig, 0)) == PLUS
>>>  	  && GET_CODE (XEXP (XEXP (orig, 0), 0)) == SYMBOL_REF
>>> -	  && CONST_INT_P (XEXP (XEXP (orig, 0), 1)));
>>> +	  && poly_int_rtx_p (XEXP (XEXP (orig, 0), 1), &offset));
>> Did this just change structure sharing for CONST_WIDE_INT?
> 
> No, we'd only use CONST_WIDE_INT for things that don't fit in
> poly_int64.
> 
>>> +  /* Create a new rtx.  There's a choice to be made here between installing
>>> +     the actual mode of the rtx or leaving it as VOIDmode (for consistency
>>> +     with CONST_INT).  In practice the handling of the codes is different
>>> +     enough that we get no benefit from using VOIDmode, and various places
>>> +     assume that VOIDmode implies CONST_INT.  Using the real mode seems like
>>> +     the right long-term direction anyway.  */
>> Certainly my preference is to get the mode in there.  I see modeless
>> CONST_INTs as a long standing wart and I'm not keen to repeat it.
> 
> Yeah.  Still regularly hit problems related to modeless CONST_INTs
> today (including the gen_int_shift_amount patch).
> 
>>> Index: gcc/wide-int.h
>>> ===================================================================
>>> --- gcc/wide-int.h	2017-10-23 17:00:20.923835582 +0100
>>> +++ gcc/wide-int.h	2017-10-23 17:00:54.445999420 +0100
>>> @@ -613,6 +613,7 @@ #define SHIFT_FUNCTION \
>>>       access.  */
>>>    struct storage_ref
>>>    {
>>> +    storage_ref () {}
>>>      storage_ref (const HOST_WIDE_INT *, unsigned int, unsigned int);
>>>  
>>>      const HOST_WIDE_INT *val;
>>> @@ -944,6 +945,8 @@ struct wide_int_ref_storage : public wi:
>>>    HOST_WIDE_INT scratch[2];
>>>  
>>>  public:
>>> +  wide_int_ref_storage () {}
>>> +
>>>    wide_int_ref_storage (const wi::storage_ref &);
>>>  
>>>    template <typename T>
>> So doesn't this play into the whole question about initialization of
>> these objects.  So I'll defer on this hunk until we settle that
>> question, but the rest is OK.
> 
> Any more thoughts on this?  In the end the 001 patch went in with
> the empty constructors.  Like I say, I'm happy to switch to C++-11
> "= default;" once we require C++11, but I think having well-defined
> implicit construction would make switching to "= default" harder
> in future.
I think we're good to go.  I would have slightly preferred to avoid the
empty ctor, but not enough to raise an objection to Richi's ACK and
ultimately make the switch to = default harder later.

And just to be clear, I'd like to propose we step forward to C++11 in
the gcc-9 timeframe.  I haven't run that by anyone, but that's the
timeframe I'd personally prefer.


jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [047/nnn] poly_int: argument sizes
  2017-12-06 20:57   ` Jeff Law
@ 2017-12-20 11:37     ` Richard Sandiford
  0 siblings, 0 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-12-20 11:37 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 11:20 AM, Richard Sandiford wrote:
>> This patch changes various bits of state related to argument sizes so
>> that they have type poly_int64 rather than HOST_WIDE_INT.  This includes:
>> 
>> - incoming_args::pops_args and incoming_args::size
>> - rtl_data::outgoing_args_size
>> - pending_stack_adjust
>> - stack_pointer_delta
>> - stack_usage::pushed_stack_size
>> - args_size::constant
>> 
>> It also changes TARGET_RETURN_POPS_ARGS so that the size of the
>> arguments passed in and the size returned by the hook are both
>> poly_int64s.
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* target.def (return_pops_args): Treat both the input and output
>> 	sizes as poly_int64s rather than HOST_WIDE_INTS.
>> 	* targhooks.h (default_return_pops_args): Update accordingly.
>> 	* targhooks.c (default_return_pops_args): Likewise.
>> 	* doc/tm.texi: Regenerate.
>> 	* emit-rtl.h (incoming_args): Change pops_args, size and
>> 	outgoing_args_size from int to poly_int64_pod.
>> 	* function.h (expr_status): Change x_pending_stack_adjust and
>> 	x_stack_pointer_delta from int to poly_int64.
>> 	(args_size::constant): Change from HOST_WIDE_INT to poly_int64.
>> 	(ARGS_SIZE_RTX): Update accordingly.
>> 	* calls.c (highest_outgoing_arg_in_use): Change from int to
>> 	unsigned int.
>> 	(stack_usage_watermark, stored_args_watermark): New variables.
>> 	(stack_region_maybe_used_p, mark_stack_region_used): New functions.
>> 	(emit_call_1): Change the stack_size and rounded_stack_size
>> 	parameters from HOST_WIDE_INT to poly_int64.  Track n_popped
>> 	as a poly_int64.
>> 	(save_fixed_argument_area): Check stack_usage_watermark.
>> 	(initialize_argument_information): Change old_pending_adj from
>> 	a HOST_WIDE_INT * to a poly_int64_pod *.
>> 	(compute_argument_block_size): Return the size as a poly_int64
>> 	rather than an int.
>> 	(finalize_must_preallocate): Track polynomial argument sizes.
>> 	(compute_argument_addresses): Likewise.
>> 	(internal_arg_pointer_based_exp): Track polynomial offsets.
>> 	(mem_overlaps_already_clobbered_arg_p): Rename to...
>> 	(mem_might_overlap_already_clobbered_arg_p): ...this and take the
>> 	size as a poly_uint64 rather than an unsigned HOST_WIDE_INT.
>> 	Check stored_args_used_watermark.
>> 	(load_register_parameters): Update accordingly.
>> 	(check_sibcall_argument_overlap_1): Likewise.
>> 	(combine_pending_stack_adjustment_and_call): Take the unadjusted
>> 	args size as a poly_int64 rather than an int.  Return a bool
>> 	indicating whether the optimization was possible and return
>> 	the new adjustment by reference.
>> 	(check_sibcall_argument_overlap): Track polynomial argument sizes.
>> 	Update stored_args_watermark.
>> 	(can_implement_as_sibling_call_p): Handle polynomial argument sizes.
>> 	(expand_call): Likewise.  Maintain stack_usage_watermark and
>> 	stored_args_watermark.  Update calls to
>> 	combine_pending_stack_adjustment_and_call.
>> 	(emit_library_call_value_1): Handle polynomial argument sizes.
>> 	Call stack_region_maybe_used_p and mark_stack_region_used.
>> 	Maintain stack_usage_watermark.
>> 	(store_one_arg): Likewise.  Update call to
>> 	mem_overlaps_already_clobbered_arg_p.
>> 	* config/arm/arm.c (arm_output_function_prologue): Add a cast to
>> 	HOST_WIDE_INT.
>> 	* config/avr/avr.c (avr_outgoing_args_size): Likewise.
>> 	* config/microblaze/microblaze.c (microblaze_function_prologue):
>> 	Likewise.
>> 	* config/cr16/cr16.c (cr16_return_pops_args): Update for new
>> 	TARGET_RETURN_POPS_ARGS interface.
>> 	(cr16_compute_frame, cr16_initial_elimination_offset): Add casts
>> 	to HOST_WIDE_INT.
>> 	* config/ft32/ft32.c (ft32_compute_frame): Likewise.
>> 	* config/i386/i386.c (ix86_return_pops_args): Update for new
>> 	TARGET_RETURN_POPS_ARGS interface.
>> 	(ix86_expand_split_stack_prologue): Add a cast to HOST_WIDE_INT.
>> 	* config/moxie/moxie.c (moxie_compute_frame): Likewise.
>> 	* config/m68k/m68k.c (m68k_return_pops_args): Update for new
>> 	TARGET_RETURN_POPS_ARGS interface.
>> 	* config/vax/vax.c (vax_return_pops_args): Likewise.
>> 	* config/pa/pa.h (STACK_POINTER_OFFSET): Add a cast to poly_int64.
>> 	(EXIT_IGNORE_STACK): Update reference to crtl->outgoing_args_size.
>> 	* config/arm/arm.h (CALLER_INTERWORKING_SLOT_SIZE): Likewise.
>> 	* config/powerpcspe/aix.h (STACK_DYNAMIC_OFFSET): Likewise.
>> 	* config/powerpcspe/darwin.h (STACK_DYNAMIC_OFFSET): Likewise.
>> 	* config/powerpcspe/powerpcspe.h (STACK_DYNAMIC_OFFSET): Likewise.
>> 	* config/rs6000/aix.h (STACK_DYNAMIC_OFFSET): Likewise.
>> 	* config/rs6000/darwin.h (STACK_DYNAMIC_OFFSET): Likewise.
>> 	* config/rs6000/rs6000.h (STACK_DYNAMIC_OFFSET): Likewise.
>> 	* dojump.h (saved_pending_stack_adjust): Change x_pending_stack_adjust
>> 	and x_stack_pointer_delta from int to poly_int64.
>> 	* dojump.c (do_pending_stack_adjust): Update accordingly.
>> 	* explow.c (allocate_dynamic_stack_space): Handle polynomial
>> 	stack_pointer_deltas.
>> 	* function.c (STACK_DYNAMIC_OFFSET): Add a cast to poly_int64.
>> 	(pad_to_arg_alignment): Track polynomial offsets.
>> 	(assign_parm_find_stack_rtl): Likewise.
>> 	(assign_parms, locate_and_pad_parm): Handle polynomial argument sizes.
>> 	* toplev.c (output_stack_usage): Update reference to
>> 	current_function_pushed_stack_size.
> I haven't been able to convince myself that the changes to the
> stack_usage_map are correct, particularly in the case where the
> UPPER_BOUND is not constant.  But I'm willing to let it go given the
> only target that could potentially be affected would be SVE and I'd
> expect that if you'd gotten it wrong it would have showed up in your
> testing.
>
> I'm also slightly worried about how we handle ARGS_GROW_DOWNWARD targets
> (pa, stormy16).  I couldn't convince myself those changes were correct
> either.  Again, I'm willing to fall back on your extensive
> experience and testing here.
>
> Of all the patches to date I've looked at, this one worries me the most
> (which is why it's next to last according to my records).  The potential
> for a goof in the argument setup, padding, or stack save area, and the
> consequences of such a goof, worry me.
>
> I'm going to conditionally ack this -- my condition is that you
> re-review the the calls.c/function.c changes as well.

I agree this is probably the least obvious of the changes.  I've had
another look and I still think the approach is OK.

For stack_usage_map: we're trying to detect whether an instruction
references a region of stack that has already been clobbered.
We can conservatively do this as a range from the lowest possible byte
offset in the region to the highest possible byte offset in the region.
This works both when recording a clobber and when testing for an overlap
with a previous clobber.

For polynomial lower bounds, the lowest possible offset is given by
constant_lower_bound.  For polynomial upper bounds, the highest
possible offset is HOST_WIDE_INT_M1U (since we track unsigned offsets
from a zero base).  But it's not possible to have the stack_usage_map
array itself go up to HOST_WIDE_INT_M1U, so instead, the watermark
records the lowest lower bound whose corresponding upper bound is
HOST_WIDE_INT_M1U, so that every index of stack_usage_map starting at
the watermark is conceptually set.
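
As a concrete (if simplified) sketch of that test -- the helper name,
map representation and parameters below are made up rather than being
the actual calls.c interface:

  /* Return true if the byte range [LOWER_BOUND, UPPER_BOUND) might
     overlap a region already recorded as clobbered.  USAGE_MAP has
     MAP_SIZE entries and WATERMARK is the lowest lower bound whose
     upper bound was not a compile-time constant.  */

  static bool
  region_maybe_used_p (poly_uint64 lower_bound, poly_uint64 upper_bound,
                       const char *usage_map,
                       unsigned HOST_WIDE_INT map_size,
                       unsigned HOST_WIDE_INT watermark)
  {
    unsigned HOST_WIDE_INT const_lower = constant_lower_bound (lower_bound);
    unsigned HOST_WIDE_INT const_upper;
    if (!upper_bound.is_constant (&const_upper))
      /* The region might extend arbitrarily far upwards.  */
      const_upper = HOST_WIDE_INT_M1U;

    /* Every byte from the watermark upwards is conceptually in use.  */
    if (const_upper > watermark)
      return true;

    if (const_upper > map_size)
      const_upper = map_size;
    for (unsigned HOST_WIDE_INT i = const_lower; i < const_upper; ++i)
      if (usage_map[i])
        return true;
    return false;
  }

Recording a clobber works the same way in reverse: set the map entries
for a range with constant bounds, or lower the watermark when the upper
bound is unknown.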

The patch doesn't really change the handling of ARGS_GROW_DOWNWARD,
it just rewrites the tests into a different form.  E.g. there were
two instances of code like:

                  if (ARGS_GROW_DOWNWARD)
                    highest_outgoing_arg_in_use
                      = MAX (initial_highest_arg_in_use, needed + 1);
                  else
                    highest_outgoing_arg_in_use
                      = MAX (initial_highest_arg_in_use, needed);

and the patch splits out the calculation of the second MAX argument:

                  poly_int64 limit = needed;
                  if (ARGS_GROW_DOWNWARD)
                    limit += 1;
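
The remaining single MAX then only needs the constant lower bound of
that value (a sketch of the idea rather than the exact committed code),
since any non-constant excess is covered by the watermark described
above:

                  highest_outgoing_arg_in_use
                    = MAX (initial_highest_arg_in_use,
                           constant_lower_bound (limit));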

The third ARGS_GROW_DOWNWARD-related hunk is for pad_to_arg_alignment.
Previously we had:

          offset_ptr->constant = -sp_offset +
            (ARGS_GROW_DOWNWARD
            ? FLOOR_ROUND (offset_ptr->constant + sp_offset, boundary_in_bytes)
            : CEIL_ROUND (offset_ptr->constant + sp_offset, boundary_in_bytes));

Expanding FLOOR_ROUND, and using the fact that boundary_in_bytes is
a power of 2, the ARGS_GROW_DOWNWARD case is equivalent to:

          offset_ptr->constant = -sp_offset
            + (offset_ptr->constant + sp_offset)
            - ((offset_ptr->constant + sp_offset) & (boundary_in_bytes - 1));

The -sp_offset + sp_offset cancels out, giving:

          offset_ptr->constant
            = offset_ptr->constant
            - ((offset_ptr->constant + sp_offset) & (boundary_in_bytes - 1));

or:

          offset_ptr->constant
            -= (offset_ptr->constant + sp_offset) & (boundary_in_bytes - 1);
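
A quick sanity check with concrete values: taking sp_offset == 8,
boundary_in_bytes == 16 and offset_ptr->constant == -5, the original
expression gives

          -8 + FLOOR_ROUND (-5 + 8, 16) = -8 + 0 = -8

while the rewritten form gives

          -5 - ((-5 + 8) & 15) = -5 - 3 = -8

so the two agree.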

But since (offset_ptr->constant + sp_offset) is polynomial, we can't
always calculate this value, so the code is conditional on a call
to known_misalignment:

      if (...
          || !known_misalignment (offset_ptr->constant + sp_offset,
                                  boundary_in_bytes, &misalign))
        ...
      else
        {
          if (ARGS_GROW_DOWNWARD)
            offset_ptr->constant -= misalign;
        
Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [039/nnn] poly_int: pass_store_merging::execute
  2017-11-28 18:00   ` Jeff Law
@ 2017-12-20 12:59     ` Richard Sandiford
  0 siblings, 0 replies; 302+ messages in thread
From: Richard Sandiford @ 2017-12-20 12:59 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 11:17 AM, Richard Sandiford wrote:
>> This patch makes pass_store_merging::execute track polynomial sizes
>> and offsets.
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* gimple-ssa-store-merging.c (pass_store_merging::execute): Track
>> 	polynomial sizes and offsets.
> OK.  Though I wouldn't be surprised if this needs revamping after
> Jakub's work in this space.

Yeah, it needed quite a big revamp in the end.  Same idea, just in
different places.  And...

> It wasn't clear why you moved some of the code which computes invalid vs
> where we test invalid, but I don't see any problem with that movement of
> code.

...this part fortunately went away with the new code structure.

Here's what I applied after retesting.  It now fits in the series after
[036/nnn].

Thanks,
Richard


2017-12-20  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* poly-int-types.h (round_down_to_byte_boundary): New macro.
	(round_up_to_byte_boundary): Likewise.
	* expr.h (get_bit_range): Add temporary shim.
	* gimple-ssa-store-merging.c (store_operand_info): Change the
	bitsize, bitpos, bitregion_start and bitregion_end fields from
	unsigned HOST_WIDE_INT to poly_uint64.
	(merged_store_group): Likewise load_align_base.
	(compatible_load_p, compatible_load_p): Update accordingly.
	(imm_store_chain_info::coalesce_immediate_stores): Likewise.
	(split_group, imm_store_chain_info::output_merged_store): Likewise.
	(mem_valid_for_store_merging): Return the bitsize, bitpos,
	bitregion_start and bitregion_end as poly_uint64s rather than
	unsigned HOST_WIDE_INTs.  Track polynomial offsets internally.
	(handled_load): Take the bitsize, bitpos,
	bitregion_start and bitregion_end as poly_uint64s rather than
	unsigned HOST_WIDE_INTs.
	(pass_store_merging::process_store): Update call to
	mem_valid_for_store_merging.

Index: gcc/poly-int-types.h
===================================================================
--- gcc/poly-int-types.h	2017-12-20 09:36:17.817094740 +0000
+++ gcc/poly-int-types.h	2017-12-20 09:37:07.445382807 +0000
@@ -60,6 +60,18 @@ #define bits_to_bytes_round_up(X) force_
    of bytes in size.  */
 #define num_trailing_bits(X) force_get_misalignment (X, BITS_PER_UNIT)
 
+/* Round bit quantity X down to the nearest byte boundary.
+
+   This is safe because non-constant mode sizes must be a whole number
+   of bytes in size.  */
+#define round_down_to_byte_boundary(X) force_align_down (X, BITS_PER_UNIT)
+
+/* Round bit quantity X up the nearest byte boundary.
+
+   This is safe because non-constant mode sizes must be a whole number
+   of bytes in size.  */
+#define round_up_to_byte_boundary(X) force_align_up (X, BITS_PER_UNIT)
+
 /* Return the size of an element in a vector of size SIZE, given that
    the vector has NELTS elements.  The return value is in the same units
    as SIZE (either bits or bytes).
Index: gcc/expr.h
===================================================================
--- gcc/expr.h	2017-12-20 09:36:17.817094740 +0000
+++ gcc/expr.h	2017-12-20 09:37:07.444382841 +0000
@@ -243,6 +243,15 @@ extern bool emit_push_insn (rtx, machine
 extern void get_bit_range (unsigned HOST_WIDE_INT *, unsigned HOST_WIDE_INT *,
 			   tree, HOST_WIDE_INT *, tree *);
 
+/* Temporary.  */
+inline void
+get_bit_range (poly_uint64_pod *bitstart, poly_uint64_pod *bitend, tree exp,
+	       poly_int64_pod *bitpos, tree *offset)
+{
+  get_bit_range (&bitstart->coeffs[0], &bitend->coeffs[0], exp,
+		 &bitpos->coeffs[0], offset);
+}
+
 /* Expand an assignment that stores the value of FROM into TO.  */
 extern void expand_assignment (tree, tree, bool);
 
Index: gcc/gimple-ssa-store-merging.c
===================================================================
--- gcc/gimple-ssa-store-merging.c	2017-12-20 09:36:17.817094740 +0000
+++ gcc/gimple-ssa-store-merging.c	2017-12-20 09:37:07.445382807 +0000
@@ -1321,10 +1321,10 @@ struct store_operand_info
 {
   tree val;
   tree base_addr;
-  unsigned HOST_WIDE_INT bitsize;
-  unsigned HOST_WIDE_INT bitpos;
-  unsigned HOST_WIDE_INT bitregion_start;
-  unsigned HOST_WIDE_INT bitregion_end;
+  poly_uint64 bitsize;
+  poly_uint64 bitpos;
+  poly_uint64 bitregion_start;
+  poly_uint64 bitregion_end;
   gimple *stmt;
   bool bit_not_p;
   store_operand_info ();
@@ -1414,7 +1414,7 @@ struct merged_store_group
   /* The size of the allocated memory for val and mask.  */
   unsigned HOST_WIDE_INT buf_size;
   unsigned HOST_WIDE_INT align_base;
-  unsigned HOST_WIDE_INT load_align_base[2];
+  poly_uint64 load_align_base[2];
 
   unsigned int align;
   unsigned int load_align[2];
@@ -2198,8 +2198,8 @@ compatible_load_p (merged_store_group *m
 {
   store_immediate_info *infof = merged_store->stores[0];
   if (!info->ops[idx].base_addr
-      || (info->ops[idx].bitpos - infof->ops[idx].bitpos
-	  != info->bitpos - infof->bitpos)
+      || maybe_ne (info->ops[idx].bitpos - infof->ops[idx].bitpos,
+		   info->bitpos - infof->bitpos)
       || !operand_equal_p (info->ops[idx].base_addr,
 			   infof->ops[idx].base_addr, 0))
     return false;
@@ -2229,7 +2229,7 @@ compatible_load_p (merged_store_group *m
      the construction of the immediate chain info guarantees no intervening
      stores, so no further checks are needed.  Example:
      _1 = s.a; _2 = _1 & -7; s.a = _2; _3 = s.b; _4 = _3 & -7; s.b = _4;  */
-  if (info->ops[idx].bitpos == info->bitpos
+  if (known_eq (info->ops[idx].bitpos, info->bitpos)
       && operand_equal_p (info->ops[idx].base_addr, base_addr, 0))
     return true;
 
@@ -2624,8 +2624,8 @@ imm_store_chain_info::coalesce_immediate
 	      && infof->ops[1].base_addr
 	      && info->ops[0].base_addr
 	      && info->ops[1].base_addr
-	      && (info->ops[1].bitpos - infof->ops[0].bitpos
-		  == info->bitpos - infof->bitpos)
+	      && known_eq (info->ops[1].bitpos - infof->ops[0].bitpos,
+			   info->bitpos - infof->bitpos)
 	      && operand_equal_p (info->ops[1].base_addr,
 				  infof->ops[0].base_addr, 0))
 	    {
@@ -3031,11 +3031,12 @@ split_group (merged_store_group *group,
 	  for (int i = 0; i < 2; ++i)
 	    if (group->load_align[i])
 	      {
-		align_bitpos = try_bitpos - group->stores[0]->bitpos;
-		align_bitpos += group->stores[0]->ops[i].bitpos;
-		align_bitpos -= group->load_align_base[i];
-		align_bitpos &= (group_load_align - 1);
-		if (align_bitpos)
+		align_bitpos
+		  = known_alignment (try_bitpos
+				     - group->stores[0]->bitpos
+				     + group->stores[0]->ops[i].bitpos
+				     - group->load_align_base[i]);
+		if (align_bitpos & (group_load_align - 1))
 		  {
 		    unsigned HOST_WIDE_INT a = least_bit_hwi (align_bitpos);
 		    load_align = MIN (load_align, a);
@@ -3491,10 +3492,10 @@ imm_store_chain_info::output_merged_stor
 
 		  unsigned HOST_WIDE_INT load_align = group->load_align[j];
 		  unsigned HOST_WIDE_INT align_bitpos
-		    = (try_pos * BITS_PER_UNIT
-		       - split_store->orig_stores[0]->bitpos
-		       + op.bitpos) & (load_align - 1);
-		  if (align_bitpos)
+		    = known_alignment (try_pos * BITS_PER_UNIT
+				       - split_store->orig_stores[0]->bitpos
+				       + op.bitpos);
+		  if (align_bitpos & (load_align - 1))
 		    load_align = least_bit_hwi (align_bitpos);
 
 		  tree load_int_type
@@ -3502,10 +3503,11 @@ imm_store_chain_info::output_merged_stor
 		  load_int_type
 		    = build_aligned_type (load_int_type, load_align);
 
-		  unsigned HOST_WIDE_INT load_pos
-		    = (try_pos * BITS_PER_UNIT
-		       - split_store->orig_stores[0]->bitpos
-		       + op.bitpos) / BITS_PER_UNIT;
+		  poly_uint64 load_pos
+		    = exact_div (try_pos * BITS_PER_UNIT
+				 - split_store->orig_stores[0]->bitpos
+				 + op.bitpos,
+				 BITS_PER_UNIT);
 		  ops[j] = fold_build2 (MEM_REF, load_int_type, load_addr[j],
 					build_int_cst (offset_type, load_pos));
 		  if (TREE_CODE (ops[j]) == MEM_REF)
@@ -3811,30 +3813,28 @@ rhs_valid_for_store_merging_p (tree rhs)
    case.  */
 
 static tree
-mem_valid_for_store_merging (tree mem, unsigned HOST_WIDE_INT *pbitsize,
-			     unsigned HOST_WIDE_INT *pbitpos,
-			     unsigned HOST_WIDE_INT *pbitregion_start,
-			     unsigned HOST_WIDE_INT *pbitregion_end)
-{
-  HOST_WIDE_INT bitsize;
-  HOST_WIDE_INT bitpos;
-  unsigned HOST_WIDE_INT bitregion_start = 0;
-  unsigned HOST_WIDE_INT bitregion_end = 0;
+mem_valid_for_store_merging (tree mem, poly_uint64 *pbitsize,
+			     poly_uint64 *pbitpos,
+			     poly_uint64 *pbitregion_start,
+			     poly_uint64 *pbitregion_end)
+{
+  poly_int64 bitsize, bitpos;
+  poly_uint64 bitregion_start = 0, bitregion_end = 0;
   machine_mode mode;
   int unsignedp = 0, reversep = 0, volatilep = 0;
   tree offset;
   tree base_addr = get_inner_reference (mem, &bitsize, &bitpos, &offset, &mode,
 					&unsignedp, &reversep, &volatilep);
   *pbitsize = bitsize;
-  if (bitsize == 0)
+  if (known_eq (bitsize, 0))
     return NULL_TREE;
 
   if (TREE_CODE (mem) == COMPONENT_REF
       && DECL_BIT_FIELD_TYPE (TREE_OPERAND (mem, 1)))
     {
       get_bit_range (&bitregion_start, &bitregion_end, mem, &bitpos, &offset);
-      if (bitregion_end)
-	++bitregion_end;
+      if (maybe_ne (bitregion_end, 0U))
+	bitregion_end += 1;
     }
 
   if (reversep)
@@ -3850,24 +3850,20 @@ mem_valid_for_store_merging (tree mem, u
      PR 23684 and this way we can catch more chains.  */
   else if (TREE_CODE (base_addr) == MEM_REF)
     {
-      offset_int bit_off, byte_off = mem_ref_offset (base_addr);
-      bit_off = byte_off << LOG2_BITS_PER_UNIT;
+      poly_offset_int byte_off = mem_ref_offset (base_addr);
+      poly_offset_int bit_off = byte_off << LOG2_BITS_PER_UNIT;
       bit_off += bitpos;
-      if (!wi::neg_p (bit_off) && wi::fits_shwi_p (bit_off))
+      if (known_ge (bit_off, 0) && bit_off.to_shwi (&bitpos))
 	{
-	  bitpos = bit_off.to_shwi ();
-	  if (bitregion_end)
+	  if (maybe_ne (bitregion_end, 0U))
 	    {
 	      bit_off = byte_off << LOG2_BITS_PER_UNIT;
 	      bit_off += bitregion_start;
-	      if (wi::fits_uhwi_p (bit_off))
+	      if (bit_off.to_uhwi (&bitregion_start))
 		{
-		  bitregion_start = bit_off.to_uhwi ();
 		  bit_off = byte_off << LOG2_BITS_PER_UNIT;
 		  bit_off += bitregion_end;
-		  if (wi::fits_uhwi_p (bit_off))
-		    bitregion_end = bit_off.to_uhwi ();
-		  else
+		  if (!bit_off.to_uhwi (&bitregion_end))
 		    bitregion_end = 0;
 		}
 	      else
@@ -3882,15 +3878,15 @@ mem_valid_for_store_merging (tree mem, u
      address now.  */
   else
     {
-      if (bitpos < 0)
+      if (maybe_lt (bitpos, 0))
 	return NULL_TREE;
       base_addr = build_fold_addr_expr (base_addr);
     }
 
-  if (!bitregion_end)
+  if (known_eq (bitregion_end, 0U))
     {
-      bitregion_start = ROUND_DOWN (bitpos, BITS_PER_UNIT);
-      bitregion_end = ROUND_UP (bitpos + bitsize, BITS_PER_UNIT);
+      bitregion_start = round_down_to_byte_boundary (bitpos);
+      bitregion_end = round_up_to_byte_boundary (bitpos + bitsize);
     }
 
   if (offset != NULL_TREE)
@@ -3922,9 +3918,8 @@ mem_valid_for_store_merging (tree mem, u
 
 static bool
 handled_load (gimple *stmt, store_operand_info *op,
-	      unsigned HOST_WIDE_INT bitsize, unsigned HOST_WIDE_INT bitpos,
-	      unsigned HOST_WIDE_INT bitregion_start,
-	      unsigned HOST_WIDE_INT bitregion_end)
+	      poly_uint64 bitsize, poly_uint64 bitpos,
+	      poly_uint64 bitregion_start, poly_uint64 bitregion_end)
 {
   if (!is_gimple_assign (stmt))
     return false;
@@ -3956,10 +3951,12 @@ handled_load (gimple *stmt, store_operan
 				       &op->bitregion_start,
 				       &op->bitregion_end);
       if (op->base_addr != NULL_TREE
-	  && op->bitsize == bitsize
-	  && ((op->bitpos - bitpos) % BITS_PER_UNIT) == 0
-	  && op->bitpos - op->bitregion_start >= bitpos - bitregion_start
-	  && op->bitregion_end - op->bitpos >= bitregion_end - bitpos)
+	  && known_eq (op->bitsize, bitsize)
+	  && multiple_p (op->bitpos - bitpos, BITS_PER_UNIT)
+	  && known_ge (op->bitpos - op->bitregion_start,
+		       bitpos - bitregion_start)
+	  && known_ge (op->bitregion_end - op->bitpos,
+		       bitregion_end - bitpos))
 	{
 	  op->stmt = stmt;
 	  op->val = mem;
@@ -3978,18 +3975,18 @@ pass_store_merging::process_store (gimpl
 {
   tree lhs = gimple_assign_lhs (stmt);
   tree rhs = gimple_assign_rhs1 (stmt);
-  unsigned HOST_WIDE_INT bitsize, bitpos;
-  unsigned HOST_WIDE_INT bitregion_start;
-  unsigned HOST_WIDE_INT bitregion_end;
+  poly_uint64 bitsize, bitpos;
+  poly_uint64 bitregion_start, bitregion_end;
   tree base_addr
     = mem_valid_for_store_merging (lhs, &bitsize, &bitpos,
 				   &bitregion_start, &bitregion_end);
-  if (bitsize == 0)
+  if (known_eq (bitsize, 0U))
     return;
 
   bool invalid = (base_addr == NULL_TREE
-		  || ((bitsize > MAX_BITSIZE_MODE_ANY_INT)
-		       && (TREE_CODE (rhs) != INTEGER_CST)));
+		  || (maybe_gt (bitsize,
+				(unsigned int) MAX_BITSIZE_MODE_ANY_INT)
+		      && (TREE_CODE (rhs) != INTEGER_CST)));
   enum tree_code rhs_code = ERROR_MARK;
   bool bit_not_p = false;
   struct symbolic_number n;
@@ -4058,9 +4055,11 @@ pass_store_merging::process_store (gimpl
 	    invalid = true;
 	    break;
 	  }
-      if ((bitsize % BITS_PER_UNIT) == 0
-	  && (bitpos % BITS_PER_UNIT) == 0
-	  && bitsize <= 64
+      unsigned HOST_WIDE_INT const_bitsize;
+      if (bitsize.is_constant (&const_bitsize)
+	  && multiple_p (const_bitsize, BITS_PER_UNIT)
+	  && multiple_p (bitpos, BITS_PER_UNIT)
+	  && const_bitsize <= 64
 	  && BYTES_BIG_ENDIAN == WORDS_BIG_ENDIAN)
 	{
 	  ins_stmt = find_bswap_or_nop_1 (def_stmt, &n, 12);
@@ -4068,7 +4067,8 @@ pass_store_merging::process_store (gimpl
 	    {
 	      uint64_t nn = n.n;
 	      for (unsigned HOST_WIDE_INT i = 0;
-		   i < bitsize; i += BITS_PER_UNIT, nn >>= BITS_PER_MARKER)
+		   i < const_bitsize;
+		   i += BITS_PER_UNIT, nn >>= BITS_PER_MARKER)
 		if ((nn & MARKER_MASK) == 0
 		    || (nn & MARKER_MASK) == MARKER_BYTE_UNKNOWN)
 		  {
@@ -4089,7 +4089,13 @@ pass_store_merging::process_store (gimpl
 	}
     }
 
-  if (invalid)
+  unsigned HOST_WIDE_INT const_bitsize, const_bitpos;
+  unsigned HOST_WIDE_INT const_bitregion_start, const_bitregion_end;
+  if (invalid
+      || !bitsize.is_constant (&const_bitsize)
+      || !bitpos.is_constant (&const_bitpos)
+      || !bitregion_start.is_constant (&const_bitregion_start)
+      || !bitregion_end.is_constant (&const_bitregion_end))
     {
       terminate_all_aliasing_chains (NULL, stmt);
       return;
@@ -4106,9 +4112,10 @@ pass_store_merging::process_store (gimpl
   if (chain_info)
     {
       unsigned int ord = (*chain_info)->m_store_info.length ();
-      info = new store_immediate_info (bitsize, bitpos, bitregion_start,
-				       bitregion_end, stmt, ord, rhs_code,
-				       n, ins_stmt,
+      info = new store_immediate_info (const_bitsize, const_bitpos,
+				       const_bitregion_start,
+				       const_bitregion_end,
+				       stmt, ord, rhs_code, n, ins_stmt,
 				       bit_not_p, ops[0], ops[1]);
       if (dump_file && (dump_flags & TDF_DETAILS))
 	{
@@ -4135,9 +4142,10 @@ pass_store_merging::process_store (gimpl
   /* Start a new chain.  */
   struct imm_store_chain_info *new_chain
     = new imm_store_chain_info (m_stores_head, base_addr);
-  info = new store_immediate_info (bitsize, bitpos, bitregion_start,
-				   bitregion_end, stmt, 0, rhs_code,
-				   n, ins_stmt,
+  info = new store_immediate_info (const_bitsize, const_bitpos,
+				   const_bitregion_start,
+				   const_bitregion_end,
+				   stmt, 0, rhs_code, n, ins_stmt,
 				   bit_not_p, ops[0], ops[1]);
   new_chain->m_store_info.safe_push (info);
   m_stores.put (base_addr, new_chain);

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [045/nnn] poly_int: REG_ARGS_SIZE
  2017-10-23 17:19 ` [045/nnn] poly_int: REG_ARGS_SIZE Richard Sandiford
  2017-12-06  0:10   ` Jeff Law
@ 2017-12-22 21:56   ` Andreas Schwab
  2017-12-23  9:36     ` Richard Sandiford
  1 sibling, 1 reply; 302+ messages in thread
From: Andreas Schwab @ 2017-12-22 21:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford

This breaks gcc.dg/tls/opt-3.c, gcc.dg/tls/pr47715-3.c and
gcc.dg/tls/struct-1.c on m68k:

/daten/aranym/gcc/gcc-20171222/gcc/testsuite/gcc.dg/tls/opt-3.c:11:3: internal compiler error: in add_args_size_note, at rtlanal.c:2379
0xae7aa9 add_args_size_note(rtx_insn*, poly_int<1u, long>)
        ../../gcc/rtlanal.c:2379
0x7ea4ca fixup_args_size_notes(rtx_insn*, rtx_insn*, poly_int<1u, long>)
        ../../gcc/expr.c:4105
0x7f6a02 emit_single_push_insn
        ../../gcc/expr.c:4225
0x7fa412 emit_push_insn(rtx_def*, machine_mode, tree_node*, rtx_def*, unsigned int, int, rtx_def*, poly_int<1u, long>, rtx_def*, rtx_def*, int, rtx_def*, bool)
        ../../gcc/expr.c:4561
0x6b8976 store_one_arg
        ../../gcc/calls.c:5694
0x6c15b1 expand_call(tree_node*, rtx_def*, int)
        ../../gcc/calls.c:4030
0x7f0485 expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
        ../../gcc/expr.c:10927
0x6d6c97 expand_expr
        ../../gcc/expr.h:276
0x6d6c97 expand_call_stmt
        ../../gcc/cfgexpand.c:2690
0x6d6c97 expand_gimple_stmt_1
        ../../gcc/cfgexpand.c:3624
0x6d6c97 expand_gimple_stmt
        ../../gcc/cfgexpand.c:3790
0x6d8058 expand_gimple_tailcall
        ../../gcc/cfgexpand.c:3836
0x6d8058 expand_gimple_basic_block
        ../../gcc/cfgexpand.c:5774
0x6dd62e execute
        ../../gcc/cfgexpand.c:6403


Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [045/nnn] poly_int: REG_ARGS_SIZE
  2017-12-22 21:56   ` Andreas Schwab
@ 2017-12-23  9:36     ` Richard Sandiford
  2017-12-24 12:49       ` Andreas Schwab
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-12-23  9:36 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc-patches

Andreas Schwab <schwab@linux-m68k.org> writes:
> This breaks gcc.dg/tls/opt-3.c, gcc.dg/tls/pr47715-3.c and
> gcc.dg/tls/struct-1.c on m68k:
>
> /daten/aranym/gcc/gcc-20171222/gcc/testsuite/gcc.dg/tls/opt-3.c:11:3:
> internal compiler error: in add_args_size_note, at rtlanal.c:2379
> 0xae7aa9 add_args_size_note(rtx_insn*, poly_int<1u, long>)
>         ../../gcc/rtlanal.c:2379
> 0x7ea4ca fixup_args_size_notes(rtx_insn*, rtx_insn*, poly_int<1u, long>)
>         ../../gcc/expr.c:4105
> 0x7f6a02 emit_single_push_insn
>         ../../gcc/expr.c:4225
> 0x7fa412 emit_push_insn(rtx_def*, machine_mode, tree_node*, rtx_def*,
> unsigned int, int, rtx_def*, poly_int<1u, long>, rtx_def*, rtx_def*,
> int, rtx_def*, bool)
>         ../../gcc/expr.c:4561
> 0x6b8976 store_one_arg
>         ../../gcc/calls.c:5694
> 0x6c15b1 expand_call(tree_node*, rtx_def*, int)
>         ../../gcc/calls.c:4030
> 0x7f0485 expand_expr_real_1(tree_node*, rtx_def*, machine_mode,
> expand_modifier, rtx_def**, bool)
>         ../../gcc/expr.c:10927
> 0x6d6c97 expand_expr
>         ../../gcc/expr.h:276
> 0x6d6c97 expand_call_stmt
>         ../../gcc/cfgexpand.c:2690
> 0x6d6c97 expand_gimple_stmt_1
>         ../../gcc/cfgexpand.c:3624
> 0x6d6c97 expand_gimple_stmt
>         ../../gcc/cfgexpand.c:3790
> 0x6d8058 expand_gimple_tailcall
>         ../../gcc/cfgexpand.c:3836
> 0x6d8058 expand_gimple_basic_block
>         ../../gcc/cfgexpand.c:5774
> 0x6dd62e execute
>         ../../gcc/cfgexpand.c:6403

Bah.  Looks like I need to update my scripts to use --enable-tls,
since I'd ended up with emultls for the m68k targets.

I think the assert is catching a pre-existing bug here.  If we pushed
a value that needs a call to something like __tls_get_addr, we ended
up with two different REG_ARGS_SIZE notes on the same instruction.

It seems to be OK for emit_single_push_insn to push something that
needs a call to __tls_get_addr:

      /* We have to allow non-call_pop patterns for the case
	 of emit_single_push_insn of a TLS address.  */
      if (GET_CODE (pat) != PARALLEL)
	return 0;

so I think the problem is in the way this is handled rather than the fact
that it occurs at all.

If we're pushing a value X that needs a call C to calculate, we'll
add REG_ARGS_SIZE notes to the pushes and pops for C as part of the
call sequence.  Then emit_single_push_insn calls fixup_args_size_notes
on the whole push sequence (the calculation of X, including C,
and the push of X itself).  This is where the double notes came from.
But emit_single_push_insn_1 adjusted stack_pointer_delta *before* the
push, so the notes added for C were relative to the situation after
the future push of X rather than before it.

Presumably this didn't matter in practice because the note added
second tended to trump the note added first.  But code is allowed to
walk REG_NOTES without having to disregard secondary notes.
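
To illustrate the bookkeeping problem described above, here is a small
stand-alone toy model (not GCC code: the names, the sizes and the printf
calls are made up purely for illustration; the real bookkeeping is done
on RTL insns and their REG_ARGS_SIZE notes):

  #include <stdio.h>

  static int delta;              /* plays the role of stack_pointer_delta */
  static int note_for_call_pop;  /* "note" attached to the pop inside C */

  /* Computing X involves a call C whose argument is pushed and popped;
     the note on the pop records whatever delta is current at that point.  */
  static void
  compute_X_with_call_C (void)
  {
    delta += 4;                  /* push C's argument */
    delta -= 4;                  /* pop it again after the call */
    note_for_call_pop = delta;
  }

  int
  main (void)
  {
    int push_size = 8;           /* size of the future push of X */

    /* Old order: account for the push of X before computing X.  */
    delta = 0;
    delta += push_size;
    compute_X_with_call_C ();
    printf ("old order: inner note records %d, should record 0\n",
            note_for_call_pop);

    /* New order: account for the push of X only after computing it.  */
    delta = 0;
    compute_X_with_call_C ();
    delta += push_size;
    printf ("new order: inner note records %d\n", note_for_call_pop);
    return 0;
  }

With the old order, the note recorded while computing X already includes
the effect of the future push of X (8 in this toy) even though X has not
been pushed yet; fixup_args_size_notes then added a second, differing
note to the same insn, which is the double-note problem described above.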

This patch seems to fix it for me, but I'm not sure how best to test it.

Thanks,
Richard


2017-12-23  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* expr.c (fixup_args_size_notes): Check that any existing
	REG_ARGS_SIZE notes are correct, and don't try to re-add them.
	(emit_single_push_insn_1): Move stack_pointer_delta adjustment to...
	(emit_single_push_insn): ...here.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-12-23 09:29:20.226338285 +0000
+++ gcc/expr.c	2017-12-23 09:29:45.783339673 +0000
@@ -4089,6 +4089,14 @@ fixup_args_size_notes (rtx_insn *prev, r
       if (!NONDEBUG_INSN_P (insn))
 	continue;
 
+      /* We might have existing REG_ARGS_SIZE notes, e.g. when pushing
+	 a call argument containing a TLS address that itself requires
+	 a call to __tls_get_addr.  The handling of stack_pointer_delta
+	 in emit_single_push_insn is supposed to ensure that any such
+	 notes are already correct.  */
+      rtx note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
+      gcc_assert (!note || known_eq (args_size, get_args_size (note)));
+
       poly_int64 this_delta = find_args_size_adjust (insn);
       if (known_eq (this_delta, 0))
 	{
@@ -4102,7 +4110,8 @@ fixup_args_size_notes (rtx_insn *prev, r
       if (known_eq (this_delta, HOST_WIDE_INT_MIN))
 	saw_unknown = true;
 
-      add_args_size_note (insn, args_size);
+      if (!note)
+	add_args_size_note (insn, args_size);
       if (STACK_GROWS_DOWNWARD)
 	this_delta = -poly_uint64 (this_delta);
 
@@ -4126,7 +4135,6 @@ emit_single_push_insn_1 (machine_mode mo
   rtx dest;
   enum insn_code icode;
 
-  stack_pointer_delta += PUSH_ROUNDING (GET_MODE_SIZE (mode));
   /* If there is push pattern, use it.  Otherwise try old way of throwing
      MEM representing push operation to move expander.  */
   icode = optab_handler (push_optab, mode);
@@ -4213,6 +4221,14 @@ emit_single_push_insn (machine_mode mode
 
   emit_single_push_insn_1 (mode, x, type);
 
+  /* Adjust stack_pointer_delta to describe the situation after the push
+     we just performed.  Note that we must do this after the push rather
+     than before the push in case calculating X needs pushes and pops of
+     its own (e.g. if calling __tls_get_addr).  The REG_ARGS_SIZE notes
+     for such pushes and pops must not include the effect of the future
+     push of X.  */
+  stack_pointer_delta += PUSH_ROUNDING (GET_MODE_SIZE (mode));
+
   last = get_last_insn ();
 
   /* Notice the common case where we emitted exactly one insn.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [045/nnn] poly_int: REG_ARGS_SIZE
  2017-12-23  9:36     ` Richard Sandiford
@ 2017-12-24 12:49       ` Andreas Schwab
  2017-12-28 20:37         ` RFA: Fix REG_ARGS_SIZE handling when pushing TLS addresses Richard Sandiford
  0 siblings, 1 reply; 302+ messages in thread
From: Andreas Schwab @ 2017-12-24 12:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: richard.sandiford

On Dec 23 2017, Richard Sandiford <richard.sandiford@linaro.org> wrote:

> gcc/
> 	* expr.c (fixup_args_size_notes): Check that any existing
> 	REG_ARGS_SIZE notes are correct, and don't try to re-add them.
> 	(emit_single_push_insn_1): Move stack_pointer_delta adjustment to...
> 	(emit_single_push_insn): ...here.

Successfully regtested on m68k-linux.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 302+ messages in thread

* RFA: Fix REG_ARGS_SIZE handling when pushing TLS addresses
  2017-12-24 12:49       ` Andreas Schwab
@ 2017-12-28 20:37         ` Richard Sandiford
  2018-01-02 19:07           ` Jeff Law
  0 siblings, 1 reply; 302+ messages in thread
From: Richard Sandiford @ 2017-12-28 20:37 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: gcc-patches

Andreas Schwab <schwab@linux-m68k.org> writes:
> On Dec 23 2017, Richard Sandiford <richard.sandiford@linaro.org> wrote:
>> gcc/
>> 	* expr.c (fixup_args_size_notes): Check that any existing
>> 	REG_ARGS_SIZE notes are correct, and don't try to re-add them.
>> 	(emit_single_push_insn_1): Move stack_pointer_delta adjustment to...
>> 	(emit_single_push_insn): ...here.
>
> Successfully regtested on m68k-linux.

Thanks.  Now also tested on aarch64-linux-gnu, x86_64-linux-gnu and
powerpc64-linux-gnu (not that that will give much coverage).  Also
tested with a before-and-after comparison of testsuite output for
a range of targets.  OK to install?

Richard


The new assert in add_args_size_note triggered for gcc.dg/tls/opt-3.c
and others on m68k.  This looks like a pre-existing bug: if we pushed
a value that needs a call to something like __tls_get_addr, we ended
up with two different REG_ARGS_SIZE notes on the same instruction.

It seems to be OK for emit_single_push_insn to push something that
needs a call to __tls_get_addr:

      /* We have to allow non-call_pop patterns for the case
	 of emit_single_push_insn of a TLS address.  */
      if (GET_CODE (pat) != PARALLEL)
	return 0;

so I think the bug is in the way this is handled rather than the fact
that it occurs at all.

If we're pushing a value X that needs a call C to calculate, we'll
add REG_ARGS_SIZE notes to the pushes and pops for C as part of the
call sequence.  Then emit_single_push_insn calls fixup_args_size_notes
on the whole push sequence (the calculation of X, including C,
and the push of X itself).  This is where the double notes came from.
But emit_single_push_insn_1 adjusted stack_pointer_delta *before* the
push, so the notes added for C were relative to the situation after
the future push of X rather than before it.

Presumably this didn't matter in practice because the note added
second tended to trump the note added first.  But code is allowed to
walk REG_NOTES without having to disregard secondary notes.

2017-12-23  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* expr.c (fixup_args_size_notes): Check that any existing
	REG_ARGS_SIZE notes are correct, and don't try to re-add them.
	(emit_single_push_insn_1): Move stack_pointer_delta adjustment to...
	(emit_single_push_insn): ...here.

Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-12-23 09:29:20.226338285 +0000
+++ gcc/expr.c	2017-12-23 09:29:45.783339673 +0000
@@ -4089,6 +4089,14 @@ fixup_args_size_notes (rtx_insn *prev, r
       if (!NONDEBUG_INSN_P (insn))
 	continue;
 
+      /* We might have existing REG_ARGS_SIZE notes, e.g. when pushing
+	 a call argument containing a TLS address that itself requires
+	 a call to __tls_get_addr.  The handling of stack_pointer_delta
+	 in emit_single_push_insn is supposed to ensure that any such
+	 notes are already correct.  */
+      rtx note = find_reg_note (insn, REG_ARGS_SIZE, NULL_RTX);
+      gcc_assert (!note || known_eq (args_size, get_args_size (note)));
+
       poly_int64 this_delta = find_args_size_adjust (insn);
       if (known_eq (this_delta, 0))
 	{
@@ -4102,7 +4110,8 @@ fixup_args_size_notes (rtx_insn *prev, r
       if (known_eq (this_delta, HOST_WIDE_INT_MIN))
 	saw_unknown = true;
 
-      add_args_size_note (insn, args_size);
+      if (!note)
+	add_args_size_note (insn, args_size);
       if (STACK_GROWS_DOWNWARD)
 	this_delta = -poly_uint64 (this_delta);
 
@@ -4126,7 +4135,6 @@ emit_single_push_insn_1 (machine_mode mo
   rtx dest;
   enum insn_code icode;
 
-  stack_pointer_delta += PUSH_ROUNDING (GET_MODE_SIZE (mode));
   /* If there is push pattern, use it.  Otherwise try old way of throwing
      MEM representing push operation to move expander.  */
   icode = optab_handler (push_optab, mode);
@@ -4213,6 +4221,14 @@ emit_single_push_insn (machine_mode mode
 
   emit_single_push_insn_1 (mode, x, type);
 
+  /* Adjust stack_pointer_delta to describe the situation after the push
+     we just performed.  Note that we must do this after the push rather
+     than before the push in case calculating X needs pushes and pops of
+     its own (e.g. if calling __tls_get_addr).  The REG_ARGS_SIZE notes
+     for such pushes and pops must not include the effect of the future
+     push of X.  */
+  stack_pointer_delta += PUSH_ROUNDING (GET_MODE_SIZE (mode));
+
   last = get_last_insn ();
 
   /* Notice the common case where we emitted exactly one insn.  */

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: RFA: Fix REG_ARGS_SIZE handling when pushing TLS addresses
  2017-12-28 20:37         ` RFA: Fix REG_ARGS_SIZE handling when pushing TLS addresses Richard Sandiford
@ 2018-01-02 19:07           ` Jeff Law
  0 siblings, 0 replies; 302+ messages in thread
From: Jeff Law @ 2018-01-02 19:07 UTC (permalink / raw)
  To: Andreas Schwab, gcc-patches, richard.sandiford

On 12/28/2017 01:37 PM, Richard Sandiford wrote:
> Andreas Schwab <schwab@linux-m68k.org> writes:
>> On Dec 23 2017, Richard Sandiford <richard.sandiford@linaro.org> wrote:
>>> gcc/
>>> 	* expr.c (fixup_args_size_notes): Check that any existing
>>> 	REG_ARGS_SIZE notes are correct, and don't try to re-add them.
>>> 	(emit_single_push_insn_1): Move stack_pointer_delta adjustment to...
>>> 	(emit_single_push_insn): ...here.
>>
>> Successfully regtested on m68k-linux.
> 
> Thanks.  Now also tested on aarch64-linux-gnu, x86_64-linux-gnu and
> powerpc64-linux-gnu (not that that will give much coverage).  Also
> tested with a before-and-after comparison of testsuite output for
> a range of targets.  OK to install?
> 
> Richard
> 
> 
> The new assert in add_args_size_note triggered for gcc.dg/tls/opt-3.c
> and others on m68k.  This looks like a pre-existing bug: if we pushed
> a value that needs a call to something like __tls_get_addr, we ended
> up with two different REG_ARGS_SIZE notes on the same instruction.
> 
> It seems to be OK for emit_single_push_insn to push something that
> needs a call to __tls_get_addr:
> 
>       /* We have to allow non-call_pop patterns for the case
> 	 of emit_single_push_insn of a TLS address.  */
>       if (GET_CODE (pat) != PARALLEL)
> 	return 0;
> 
> so I think the bug is in the way this is handled rather than the fact
> that it occurs at all.
> 
> If we're pushing a value X that needs a call C to calculate, we'll
> add REG_ARGS_SIZE notes to the pushes and pops for C as part of the
> call sequence.  Then emit_single_push_insn calls fixup_args_size_notes
> on the whole push sequence (the calculation of X, including C,
> and the push of X itself).  This is where the double notes came from.
> But emit_single_push_insn_1 adjusted stack_pointer_delta *before* the
> push, so the notes added for C were relative to the situation after
> the future push of X rather than before it.
> 
> Presumably this didn't matter in practice because the note added
> second tended to trump the note added first.  But code is allowed to
> walk REG_NOTES without having to disregard secondary notes.
> 
> 2017-12-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 
> gcc/
> 	* expr.c (fixup_args_size_notes): Check that any existing
> 	REG_ARGS_SIZE notes are correct, and don't try to re-add them.
> 	(emit_single_push_insn_1): Move stack_pointer_delta adjustment to...
> 	(emit_single_push_insn): ...here.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* [PATCH] Fix gcc.dg/vect-opt-info-1.c testcase
  2017-10-23 17:26 ` [063/nnn] poly_int: vectoriser vf and uf Richard Sandiford
  2017-12-06  2:46   ` Jeff Law
@ 2018-01-03 21:23   ` Jakub Jelinek
  2018-01-03 21:30     ` Richard Sandiford
  2018-01-04 17:32     ` Jeff Law
  1 sibling, 2 replies; 302+ messages in thread
From: Jakub Jelinek @ 2018-01-03 21:23 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: gcc-patches

On Mon, Oct 23, 2017 at 06:26:12PM +0100, Richard Sandiford wrote:
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
...

> --- /dev/null	2017-10-21 08:51:42.385141415 +0100
> +++ gcc/testsuite/gcc.dg/vect-opt-info-1.c	2017-10-23 17:22:26.571498977 +0100
> @@ -0,0 +1,11 @@
> +/* { dg-options "-std=c99 -fopt-info -O3" } */
> +
> +void
> +vadd (int *dst, int *op1, int *op2, int count)
> +{
> +  for (int i = 0; i < count; ++i)
> +    dst[i] = op1[i] + op2[i];
> +}
> +
> +/* { dg-message "loop vectorized" "" { target *-*-* } 6 } */
> +/* { dg-message "loop versioned for vectorization because of possible aliasing" "" { target *-*-* } 6 } */

This testcase fails e.g. on i686-linux.  The problem is
1) it really should be at least guarded with
/* { dg-do compile { target vect_int } } */
because on targets that can't vectorize even simple int operations
this will obviously fail
2) that won't help for i686 though, because we need -msse2 added
to options for it to work; that is normally added by
check_vect_support_and_set_flags
only when in vect.exp.  If it was just that target, we could add
dg-additional-options, but I'm afraid many other targets add some options.

The following works for me; calling it nodump-* ensures that
-fdump-tree-* isn't added, which I believe is essential for the testcase;
tested on x86_64-linux with
RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse,-m64\} vect.exp=nodump*'
ok for trunk?

Sadly I don't have your broken development version of the patch, so can't
verify it fails with the broken patch.

2018-01-03  Jakub Jelinek  <jakub@redhat.com>

	* gcc.dg/vect-opt-info-1.c: Moved to ...
	* gcc.dg/vect/nodump-vect-opt-info-1.c: ... here.  Only run on
	vect_int targets, use dg-additional-options instead of dg-options and
	use relative line numbers instead of absolute.

--- gcc/testsuite/gcc.dg/vect-opt-info-1.c.jj	2018-01-03 10:04:47.568412808 +0100
+++ gcc/testsuite/gcc.dg/vect-opt-info-1.c	2018-01-03 22:14:44.082848915 +0100
@@ -1,11 +0,0 @@
-/* { dg-options "-std=c99 -fopt-info -O3" } */
-
-void
-vadd (int *dst, int *op1, int *op2, int count)
-{
-  for (int i = 0; i < count; ++i)
-    dst[i] = op1[i] + op2[i];
-}
-
-/* { dg-message "loop vectorized" "" { target *-*-* } 6 } */
-/* { dg-message "loop versioned for vectorization because of possible aliasing" "" { target *-*-* } 6 } */
--- gcc/testsuite/gcc.dg/vect/nodump-vect-opt-info-1.c.jj	2018-01-03 22:14:49.387852927 +0100
+++ gcc/testsuite/gcc.dg/vect/nodump-vect-opt-info-1.c	2018-01-03 22:17:30.437974412 +0100
@@ -0,0 +1,11 @@
+/* { dg-do compile { target vect_int } } */
+/* { dg-additional-options "-std=c99 -fopt-info -O3" } */
+
+void
+vadd (int *dst, int *op1, int *op2, int count)
+{
+/* { dg-message "loop vectorized" "" { target *-*-* } .+2 } */
+/* { dg-message "loop versioned for vectorization because of possible aliasing" "" { target *-*-* } .+1 } */
+  for (int i = 0; i < count; ++i)
+    dst[i] = op1[i] + op2[i];
+}

	Jakub

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [PATCH] Fix gcc.dg/vect-opt-info-1.c testcase
  2018-01-03 21:23   ` [PATCH] Fix gcc.dg/vect-opt-info-1.c testcase Jakub Jelinek
@ 2018-01-03 21:30     ` Richard Sandiford
  2018-01-04 17:32     ` Jeff Law
  1 sibling, 0 replies; 302+ messages in thread
From: Richard Sandiford @ 2018-01-03 21:30 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches

Jakub Jelinek <jakub@redhat.com> writes:
> On Mon, Oct 23, 2017 at 06:26:12PM +0100, Richard Sandiford wrote:
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
> ...
>
>> --- /dev/null	2017-10-21 08:51:42.385141415 +0100
>> +++ gcc/testsuite/gcc.dg/vect-opt-info-1.c 2017-10-23
>> 17:22:26.571498977 +0100
>> @@ -0,0 +1,11 @@
>> +/* { dg-options "-std=c99 -fopt-info -O3" } */
>> +
>> +void
>> +vadd (int *dst, int *op1, int *op2, int count)
>> +{
>> +  for (int i = 0; i < count; ++i)
>> +    dst[i] = op1[i] + op2[i];
>> +}
>> +
>> +/* { dg-message "loop vectorized" "" { target *-*-* } 6 } */
>> +/* { dg-message "loop versioned for vectorization because of possible
>> aliasing" "" { target *-*-* } 6 } */
>
> This testcase fails e.g. on i686-linux.  The problem is
> 1) it really should be at least guarded with
> /* { dg-do compile { target vect_int } } */
> because on targets that can't vectorize even simple int operations
> this will obviously fail

Hmm, yeah.

> 2) that won't help for i686 though, because we need -msse2 added
> to options for it to work; that is normally added by
> check_vect_support_and_set_flags
> only when in vect.exp.  If it was just that target, we could add
> dg-additional-options, but I'm afraid many other targets add some options.
>
> The following works for me, calling it nodump-* ensures that
> -fdump-tree-* isn't added, which I believe is essential for the testcase;

Yeah, that's right, the bug was using dump_file when dump_enabled_p (),
which would segfault when -fopt-info was passed and -fdump-tree-vect*
wasn't.
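
For illustration, the problematic pattern was roughly of this shape
(a sketch only, not the actual vectoriser code; dump_file is only set
when a -fdump-tree-* dump was requested, while dump_enabled_p () is also
true for plain -fopt-info):

  if (dump_enabled_p ())
    /* Buggy sketch: dump_file can be NULL with -fopt-info alone.  */
    fprintf (dump_file, "loop vectorized\n");

  if (dump_enabled_p ())
    /* Safe sketch: dump_printf_loc also writes to the -fopt-info stream.  */
    dump_printf_loc (MSG_NOTE, vect_location, "loop vectorized\n");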

> tested on x86_64-linux with
> RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse,-m64\} vect.exp=nodump*'
> ok for trunk?
>
> Sadly I don't have your broken development version of the patch, so can't
> verify it fails with the broken patch.

Me neither any more, but it looks good to me FWIW.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [PATCH] Fix gcc.dg/vect-opt-info-1.c testcase
  2018-01-03 21:23   ` [PATCH] Fix gcc.dg/vect-opt-info-1.c testcase Jakub Jelinek
  2018-01-03 21:30     ` Richard Sandiford
@ 2018-01-04 17:32     ` Jeff Law
  1 sibling, 0 replies; 302+ messages in thread
From: Jeff Law @ 2018-01-04 17:32 UTC (permalink / raw)
  To: Jakub Jelinek, Richard Sandiford; +Cc: gcc-patches

On 01/03/2018 02:23 PM, Jakub Jelinek wrote:
> On Mon, Oct 23, 2017 at 06:26:12PM +0100, Richard Sandiford wrote:
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
> ...
> 
>> --- /dev/null	2017-10-21 08:51:42.385141415 +0100
>> +++ gcc/testsuite/gcc.dg/vect-opt-info-1.c	2017-10-23 17:22:26.571498977 +0100
>> @@ -0,0 +1,11 @@
>> +/* { dg-options "-std=c99 -fopt-info -O3" } */
>> +
>> +void
>> +vadd (int *dst, int *op1, int *op2, int count)
>> +{
>> +  for (int i = 0; i < count; ++i)
>> +    dst[i] = op1[i] + op2[i];
>> +}
>> +
>> +/* { dg-message "loop vectorized" "" { target *-*-* } 6 } */
>> +/* { dg-message "loop versioned for vectorization because of possible aliasing" "" { target *-*-* } 6 } */
> 
> This testcase fails e.g. on i686-linux.  The problem is
> 1) it really should be at least guarded with
> /* { dg-do compile { target vect_int } } */
> because on targets that can't vectorize even simple int operations
> this will obviously fail
> 2) that won't help for i686 though, because we need -msse2 added
> to options for it to work; that is normally added by
> check_vect_support_and_set_flags
> only when in vect.exp.  If it was just that target, we could add
> dg-additional-options, but I'm afraid many other targets add some options.
> 
> The following works for me, calling it nodump-* ensures that
> -fdump-tree-* isn't added, which I believe is essential for the testcase;
> tested on x86_64-linux with
> RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse,-m64\} vect.exp=nodump*'
> ok for trunk?
> 
> Sadly I don't have your broken development version of the patch, so can't
> verify it fails with the broken patch.
> 
> 2018-01-03  Jakub Jelinek  <jakub@redhat.com>
> 
> 	* gcc.dg/vect-opt-info-1.c: Moved to ...
> 	* gcc.dg/vect/nodump-vect-opt-info-1.c: ... here.  Only run on
> 	vect_int targets, use dg-additional-options instead of dg-options and
> 	use relative line numbers instead of absolute.
OK.
jeff

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [040/nnn] poly_int: get_inner_reference & co.
  2017-10-23 17:18 ` [040/nnn] poly_int: get_inner_reference & co Richard Sandiford
  2017-12-06 17:26   ` Jeff Law
@ 2018-12-21 11:17   ` Thomas Schwinge
  2018-12-21 11:40     ` Jakub Jelinek
  1 sibling, 1 reply; 302+ messages in thread
From: Thomas Schwinge @ 2018-12-21 11:17 UTC (permalink / raw)
  To: Richard Sandiford, gcc-patches; +Cc: Julian Brown, jakub

[-- Attachment #1: Type: text/plain, Size: 1591 bytes --]

Hi!

On Mon, 23 Oct 2017 18:17:38 +0100, Richard Sandiford <richard.sandiford@linaro.org> wrote:
> This patch makes get_inner_reference and ptr_difference_const return the
> bit size and bit position as poly_int64s rather than HOST_WIDE_INTS.
> The non-mechanical changes were handled by previous patches.

(A variant of that got committed to trunk in r255914.)

> --- gcc/gimplify.c	2017-10-23 17:11:40.246949037 +0100
> +++ gcc/gimplify.c	2017-10-23 17:18:47.663057272 +0100

> @@ -8056,13 +8056,13 @@ gimplify_scan_omp_clauses (tree *list_p,

> -			    if (bitpos2)
> -			      o2 = o2 + bitpos2 / BITS_PER_UNIT;
> -			    if (wi::ltu_p (o1, o2)
> -				|| (wi::eq_p (o1, o2) && bitpos < bitpos2))
> +			    o2 += bits_to_bytes_round_down (bitpos2);
> +			    if (may_lt (o1, o2)
> +				|| (must_eq (o1, 2)
> +				    && may_lt (bitpos, bitpos2)))
>  			      {

("must_eq" is nowadays known as "known_eq".)  As Julian points out in
<https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00824.html> (thanks! --
but please keep bug fixes separate from code refactoring), there is an
'apparent bug introduced [...]: "known_eq (o1, 2)" should have been
"known_eq (o1, o2)"'.

I have not yet searched for any other such issues -- could this one have
been (or could any others now be) found automatically?

OK to fix this (on all relevant branches) as in the attached patch?  If
approving this patch, please respond with "Reviewed-by: NAME <EMAIL>" so
that your effort will be recorded in the commit log, see
<https://gcc.gnu.org/wiki/Reviewed-by>.


Regards
 Thomas



[-- Attachment #2: 0001-poly_int-get_inner_reference-co.-fix-known_eq-typo-b.patch --]
[-- Type: text/x-diff, Size: 820 bytes --]

From 0396c4087114d4a63824d89ff33110b76d607768 Mon Sep 17 00:00:00 2001
From: Thomas Schwinge <thomas@codesourcery.com>
Date: Fri, 21 Dec 2018 11:58:45 +0100
Subject: [PATCH] poly_int: get_inner_reference & co.: fix known_eq typo/bug

	gcc/
	* gimplify.c (gimplify_scan_omp_clauses): Fix known_eq typo/bug.
---
 gcc/gimplify.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 465d138abbed..40ed18e30271 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -8719,7 +8719,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 			      o2 = 0;
 			    o2 += bits_to_bytes_round_down (bitpos2);
 			    if (maybe_lt (o1, o2)
-				|| (known_eq (o1, 2)
+				|| (known_eq (o1, o2)
 				    && maybe_lt (bitpos, bitpos2)))
 			      {
 				if (ptr)
-- 
2.17.1


^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [040/nnn] poly_int: get_inner_reference & co.
  2018-12-21 11:17   ` Thomas Schwinge
@ 2018-12-21 11:40     ` Jakub Jelinek
  2018-12-28 14:34       ` Thomas Schwinge
  0 siblings, 1 reply; 302+ messages in thread
From: Jakub Jelinek @ 2018-12-21 11:40 UTC (permalink / raw)
  To: Thomas Schwinge; +Cc: Richard Sandiford, gcc-patches, Julian Brown

On Fri, Dec 21, 2018 at 12:10:26PM +0100, Thomas Schwinge wrote:
> 	gcc/
> 	* gimplify.c (gimplify_scan_omp_clauses): Fix known_eq typo/bug.

Ok, thanks.

> ---
>  gcc/gimplify.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 465d138abbed..40ed18e30271 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -8719,7 +8719,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
>  			      o2 = 0;
>  			    o2 += bits_to_bytes_round_down (bitpos2);
>  			    if (maybe_lt (o1, o2)
> -				|| (known_eq (o1, 2)
> +				|| (known_eq (o1, o2)
>  				    && maybe_lt (bitpos, bitpos2)))
>  			      {
>  				if (ptr)
> -- 
> 2.17.1
> 


	Jakub

^ permalink raw reply	[flat|nested] 302+ messages in thread

* Re: [040/nnn] poly_int: get_inner_reference & co.
  2018-12-21 11:40     ` Jakub Jelinek
@ 2018-12-28 14:34       ` Thomas Schwinge
  0 siblings, 0 replies; 302+ messages in thread
From: Thomas Schwinge @ 2018-12-28 14:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, Richard Sandiford, Julian Brown

Hi!

On Fri, 21 Dec 2018 12:17:17 +0100, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Dec 21, 2018 at 12:10:26PM +0100, Thomas Schwinge wrote:
> > [...]
> 
> Ok, thanks.

Committed to trunk in r267447:

commit a91917652db3a84fc69baf05468ba6f1aad655b7
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Dec 28 11:34:03 2018 +0000

    poly_int: get_inner_reference & co.: fix known_eq typo/bug
    
            gcc/
            * gimplify.c (gimplify_scan_omp_clauses): Fix known_eq typo/bug.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@267447 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  | 5 +++++
 gcc/gimplify.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 310d4f03e0a0..feaa4251815a 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2018-12-28  Thomas Schwinge  <thomas@codesourcery.com>
+	    Julian Brown  <julian@codesourcery.com>
+
+	* gimplify.c (gimplify_scan_omp_clauses): Fix known_eq typo/bug.
+
 2018-12-27  Jan Hubicka  <hubicka@ucw.cz>
 
 	* ipa-devirt.c (polymorphic_call_target_d): Add n_odr_types.
diff --git gcc/gimplify.c gcc/gimplify.c
index 465d138abbed..40ed18e30271 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -8719,7 +8719,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 			      o2 = 0;
 			    o2 += bits_to_bytes_round_down (bitpos2);
 			    if (maybe_lt (o1, o2)
-				|| (known_eq (o1, 2)
+				|| (known_eq (o1, o2)
 				    && maybe_lt (bitpos, bitpos2)))
 			      {
 				if (ptr)

Committed to gcc-8-branch in r267449:

commit 3637548fdd871a96522c32f03685794bfc0a3b76
Author: tschwinge <tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4>
Date:   Fri Dec 28 11:35:40 2018 +0000

    poly_int: get_inner_reference & co.: fix known_eq typo/bug
    
            gcc/
            * gimplify.c (gimplify_scan_omp_clauses): Fix known_eq typo/bug.
    
    trunk r267447
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-8-branch@267449 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog  | 5 +++++
 gcc/gimplify.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index f79f520496d4..e6960445024c 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2018-12-28  Thomas Schwinge  <thomas@codesourcery.com>
+	    Julian Brown  <julian@codesourcery.com>
+
+	* gimplify.c (gimplify_scan_omp_clauses): Fix known_eq typo/bug.
+
 2018-12-27  Martin Liska  <mliska@suse.cz>
 
 	Backport from mainline
diff --git gcc/gimplify.c gcc/gimplify.c
index 43cb891ae3d9..8f1328691cf5 100644
--- gcc/gimplify.c
+++ gcc/gimplify.c
@@ -8171,7 +8171,7 @@ gimplify_scan_omp_clauses (tree *list_p, gimple_seq *pre_p,
 			      o2 = 0;
 			    o2 += bits_to_bytes_round_down (bitpos2);
 			    if (maybe_lt (o1, o2)
-				|| (known_eq (o1, 2)
+				|| (known_eq (o1, o2)
 				    && maybe_lt (bitpos, bitpos2)))
 			      {
 				if (ptr)


Regards
 Thomas

^ permalink raw reply	[flat|nested] 302+ messages in thread

end of thread, other threads:[~2018-12-28 11:46 UTC | newest]

Thread overview: 302+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-23 16:57 [000/nnn] poly_int: representation of runtime offsets and sizes Richard Sandiford
2017-10-23 16:58 ` [001/nnn] poly_int: add poly-int.h Richard Sandiford
2017-10-25 16:17   ` Martin Sebor
2017-11-08  9:44     ` Richard Sandiford
2017-11-08 16:51       ` Martin Sebor
2017-11-08 16:56         ` Richard Sandiford
2017-11-08 17:33           ` Martin Sebor
2017-11-08 17:34           ` Martin Sebor
2017-11-08 18:34             ` Richard Sandiford
2017-11-09  9:10               ` Martin Sebor
2017-11-09 11:14                 ` Richard Sandiford
2017-11-09 17:42                   ` Martin Sebor
2017-11-13 17:59                   ` Jeff Law
2017-11-13 23:57                     ` Richard Sandiford
2017-11-14  1:21                       ` Martin Sebor
2017-11-14  9:46                         ` Richard Sandiford
2017-11-17  3:31                       ` Jeff Law
2017-11-08 10:03   ` Richard Sandiford
2017-11-14  0:42     ` Richard Sandiford
2017-12-06 20:11       ` Jeff Law
2017-12-07 14:46         ` Richard Biener
2017-12-07 15:08           ` Jeff Law
2017-12-07 22:39             ` Richard Sandiford
2017-12-07 22:48               ` Jeff Law
2017-12-15  3:40                 ` Martin Sebor
2017-12-15  9:08                   ` Richard Biener
2017-12-15 15:19                     ` Jeff Law
2017-10-23 16:59 ` [002/nnn] poly_int: IN_TARGET_CODE Richard Sandiford
2017-11-17  3:35   ` Jeff Law
2017-12-15  1:08     ` Richard Sandiford
2017-12-15 15:22       ` Jeff Law
2017-10-23 17:00 ` [003/nnn] poly_int: MACRO_MODE Richard Sandiford
2017-11-17  3:36   ` Jeff Law
2017-10-23 17:00 ` [004/nnn] poly_int: mode query functions Richard Sandiford
2017-11-17  3:37   ` Jeff Law
2017-10-23 17:01 ` [005/nnn] poly_int: rtx constants Richard Sandiford
2017-11-17  4:17   ` Jeff Law
2017-12-15  1:25     ` Richard Sandiford
2017-12-19  4:52       ` Jeff Law
2017-10-23 17:02 ` [006/nnn] poly_int: tree constants Richard Sandiford
2017-10-25 17:14   ` Martin Sebor
2017-10-25 21:35     ` Richard Sandiford
2017-10-26  5:52       ` Martin Sebor
2017-10-26  8:40         ` Richard Sandiford
2017-10-26 16:45           ` Martin Sebor
2017-10-26 18:05             ` Richard Sandiford
2017-10-26 23:53               ` Martin Sebor
2017-10-27  8:33                 ` Richard Sandiford
2017-10-29 16:56                   ` Martin Sebor
2017-10-30  6:36                     ` Trevor Saunders
2017-10-31 20:25                       ` Martin Sebor
2017-10-26 18:11             ` Pedro Alves
2017-10-26 19:12               ` Martin Sebor
2017-10-26 19:19                 ` Pedro Alves
2017-10-26 23:41                   ` Martin Sebor
2017-10-30 10:26                     ` Pedro Alves
2017-10-31 16:12                       ` Martin Sebor
2017-11-17  4:51   ` Jeff Law
2017-11-18 15:48     ` Richard Sandiford
2017-10-23 17:02 ` [007/nnn] poly_int: dump routines Richard Sandiford
2017-11-17  3:38   ` Jeff Law
2017-10-23 17:03 ` [008/nnn] poly_int: create_integer_operand Richard Sandiford
2017-11-17  3:40   ` Jeff Law
2017-10-23 17:04 ` [010/nnn] poly_int: REG_OFFSET Richard Sandiford
2017-11-17  3:41   ` Jeff Law
2017-10-23 17:04 ` [009/nnn] poly_int: TRULY_NOOP_TRUNCATION Richard Sandiford
2017-11-17  3:40   ` Jeff Law
2017-10-23 17:05 ` [013/nnn] poly_int: same_addr_size_stores_p Richard Sandiford
2017-11-17  4:11   ` Jeff Law
2017-10-23 17:05 ` [011/nnn] poly_int: DWARF locations Richard Sandiford
2017-11-17 17:40   ` Jeff Law
2017-10-23 17:05 ` [012/nnn] poly_int: fold_ctor_reference Richard Sandiford
2017-11-17  3:59   ` Jeff Law
2017-10-23 17:06 ` [015/nnn] poly_int: ao_ref and vn_reference_op_t Richard Sandiford
2017-11-18  4:25   ` Jeff Law
2017-10-23 17:06 ` [014/nnn] poly_int: indirect_refs_may_alias_p Richard Sandiford
2017-11-17 18:11   ` Jeff Law
2017-11-20 13:31     ` Richard Sandiford
2017-11-21  0:49       ` Jeff Law
2017-10-23 17:07 ` [016/nnn] poly_int: dse.c Richard Sandiford
2017-11-18  4:30   ` Jeff Law
2017-10-23 17:07 ` [017/nnn] poly_int: rtx_addr_can_trap_p_1 Richard Sandiford
2017-11-18  4:46   ` Jeff Law
2017-10-23 17:08 ` [020/nnn] poly_int: store_bit_field bitrange Richard Sandiford
2017-12-05 23:43   ` Jeff Law
2017-10-23 17:08 ` [019/nnn] poly_int: lra frame offsets Richard Sandiford
2017-12-06  0:16   ` Jeff Law
2017-10-23 17:08 ` [018/nnn] poly_int: MEM_OFFSET and MEM_SIZE Richard Sandiford
2017-12-06 18:27   ` Jeff Law
2017-10-23 17:09 ` [023/nnn] poly_int: store_field & co Richard Sandiford
2017-12-05 23:49   ` Jeff Law
2017-10-23 17:09 ` [021/nnn] poly_int: extract_bit_field bitrange Richard Sandiford
2017-12-05 23:46   ` Jeff Law
2017-10-23 17:09 ` [022/nnn] poly_int: C++ bitfield regions Richard Sandiford
2017-12-05 23:39   ` Jeff Law
2017-10-23 17:10 ` [025/nnn] poly_int: SUBREG_BYTE Richard Sandiford
2017-12-06 18:50   ` Jeff Law
2017-10-23 17:10 ` [024/nnn] poly_int: ira subreg liveness tracking Richard Sandiford
2017-11-28 21:10   ` Jeff Law
2017-12-05 21:54     ` Richard Sandiford
2017-10-23 17:11 ` [026/nnn] poly_int: operand_subword Richard Sandiford
2017-11-28 17:51   ` Jeff Law
2017-10-23 17:11 ` [027/nnn] poly_int: DWARF CFA offsets Richard Sandiford
2017-12-06  0:40   ` Jeff Law
2017-10-23 17:12 ` [030/nnn] poly_int: get_addr_unit_base_and_extent Richard Sandiford
2017-12-06  0:26   ` Jeff Law
2017-10-23 17:12 ` [029/nnn] poly_int: get_ref_base_and_extent Richard Sandiford
2017-12-06 20:03   ` Jeff Law
2017-10-23 17:12 ` [028/nnn] poly_int: ipa_parm_adjustment Richard Sandiford
2017-11-28 17:47   ` Jeff Law
2017-10-23 17:13 ` [033/nnn] poly_int: pointer_may_wrap_p Richard Sandiford
2017-11-28 17:44   ` Jeff Law
2017-10-23 17:13 ` [031/nnn] poly_int: aff_tree Richard Sandiford
2017-12-06  0:04   ` Jeff Law
2017-10-23 17:13 ` [032/nnn] poly_int: symbolic_number Richard Sandiford
2017-11-28 17:45   ` Jeff Law
2017-10-23 17:14 ` [035/nnn] poly_int: expand_debug_expr Richard Sandiford
2017-12-05 17:08   ` Jeff Law
2017-10-23 17:14 ` [034/nnn] poly_int: get_inner_reference_aff Richard Sandiford
2017-11-28 17:56   ` Jeff Law
2017-10-23 17:14 ` [036/nnn] poly_int: get_object_alignment_2 Richard Sandiford
2017-11-28 17:37   ` Jeff Law
2017-10-23 17:16 ` [037/nnn] poly_int: get_bit_range Richard Sandiford
2017-12-05 23:19   ` Jeff Law
2017-10-23 17:17 ` [038/nnn] poly_int: fold_comparison Richard Sandiford
2017-11-28 21:47   ` Jeff Law
2017-10-23 17:17 ` [039/nnn] poly_int: pass_store_merging::execute Richard Sandiford
2017-11-28 18:00   ` Jeff Law
2017-12-20 12:59     ` Richard Sandiford
2017-10-23 17:18 ` [040/nnn] poly_int: get_inner_reference & co Richard Sandiford
2017-12-06 17:26   ` Jeff Law
2018-12-21 11:17   ` Thomas Schwinge
2018-12-21 11:40     ` Jakub Jelinek
2018-12-28 14:34       ` Thomas Schwinge
2017-10-23 17:18 ` [042/nnn] poly_int: reload1.c Richard Sandiford
2017-12-05 17:23   ` Jeff Law
2017-10-23 17:18 ` [041/nnn] poly_int: reload.c Richard Sandiford
2017-12-05 17:10   ` Jeff Law
2017-10-23 17:19 ` [045/nnn] poly_int: REG_ARGS_SIZE Richard Sandiford
2017-12-06  0:10   ` Jeff Law
2017-12-22 21:56   ` Andreas Schwab
2017-12-23  9:36     ` Richard Sandiford
2017-12-24 12:49       ` Andreas Schwab
2017-12-28 20:37         ` RFA: Fix REG_ARGS_SIZE handling when pushing TLS addresses Richard Sandiford
2018-01-02 19:07           ` Jeff Law
2017-10-23 17:19 ` [044/nnn] poly_int: push_block/emit_push_insn Richard Sandiford
2017-11-28 22:18   ` Jeff Law
2017-10-23 17:19 ` [043/nnn] poly_int: frame allocations Richard Sandiford
2017-12-06  3:15   ` Jeff Law
2017-10-23 17:20 ` [047/nnn] poly_int: argument sizes Richard Sandiford
2017-12-06 20:57   ` Jeff Law
2017-12-20 11:37     ` Richard Sandiford
2017-10-23 17:20 ` [046/nnn] poly_int: instantiate_virtual_regs Richard Sandiford
2017-11-28 18:00   ` Jeff Law
2017-10-23 17:21 ` [048/nnn] poly_int: cfgexpand stack variables Richard Sandiford
2017-12-05 23:22   ` Jeff Law
2017-10-23 17:21 ` [050/nnn] poly_int: reload<->ira interface Richard Sandiford
2017-11-28 16:55   ` Jeff Law
2017-10-23 17:21 ` [049/nnn] poly_int: emit_inc Richard Sandiford
2017-11-28 17:30   ` Jeff Law
2017-10-23 17:22 ` [051/nnn] poly_int: emit_group_load/store Richard Sandiford
2017-12-05 23:26   ` Jeff Law
2017-10-23 17:22 ` [052/nnn] poly_int: bit_field_size/offset Richard Sandiford
2017-12-05 17:25   ` Jeff Law
2017-10-23 17:22 ` [053/nnn] poly_int: decode_addr_const Richard Sandiford
2017-11-28 16:53   ` Jeff Law
2017-10-23 17:23 ` [055/nnn] poly_int: find_bswap_or_nop_load Richard Sandiford
2017-11-28 16:52   ` Jeff Law
2017-10-23 17:23 ` [054/nnn] poly_int: adjust_ptr_info_misalignment Richard Sandiford
2017-11-28 16:53   ` Jeff Law
2017-10-23 17:24 ` [058/nnn] poly_int: get_binfo_at_offset Richard Sandiford
2017-11-28 16:50   ` Jeff Law
2017-10-23 17:24 ` [056/nnn] poly_int: MEM_REF offsets Richard Sandiford
2017-12-06  0:46   ` Jeff Law
2017-10-23 17:24 ` [057/nnn] poly_int: build_ref_for_offset Richard Sandiford
2017-11-28 16:51   ` Jeff Law
2017-10-23 17:25 ` [059/nnn] poly_int: tree-ssa-loop-ivopts.c:iv_use Richard Sandiford
2017-12-05 17:26   ` Jeff Law
2017-10-23 17:25 ` [061/nnn] poly_int: compute_data_ref_alignment Richard Sandiford
2017-11-28 16:49   ` Jeff Law
2017-10-23 17:25 ` [060/nnn] poly_int: loop versioning threshold Richard Sandiford
2017-12-05 17:31   ` Jeff Law
2017-10-23 17:26 ` [063/nnn] poly_int: vectoriser vf and uf Richard Sandiford
2017-12-06  2:46   ` Jeff Law
2018-01-03 21:23   ` [PATCH] Fix gcc.dg/vect-opt-info-1.c testcase Jakub Jelinek
2018-01-03 21:30     ` Richard Sandiford
2018-01-04 17:32     ` Jeff Law
2017-10-23 17:26 ` [062/nnn] poly_int: prune_runtime_alias_test_list Richard Sandiford
2017-12-05 17:33   ` Jeff Law
2017-10-23 17:27 ` [066/nnn] poly_int: omp_max_vf Richard Sandiford
2017-12-05 17:40   ` Jeff Law
2017-10-23 17:27 ` [065/nnn] poly_int: vect_nunits_for_cost Richard Sandiford
2017-12-05 17:35   ` Jeff Law
2017-10-23 17:27 ` [064/nnn] poly_int: SLP max_units Richard Sandiford
2017-12-05 17:41   ` Jeff Law
2017-10-23 17:28 ` [067/nnn] poly_int: get_mask_mode Richard Sandiford
2017-11-28 16:48   ` Jeff Law
2017-10-23 17:28 ` [068/nnn] poly_int: current_vector_size and TARGET_AUTOVECTORIZE_VECTOR_SIZES Richard Sandiford
2017-12-06  1:52   ` Jeff Law
2017-10-23 17:29 ` [070/nnn] poly_int: vectorizable_reduction Richard Sandiford
2017-11-22 18:11   ` Richard Sandiford
2017-12-06  0:33     ` Jeff Law
2017-10-23 17:29 ` [069/nnn] poly_int: vector_alignment_reachable_p Richard Sandiford
2017-11-28 16:48   ` Jeff Law
2017-10-23 17:29 ` [071/nnn] poly_int: vectorizable_induction Richard Sandiford
2017-12-05 17:44   ` Jeff Law
2017-10-23 17:30 ` [073/nnn] poly_int: vectorizable_load/store Richard Sandiford
2017-12-06  0:51   ` Jeff Law
2017-10-23 17:30 ` [072/nnn] poly_int: vectorizable_live_operation Richard Sandiford
2017-11-28 16:47   ` Jeff Law
2017-10-23 17:30 ` [074/nnn] poly_int: vectorizable_call Richard Sandiford
2017-11-28 16:46   ` Jeff Law
2017-10-23 17:31 ` [076/nnn] poly_int: vectorizable_conversion Richard Sandiford
2017-11-28 16:44   ` Jeff Law
2017-11-28 18:15     ` Richard Sandiford
2017-12-05 17:49       ` Jeff Law
2017-10-23 17:31 ` [075/nnn] poly_int: vectorizable_simd_clone_call Richard Sandiford
2017-11-28 16:45   ` Jeff Law
2017-10-23 17:31 ` [077/nnn] poly_int: vect_get_constant_vectors Richard Sandiford
2017-11-28 16:43   ` Jeff Law
2017-10-23 17:32 ` [080/nnn] poly_int: tree-vect-generic.c Richard Sandiford
2017-12-05 17:48   ` Jeff Law
2017-10-23 17:32 ` [079/nnn] poly_int: vect_no_alias_p Richard Sandiford
2017-12-05 17:46   ` Jeff Law
2017-10-23 17:32 ` [078/nnn] poly_int: two-operation SLP Richard Sandiford
2017-11-28 16:41   ` Jeff Law
2017-10-23 17:33 ` [082/nnn] poly_int: omp-simd-clone.c Richard Sandiford
2017-11-28 16:36   ` Jeff Law
2017-10-23 17:33 ` [081/nnn] poly_int: brig vector elements Richard Sandiford
2017-10-24  7:10   ` Pekka Jääskeläinen
2017-10-23 17:34 ` [083/nnn] poly_int: fold_indirect_ref_1 Richard Sandiford
2017-11-28 16:34   ` Jeff Law
2017-10-23 17:34 ` [085/nnn] poly_int: expand_vector_ubsan_overflow Richard Sandiford
2017-11-28 16:33   ` Jeff Law
2017-10-23 17:34 ` [084/nnn] poly_int: folding BIT_FIELD_REFs on vectors Richard Sandiford
2017-11-28 16:33   ` Jeff Law
2017-10-23 17:35 ` [088/nnn] poly_int: expand_expr_real_2 Richard Sandiford
2017-11-28  8:49   ` Jeff Law
2017-10-23 17:35 ` [087/nnn] poly_int: subreg_get_info Richard Sandiford
2017-11-28 16:29   ` Jeff Law
2017-10-23 17:35 ` [086/nnn] poly_int: REGMODE_NATURAL_SIZE Richard Sandiford
2017-12-05 23:33   ` Jeff Law
2017-10-23 17:36 ` [089/nnn] poly_int: expand_expr_real_1 Richard Sandiford
2017-11-28  8:41   ` Jeff Law
2017-10-23 17:36 ` [090/nnn] poly_int: set_inc_state Richard Sandiford
2017-11-28  8:35   ` Jeff Law
2017-10-23 17:37 ` [092/nnn] poly_int: PUSH_ROUNDING Richard Sandiford
2017-11-28 16:21   ` Jeff Law
2017-11-28 18:01     ` Richard Sandiford
2017-11-28 18:10       ` PUSH_ROUNDING Jeff Law
2017-10-23 17:37 ` [093/nnn] poly_int: adjust_mems Richard Sandiford
2017-11-28  8:32   ` Jeff Law
2017-10-23 17:37 ` [091/nnn] poly_int: emit_single_push_insn_1 Richard Sandiford
2017-11-28  8:33   ` Jeff Law
2017-10-23 17:38 ` [094/nnn] poly_int: expand_ifn_atomic_compare_exchange_into_call Richard Sandiford
2017-11-28  8:31   ` Jeff Law
2017-10-23 17:39 ` [095/nnn] poly_int: process_alt_operands Richard Sandiford
2017-11-28  8:14   ` Jeff Law
2017-10-23 17:39 ` [096/nnn] poly_int: reloading complex subregs Richard Sandiford
2017-11-28  8:09   ` Jeff Law
2017-10-23 17:40 ` [097/nnn] poly_int: alter_reg Richard Sandiford
2017-11-28  8:08   ` Jeff Law
2017-10-23 17:40 ` [099/nnn] poly_int: struct_value_size Richard Sandiford
2017-11-21  8:14   ` Jeff Law
2017-10-23 17:40 ` [098/nnn] poly_int: load_register_parameters Richard Sandiford
2017-11-28  8:08   ` Jeff Law
2017-10-23 17:41 ` [100/nnn] poly_int: memrefs_conflict_p Richard Sandiford
2017-12-05 23:29   ` Jeff Law
2017-10-23 17:41 ` [101/nnn] poly_int: GET_MODE_NUNITS Richard Sandiford
2017-12-06  2:05   ` Jeff Law
2017-10-23 17:42 ` [103/nnn] poly_int: TYPE_VECTOR_SUBPARTS Richard Sandiford
2017-10-24  9:06   ` Richard Biener
2017-10-24  9:40     ` Richard Sandiford
2017-10-24 10:01       ` Richard Biener
2017-10-24 11:20         ` Richard Sandiford
2017-10-24 11:30           ` Richard Biener
2017-10-24 16:24             ` Richard Sandiford
2017-12-06  2:31   ` Jeff Law
2017-10-23 17:42 ` [102/nnn] poly_int: vect_permute_load/store_chain Richard Sandiford
2017-11-21  8:01   ` Jeff Law
2017-10-23 17:43 ` [105/nnn] poly_int: expand_assignment Richard Sandiford
2017-11-21  7:50   ` Jeff Law
2017-10-23 17:43 ` [104/nnn] poly_int: GET_MODE_PRECISION Richard Sandiford
2017-11-28  8:07   ` Jeff Law
2017-10-23 17:43 ` [106/nnn] poly_int: GET_MODE_BITSIZE Richard Sandiford
2017-11-21  7:49   ` Jeff Law
2017-10-23 17:48 ` [107/nnn] poly_int: GET_MODE_SIZE Richard Sandiford
2017-11-21  7:48   ` Jeff Law
2017-10-24  9:25 ` [000/nnn] poly_int: representation of runtime offsets and sizes Eric Botcazou
2017-10-24  9:58   ` Richard Sandiford
2017-10-24 10:53     ` Eric Botcazou
2017-10-24 11:25       ` Richard Sandiford
2017-10-24 12:24         ` Richard Biener
2017-10-24 13:07           ` Richard Sandiford
2017-10-24 13:18             ` Richard Biener
2017-10-24 13:30               ` Richard Sandiford
2017-10-25 10:27                 ` Richard Biener
2017-10-25 10:45                   ` Jakub Jelinek
2017-10-25 11:39                   ` Richard Sandiford
2017-10-25 13:09                     ` Richard Biener
2017-11-08  9:51                       ` Richard Sandiford
2017-11-08 11:57                         ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).