From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 85140 invoked by alias); 23 Oct 2017 16:54:40 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 85131 invoked by uid 89); 23 Oct 2017 16:54:40 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-6.4 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,KAM_ASCII_DIVIDERS,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-wm0-f53.google.com Received: from mail-wm0-f53.google.com (HELO mail-wm0-f53.google.com) (74.125.82.53) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 23 Oct 2017 16:54:37 +0000 Received: by mail-wm0-f53.google.com with SMTP id b189so10739729wmd.4 for ; Mon, 23 Oct 2017 09:54:37 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:mail-followup-to:subject:date:message-id :user-agent:mime-version; bh=iKoBXSJV8hfRz+Mmh1eJG3TXSVR0bUnTjo+AjNixSJ0=; b=YCxOAiHgKVL7DNe1y/9/hiHE7Oh7aV0TJTghkIF23ikhhTDork1yqjc0GrIKo21nZV ugirQDo+vf5x2SrLoN2IEla+7GlbVs64CuOqZ5f2hDHa0xuiRYyfC1jBvf01i18sxNx8 hIkCZO1TrRBFmiB/WJ2jj8NRFtnAz9PKy6lvf1yRFH+5UIoVOPB3CrLYiOKAAtwWP09d BO3CGbCdAuoxuaq8P71jlX3VPShLkx1CKuJwEcs37EyEd8o9pQouwMjBTBXnrKs4lAGJ tROV1Kd2FnMxlJ1gO0GMHjRCqFVt7MzRIixwCtHk7KjQ3qPzwY3G6WKF/rmNFGqJ2F0q jXiA== X-Gm-Message-State: AMCzsaUKe0DIp060fQjnE9WhcTUjoPyHU3iFEdjk520TQyDbb+CP/E5T JNM1gvsSm3UDe4xzD4YeVyiRoQgO4CI= X-Google-Smtp-Source: ABhQp+TrHFzPBN/yWoYt4VHPo/jeQhH+Y4JeG507MUD3MK2TUJHZJ9lDxKifirGgevjjyS4cyfr90Q== X-Received: by 10.28.158.13 with SMTP id h13mr5518192wme.47.1508777675007; Mon, 23 Oct 2017 09:54:35 -0700 (PDT) Received: from localhost ([2.26.27.199]) by smtp.gmail.com with ESMTPSA id k19sm1470014wrg.32.2017.10.23.09.54.33 for (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Mon, 23 Oct 2017 09:54:34 -0700 (PDT) From: Richard Sandiford To: gcc-patches@gcc.gnu.org Mail-Followup-To: gcc-patches@gcc.gnu.org, richard.sandiford@linaro.org Subject: [000/nnn] poly_int: representation of runtime offsets and sizes Date: Mon, 23 Oct 2017 16:57:00 -0000 Message-ID: <871sltvm7r.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-SW-Source: 2017-10/txt/msg01499.txt.bz2 This series adds support for offsets and sizes that are a runtime invariant rather than a compile time constant. It's based on the patch posted here: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00406.html The rest of the covering note is split into: - Summary (from the message linked above) - Tree representation - RTL representation - Compile-time impact - Typical changes - Testing Summary ======= The size of an SVE register in bits can be any multiple of 128 between 128 and 2048 inclusive. The way we chose to represent this was to have a runtime indeterminate that counts the number of 128 bit blocks above the minimum of 128. If we call the indeterminate X then: * an SVE register has 128 + 128 * X bits (16 + 16 * X bytes) * the last int in an SVE vector is at byte offset 12 + 16 * X * etc. Although the maximum value of X is 15, we don't want to take advantage of that, since there's nothing particularly magical about the value. So we have two types of target: those for which there are no runtime indeterminates, and those for which there is one runtime indeterminate. We decided to generalise the interface slightly by allowing any number of indeterminates, although some parts of the underlying implementation are still limited to 0 and 1 for now. The main class for working with these runtime offsets and sizes is "poly_int". It represents a value of the form: C0 + C1 * X1 + ... + Cn * Xn where each coefficient Ci is a compile-time constant and where each indeterminate Xi is a nonnegative runtime value. The class takes two template parameters, one giving the number of coefficients and one giving the type of the coefficients. There are then typedefs for the common cases, with the number of coefficients being controlled by the target. poly_int is used for things like: - the number of elements in a VECTOR_TYPE - the size and number of units in a general machine_mode - the offset of something in the stack frame - SUBREG_BYTE - MEM_SIZE and MEM_OFFSET - mem_ref_offset (only a selective list). The patch that adds poly_int has detailed documentation, but the main points are: * there's no total ordering between poly_ints, so the best we can do when comparing them is to ask whether two values *might* or *must* be related in a particular way. E.g. if mode A has size 2 + 2X and mode B has size 4, the condition: GET_MODE_SIZE (A) <= GET_MODE_SIZE (B) is true for X<=1 and false for X>=2. This translates to: may_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == true must_le (GET_MODE_SIZE (A), GET_MODE_SIZE (B)) == false Of course, the may/must distinction already exists in things like alias analysis. * some poly_int arithmetic operations (notably division) are only possible for certain values. These operations therefore become conditional. * target-independent code is exposed to these restrictions even if the current target has no indeterminates. But: * we've tried to provide enough operations that poly_ints are easy to work with. * it means that developers working with non-SVE targets don't need to test SVE. If the code compiles on a non-SVE target, and if it doesn't use any asserting operations, it's reasonable to assume that it will work on SVE too. * for target-specific code, poly_int degenerates to a constant if there are no runtime invariants for that target. Only very minor changes are needed to non-AArch64 targets. * poly_int operations should be (and in practice seem to be) as efficient as single-coefficient operations on non-AArch64 targets. Tree representation =================== The series uses a new POLY_INT_CST node to represent a poly_int value at the tree level. It is only used on targets with runtime sizes and offsets; the associated test macro POLY_INT_CST_P is always false for other targets. The node has one INTEGER_CST per coefficient, which makes it easier to refer to the same tree as a poly_wide_int, a poly_offset_int and a poly_widest_int without copying the representation. Only low-level routines use the tree node directly. Most code uses: - poly_int_tree_p (x) Return true if X is an INTEGER_CST or a POLY_INT_CST. - wi::to_poly_wide (x) - wi::to_poly_offset (x) - wi::to_poly_widest (x) poly_int versions of the normal wi::to_wide etc. routines. These work on both INTEGER_CSTs and POLY_INT_CSTs. - poly_int_tree_p (x, &y) Test whether X is an INTEGER_CST or POLY_INT_CST and store its value in Y if so. This is defined for Y of type poly_int64 and poly_uint64; the wi::to_* routines are more efficient than return-by-pointer for wide_int-based types. - tree_to_poly_int64 (x) - tree_to_poly_uint64 (x) poly_int versions of tree_to_shwi and tree_to_uhwi. Again they work on both INTEGER_CSTs and POLY_INT_CSTs. Many tree routines now accept poly_int operands, such as: - build_int_cst - build_int_cstu - wide_int_to_tree - force_fit_type RTL representation ================== The corresponding RTL representation is CONST_POLY_INT. Again, this is only used on targets with runtime sizes and offsets, with the test macro CONST_POLY_INT_P returning false for other targets. Since RTL does not have the equivalent of the tree-level distinction between wi::to_wide, wi::to_offset and wi::to_widest, CONST_POLY_INT just stores the coefficients directly as wide_ints, using the trailing_wide_ints class for efficiency. The main routines are: - poly_int_rtx_p (x) Return true if X is CONST_SCALAR_INT_P or CONST_POLY_INT_P. - wi::to_poly_wide (x, mode) Return the value of CONST_SCALAR_INT_P or CONST_POLY_INT_P X as a poly_wide_int. - poly_int_rtx_p (x, &y) Return true if X is a CONST_INT or a CONST_POLY_INT, storing its value in Y if so. This is defined only for Y of type poly_int64. (poly_uint64 isn't much use for RTL, since constants have no inherent sign and are stored in sign- extended rather than zero-extended form. wi::to_wide is more efficient than return-by-pointer when accessing an rtx as a poly_wide_int.) - rtx_to_poly_int64 (x) A poly_int version of INTVAL, which works on both CONST_INT and CONST_POLY_INT. - split_offset (x, &y) If X is a PLUS of X' a poly_int, store the poly_int in Y and return the X'. Otherwise store 0 in Y and return X. - split_offset_and_add (x, &y) If X is a PLUS of X' a poly_int, add the poly_int to Y and return the X'. Otherwise leave Y alone and return X. Many RTL routines now accept poly_int operands, such as: - gen_int_mode - trunc_int_for_mode - plus_constant - immed_wide_int_const Compile-time impact =================== The series seems to be compile-time neutral for release builds on targets without runtime indeterminates, within a margin of about [-0.1%, 0.1%]. Also, the abstraction of poly_int<1, X> is usually compiled away. E.g.: poly_wide_int foo (poly_wide_int x, tree y) { return x + wi::to_poly_wide (x); } compiles to the same code as: wide_int foo (wide_int x, tree y) { return x + wi::to_wide (x); } in release builds. (I've tried various other combinations too.) Typical changes =============== Here's a table of the most common changes in the series. ---------------------------------------------------------------------- Before After ---------------------------------------------------------------------- wi::to_wide (x) wi::to_poly_wide (x) wi::to_offset (x) wi::to_poly_offset (x) wi::to_widest (x) wi::to_poly_widest (x) ---------------------------------------------------------------------- unsigned HOST_WIDE_INT y; poly_uint64 y; if (tree_fits_uhwi_p (x)) if (poly_int_tree_p (x, &y)) { { x = tree_to_uhwi (y); ---------------------------------------------------------------------- HOST_WIDE_INT y; poly_int64 y; if (tree_fits_shwi_p (x)) if (poly_int_tree_p (x, &y)) { { x = tree_to_shwi (y); ---------------------------------------------------------------------- HOST_WIDE_INT y; poly_int64 y; if (cst_and_fits_in_hwi (x)) if (ptrdiff_tree_p (x, &y)) { { x = int_cst_value (y); ---------------------------------------------------------------------- HOST_WIDE_INT y; poly_int64 y; if (CONST_INT_P (x)) if (poly_int_rtx_p (x, &y)) { { x = INTVAL (x); ---------------------------------------------------------------------- if (offset < limit) if (must_lt (offset, limit)) ...optimise...; ...optimise...; ---------------------------------------------------------------------- if (offset >= limit) if (may_ge (offset, limit)) ...abort optimisation...; ...abort optimisation...; ---------------------------------------------------------------------- if (offset >= limit) if (must_ge (offset, limit)) ...treat as undefined...; ...treat as undefined...; ---------------------------------------------------------------------- if (nunits1 == nunits2) if (must_eq (nunits1, nunits2)) ...treat as compatible...; ...treat as compatible...; ---------------------------------------------------------------------- if (nunits1 != nunits2) if (may_ne (nunits1, nunits2)) ...treat as incompatible...; ...treat as incompatible...; ---------------------------------------------------------------------- // Fold (eq op0 op1) // Fold (eq op0 op1) if (op0 == op1) if (must_eq (op0, op1)) ...fold to true...; ...fold to true...; ---------------------------------------------------------------------- // Fold (eq op0 op1) // Fold (eq op0 op1) if (op0 != op1) if (must_ne (op0, op1)) ...fold to false...; ...fold to false...; ---------------------------------------------------------------------- Testing ======= Tested by compiling the testsuite before and after the series on: aarch64-linux-gnu aarch64_be-linux-gnu alpha-linux-gnu arc-elf arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf c6x-elf cr16-elf cris-elf epiphany-elf fr30-elf frv-linux-gnu ft32-elf h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf m68k-linux-gnu mcore-elf microblaze-elf mipsel-linux-gnu mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnuspe powerpc-eabispe powerpc64-linux-gnu powerpc64le-linux-gnu powerpc-ibm-aix7.0 riscv32-elf riscv64-elf rl78-elf rx-elf s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin x86_64-linux-gnu xtensa-elf There were no differences in assembly output (except on powerpc-ibm-aix7.0, where symbol names aren't stable). Also tested normally on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu. Thanks, Richard