From: Richard Sandiford <richard.sandiford@linaro.org>
To: Richard Biener <richard.guenther@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [07/13] Make vec_perm_indices use new vector encoding
Date: Tue, 02 Jan 2018 18:30:00 -0000
Message-ID: <87po6snmej.fsf@linaro.org>
In-Reply-To: <CAFiYyc1VxJspedeV8QmUbZ1gvqH9b0qjwvcCSeWts-CJ0Wto3g@mail.gmail.com> (Richard Biener's message of "Tue, 2 Jan 2018 14:15:45 +0100")
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Dec 20, 2017 at 2:48 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> This patch changes vec_perm_indices from a plain vec<> to a class
>>>>>> that stores a canonicalised permutation, using the same encoding
>>>>>> as for VECTOR_CSTs. This means that vec_perm_indices now carries
>>>>>> information about the number of vectors being permuted (currently
>>>>>> always 1 or 2) and the number of elements in each input vector.
>>>>>
>>>>> Before I dive into the C++ details can you explain why it needs this
>>>>> info and how it encodes it for variable-length vectors? To interleave
>>>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
>>>>> can directly encode N here, can we? extract even/odd should just
>>>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
>>>>> one or two vectors (the one vector case just has two times the same
>>>>> vector) or how many elements each of the vectors (or the result) has.
>>>>
>>>> One of the later patches switches the element types to HOST_WIDE_INT,
>>>> so that we can represent all ssizetypes. Then there's a poly_int
>>>> patch (not yet posted) to make that poly_int64, so that we can
>>>> represent the N even for variable-length vectors.
>>>>
>>>> The class needs to know the number of elements because that affects
>>>> the canonical representation. E.g. extract even on fixed-length
>>>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
>>>> which we can't encode as a simple series. Interleave low with both
>>>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
>>>> variable-length vectors.
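[As a concrete illustration of the encoding being discussed: this is a hypothetical stand-alone sketch of the VECTOR_CST-style scheme that vec_perm_builder reuses, not GCC code. A selector is stored as NPATTERNS interleaved patterns of NELTS_PER_PATTERN encoded elements each; a pattern with three encoded elements continues as an arithmetic series, one with fewer repeats its last element.]

```cpp
#include <cassert>
#include <vector>

/* Expand an encoded selector to FULL_NELTS elements.  ENCODED holds the
   first NPATTERNS * NELTS_PER_PATTERN elements of the full vector, with
   the patterns interleaved.  Beyond that, each pattern either repeats its
   last encoded element (NELTS_PER_PATTERN < 3) or continues the series
   implied by its last two encoded elements (NELTS_PER_PATTERN == 3).  */
static std::vector<long>
expand (const std::vector<long> &encoded, unsigned npatterns,
	unsigned nelts_per_pattern, unsigned full_nelts)
{
  std::vector<long> out (full_nelts);
  for (unsigned i = 0; i < full_nelts; ++i)
    {
      unsigned pattern = i % npatterns;
      unsigned index = i / npatterns;
      if (index < nelts_per_pattern)
	/* Explicitly encoded element.  */
	out[i] = encoded[index * npatterns + pattern];
      else
	{
	  long last = encoded[(nelts_per_pattern - 1) * npatterns + pattern];
	  if (nelts_per_pattern < 3)
	    /* Constant pattern, or "base0 then base1 repeated".  */
	    out[i] = last;
	  else
	    {
	      /* Stepped pattern: continue the series.  */
	      long prev
		= encoded[(nelts_per_pattern - 2) * npatterns + pattern];
	      out[i] = last + (long) (index - (nelts_per_pattern - 1))
			      * (last - prev);
	    }
	}
    }
  return out;
}
```

For example, interleave-low with both inputs the same encodes as 2 patterns of 3 elements, { 0, 0, 1, 1, 2, 2, ... }, which works for any vector length; whereas a single stepped pattern { 0, 2, 4, ... } keeps climbing and so cannot represent the wrap-around that extract-even on one input needs.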
>>>
>>> Huh? extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector
>>> are referencing concat'ed input vectors. So yes, for two same vectors
>>> that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why
>>> that should
>>> be the canonical form?
>>
>> Current practice is to use the single-input form where possible,
>> if both inputs are the same (see e.g. the VEC_PERM_EXPR handling
>> in fold-const.c). It means that things like:
>>
>> _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
>> _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
>> _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
>>
>> get folded to the same sequence, and so can be CSEd.
>>
>> We could instead convert the single-input form to use the two-input
>> selector, but that would be harder. The advantage of treating the
>> single-input form as canonical is that it works even for irregular
>> permutes.
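[To make the folding above concrete: a hypothetical sketch, not GCC code, of the clamping that vec_perm_indices applies. Each index is reduced modulo the total number of input elements, with negative values counting from the end, so the second selector in the example folds to the first.]

```cpp
#include <cassert>
#include <vector>

/* Canonicalize SEL for a permutation with INPUT_NELTS input elements in
   total: reduce each index modulo INPUT_NELTS, treating negative values
   as counting from the end.  */
static std::vector<long>
canonicalize (std::vector<long> sel, long input_nelts)
{
  for (long &elt : sel)
    {
      elt %= input_nelts;
      if (elt < 0)
	elt += input_nelts;
    }
  return sel;
}
```

With a single 8-element input, { 0, 2, 4, 6, 8, 10, 12, 14 } canonicalizes to { 0, 2, 4, 6, 0, 2, 4, 6 }, so _1 and _2 above become identical and can be CSEd.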
>
> Ok, I see. Maybe adding a comment along these lines helps.
OK, thanks, installed as below with that change.
Richard
2018-01-02 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* int-vector-builder.h: New file.
* vec-perm-indices.h: Include int-vector-builder.h.
(vec_perm_indices): Redefine as an int_vector_builder.
(auto_vec_perm_indices): Delete.
(vec_perm_builder): Redefine as a stand-alone class.
(vec_perm_indices::vec_perm_indices): New function.
(vec_perm_indices::clamp): Likewise.
* vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
(vec_perm_indices::new_vector): New function.
(vec_perm_indices::new_expanded_vector): Update for new
vec_perm_indices class.
(vec_perm_indices::rotate_inputs): New function.
(vec_perm_indices::all_in_range_p): Operate directly on the
encoded form, without computing elided elements.
(tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
encoding. Update for new vec_perm_indices class.
* optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
the given vec_perm_builder.
(expand_vec_perm_var): Update vec_perm_builder constructor.
(expand_mult_highpart): Use vec_perm_builder instead of
auto_vec_perm_indices.
* optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Use a single
or double series encoding as appropriate.
* fold-const.c (fold_ternary_loc): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices.
* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
(vect_permute_store_chain): Likewise.
(vect_grouped_load_supported): Likewise.
(vect_permute_load_chain): Likewise.
(vect_shift_permute_load_chain): Likewise.
* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
(vect_transform_slp_perm_load): Likewise.
(vect_schedule_slp_instance): Likewise.
* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
(vectorizable_mask_load_store): Likewise.
(vectorizable_bswap): Likewise.
(vectorizable_store): Likewise.
(vectorizable_load): Likewise.
* tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Use
tree_to_vec_perm_builder to read the vector from a tree.
* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
vec_perm_builder instead of a vec_perm_indices.
(have_whole_vector_shift): Use vec_perm_builder and
vec_perm_indices instead of auto_vec_perm_indices. Leave the
truncation to calc_vec_perm_mask_for_shift.
(vect_create_epilog_for_reduction): Likewise.
* config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
from auto_vec_perm_indices to vec_perm_indices.
(aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
instead of changing individual elements.
(aarch64_vectorize_vec_perm_const): Use new_vector to install
the vector in d.perm.
* config/arm/arm.c (expand_vec_perm_d::perm): Change
from auto_vec_perm_indices to vec_perm_indices.
(arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm
instead of changing individual elements.
(arm_vectorize_vec_perm_const): Use new_vector to install
the vector in d.perm.
* config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):
Update vec_perm_builder constructor.
(rs6000_expand_interleave): Likewise.
* config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.
(rs6000_expand_interleave): Likewise.
------------------------------------------------------------------------------
Index: gcc/int-vector-builder.h
===================================================================
--- /dev/null 2017-12-30 11:27:13.464311244 +0000
+++ gcc/int-vector-builder.h 2018-01-02 17:01:28.746627393 +0000
@@ -0,0 +1,90 @@
+/* A class for building vector integer constants.
+ Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3. If not see
+<http://www.gnu.org/licenses/>. */
+
+#ifndef GCC_INT_VECTOR_BUILDER_H
+#define GCC_INT_VECTOR_BUILDER_H 1
+
+#include "vector-builder.h"
+
+/* This class is used to build vectors of integer type T using the same
+ encoding as tree and rtx constants. See vector_builder for more
+ details. */
+template<typename T>
+class int_vector_builder : public vector_builder<T, int_vector_builder<T> >
+{
+ typedef vector_builder<T, int_vector_builder> parent;
+ friend class vector_builder<T, int_vector_builder>;
+
+public:
+ int_vector_builder () {}
+ int_vector_builder (unsigned int, unsigned int, unsigned int);
+
+ using parent::new_vector;
+
+private:
+ bool equal_p (T, T) const;
+ bool allow_steps_p () const { return true; }
+ bool integral_p (T) const { return true; }
+ T step (T, T) const;
+ T apply_step (T, unsigned int, T) const;
+ bool can_elide_p (T) const { return true; }
+ void note_representative (T *, T) {}
+};
+
+/* Create a new builder for a vector with FULL_NELTS elements.
+ Initially encode the value as NPATTERNS interleaved patterns with
+ NELTS_PER_PATTERN elements each. */
+
+template<typename T>
+inline
+int_vector_builder<T>::int_vector_builder (unsigned int full_nelts,
+ unsigned int npatterns,
+ unsigned int nelts_per_pattern)
+{
+ new_vector (full_nelts, npatterns, nelts_per_pattern);
+}
+
+/* Return true if elements ELT1 and ELT2 are equal. */
+
+template<typename T>
+inline bool
+int_vector_builder<T>::equal_p (T elt1, T elt2) const
+{
+ return elt1 == elt2;
+}
+
+/* Return the value of element ELT2 minus the value of element ELT1. */
+
+template<typename T>
+inline T
+int_vector_builder<T>::step (T elt1, T elt2) const
+{
+ return elt2 - elt1;
+}
+
+/* Return a vector element with the value BASE + FACTOR * STEP. */
+
+template<typename T>
+inline T
+int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const
+{
+ return base + factor * step;
+}
+
+#endif
Index: gcc/vec-perm-indices.h
===================================================================
--- gcc/vec-perm-indices.h 2018-01-02 17:01:26.633719678 +0000
+++ gcc/vec-perm-indices.h 2018-01-02 18:25:11.081076126 +0000
@@ -20,30 +20,117 @@ Software Foundation; either version 3, o
#ifndef GCC_VEC_PERN_INDICES_H
#define GCC_VEC_PERN_INDICES_H 1
+#include "int-vector-builder.h"
+
+/* A vector_builder for building constant permutation vectors.
+ The elements do not need to be clamped to a particular range
+ of input elements. */
+typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder;
+
/* This class represents a constant permutation vector, such as that used
- as the final operand to a VEC_PERM_EXPR. */
-class vec_perm_indices : public auto_vec<unsigned short, 32>
+ as the final operand to a VEC_PERM_EXPR.
+
+ Permutation vectors select indices modulo the number of input elements,
+ and the class canonicalizes each permutation vector for a particular
+ number of input vectors and for a particular number of elements per
+ input. For example, the gimple statements:
+
+ _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
+ _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
+ _3 = VEC_PERM_EXPR <a, a, { 0, 2, 20, 22, 24, 2, 4, 14 }>;
+
+ effectively have only a single vector input "a". If "a" has 8
+ elements, the indices select elements modulo 8, which makes all three
+ VEC_PERM_EXPRs equivalent. The canonical form is for the indices to be
+ in the range [0, number of input elements - 1], so the class treats the
+ second and third permutation vectors as though they had been the first.
+
+ The class copes with cases in which the input and output vectors have
+ different numbers of elements. */
+class vec_perm_indices
{
- typedef unsigned short element_type;
- typedef auto_vec<element_type, 32> parent_type;
+ typedef HOST_WIDE_INT element_type;
public:
- vec_perm_indices () {}
- vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
+ vec_perm_indices ();
+ vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int);
+ void new_vector (const vec_perm_builder &, unsigned int, unsigned int);
void new_expanded_vector (const vec_perm_indices &, unsigned int);
+ void rotate_inputs (int delta);
+
+ /* Return the underlying vector encoding. */
+ const vec_perm_builder &encoding () const { return m_encoding; }
+
+ /* Return the number of output elements. This is called length ()
+ so that we present a more vec-like interface. */
+ unsigned int length () const { return m_encoding.full_nelts (); }
+ /* Return the number of input vectors being permuted. */
+ unsigned int ninputs () const { return m_ninputs; }
+
+ /* Return the number of elements in each input vector. */
+ unsigned int nelts_per_input () const { return m_nelts_per_input; }
+
+ /* Return the total number of input elements. */
+ unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; }
+
+ element_type clamp (element_type) const;
+ element_type operator[] (unsigned int i) const;
bool all_in_range_p (element_type, element_type) const;
private:
vec_perm_indices (const vec_perm_indices &);
-};
-/* Temporary. */
-typedef vec_perm_indices vec_perm_builder;
-typedef vec_perm_indices auto_vec_perm_indices;
+ vec_perm_builder m_encoding;
+ unsigned int m_ninputs;
+ unsigned int m_nelts_per_input;
+};
bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
+inline
+vec_perm_indices::vec_perm_indices ()
+ : m_ninputs (0),
+ m_nelts_per_input (0)
+{
+}
+
+/* Construct a permutation vector that selects between NINPUTS vector
+ inputs that have NELTS_PER_INPUT elements each. Take the elements of
+ the new vector from ELEMENTS, clamping each one to be in range. */
+
+inline
+vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements,
+ unsigned int ninputs,
+ unsigned int nelts_per_input)
+{
+ new_vector (elements, ninputs, nelts_per_input);
+}
+
+/* Return the canonical value for permutation vector element ELT,
+ taking into account the current number of input elements. */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::clamp (element_type elt) const
+{
+ element_type limit = input_nelts ();
+ elt %= limit;
+ /* Treat negative elements as counting from the end. This only matters
+ if the vector size is not a power of 2. */
+ if (elt < 0)
+ elt += limit;
+ return elt;
+}
+
+/* Return the value of vector element I, which might or might not be
+ explicitly encoded. */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::operator[] (unsigned int i) const
+{
+ return clamp (m_encoding.elt (i));
+}
+
#endif
Index: gcc/vec-perm-indices.c
===================================================================
--- gcc/vec-perm-indices.c 2018-01-02 17:01:26.632719721 +0000
+++ gcc/vec-perm-indices.c 2018-01-02 17:01:28.750627219 +0000
@@ -22,11 +22,33 @@ Software Foundation; either version 3, o
#include "coretypes.h"
#include "vec-perm-indices.h"
#include "tree.h"
+#include "fold-const.h"
+#include "tree-vector-builder.h"
#include "backend.h"
#include "rtl.h"
#include "memmodel.h"
#include "emit-rtl.h"
+/* Switch to a new permutation vector that selects between NINPUTS vector
+ inputs that have NELTS_PER_INPUT elements each. Take the elements of the
+ new permutation vector from ELEMENTS, clamping each one to be in range. */
+
+void
+vec_perm_indices::new_vector (const vec_perm_builder &elements,
+ unsigned int ninputs,
+ unsigned int nelts_per_input)
+{
+ m_ninputs = ninputs;
+ m_nelts_per_input = nelts_per_input;
+ /* Expand the encoding and clamp each element. E.g. { 0, 2, 4, ... }
+ might wrap halfway if there is only one vector input. */
+ unsigned int full_nelts = elements.full_nelts ();
+ m_encoding.new_vector (full_nelts, full_nelts, 1);
+ for (unsigned int i = 0; i < full_nelts; ++i)
+ m_encoding.quick_push (clamp (elements.elt (i)));
+ m_encoding.finalize ();
+}
+
/* Switch to a new permutation vector that selects the same input elements
as ORIG, but with each element split into FACTOR pieces. For example,
if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
@@ -36,14 +58,31 @@ Software Foundation; either version 3, o
vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
unsigned int factor)
{
- truncate (0);
- reserve (orig.length () * factor);
- for (unsigned int i = 0; i < orig.length (); ++i)
+ m_ninputs = orig.m_ninputs;
+ m_nelts_per_input = orig.m_nelts_per_input * factor;
+ m_encoding.new_vector (orig.m_encoding.full_nelts () * factor,
+ orig.m_encoding.npatterns () * factor,
+ orig.m_encoding.nelts_per_pattern ());
+ unsigned int encoded_nelts = orig.m_encoding.encoded_nelts ();
+ for (unsigned int i = 0; i < encoded_nelts; ++i)
{
- element_type base = orig[i] * factor;
+ element_type base = orig.m_encoding[i] * factor;
for (unsigned int j = 0; j < factor; ++j)
- quick_push (base + j);
+ m_encoding.quick_push (base + j);
}
+ m_encoding.finalize ();
+}
+
+/* Rotate the inputs of the permutation right by DELTA inputs. This changes
+ the values of the permutation vector but it doesn't change the way that
+ the elements are encoded. */
+
+void
+vec_perm_indices::rotate_inputs (int delta)
+{
+ element_type element_delta = delta * m_nelts_per_input;
+ for (unsigned int i = 0; i < m_encoding.length (); ++i)
+ m_encoding[i] = clamp (m_encoding[i] + element_delta);
}
/* Return true if all elements of the permutation vector are in the range
@@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c
bool
vec_perm_indices::all_in_range_p (element_type start, element_type size) const
{
- for (unsigned int i = 0; i < length (); ++i)
- if ((*this)[i] < start || ((*this)[i] - start) >= size)
+ /* Check the first two elements of each pattern. */
+ unsigned int npatterns = m_encoding.npatterns ();
+ unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern ();
+ unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2);
+ for (unsigned int i = 0; i < base_nelts; ++i)
+ if (m_encoding[i] < start || (m_encoding[i] - start) >= size)
return false;
+
+ /* For stepped encodings, check the full range of the series. */
+ if (nelts_per_pattern == 3)
+ {
+ element_type limit = input_nelts ();
+
+ /* The number of elements in each pattern beyond the first two
+ that we checked above. */
+ unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2;
+ for (unsigned int i = 0; i < npatterns; ++i)
+ {
+ /* BASE1 has been checked but BASE2 hasn't. */
+ element_type base1 = m_encoding[i + npatterns];
+ element_type base2 = m_encoding[i + base_nelts];
+
+ /* The step to add to get from BASE1 to each subsequent value. */
+ element_type step = clamp (base2 - base1);
+
+ /* STEP has no inherent sign, so a value near LIMIT can
+ act as a negative step. The series is in range if it
+ is in range according to one of the two interpretations.
+
+ Since we're dealing with clamped values, ELEMENT_TYPE is
+ wide enough for overflow not to be a problem. */
+ element_type headroom_down = base1 - start;
+ element_type headroom_up = size - headroom_down - 1;
+ if (headroom_up < step * step_nelts
+ && headroom_down < (limit - step) * step_nelts)
+ return false;
+ }
+ }
return true;
}
@@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen
bool
tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
{
- unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
- for (unsigned int i = 0; i < nelts; ++i)
- if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))
+ unsigned int encoded_nelts = vector_cst_encoded_nelts (cst);
+ for (unsigned int i = 0; i < encoded_nelts; ++i)
+ if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i)))
return false;
- builder->reserve (nelts);
- for (unsigned int i = 0; i < nelts; ++i)
- builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
- & (2 * nelts - 1));
+ builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)),
+ VECTOR_CST_NPATTERNS (cst),
+ VECTOR_CST_NELTS_PER_PATTERN (cst));
+ for (unsigned int i = 0; i < encoded_nelts; ++i)
+ builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i)));
return true;
}
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2018-01-02 17:01:27.762670303 +0000
+++ gcc/optabs.c 2018-01-02 17:01:28.747627350 +0000
@@ -5476,6 +5476,11 @@ expand_vec_perm_const (machine_mode mode
rtx_insn *last = get_last_insn ();
bool single_arg_p = rtx_equal_p (v0, v1);
+ /* Always specify two input vectors here and leave the target to handle
+ cases in which the inputs are equal. Not all backends can cope with
+ the single-input representation when testing for a double-input
+ target instruction. */
+ vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
/* See if this can be handled with a vec_shr. We only do this if the
second vector is all zeroes. */
@@ -5488,7 +5493,7 @@ expand_vec_perm_const (machine_mode mode
&& (shift_code != CODE_FOR_nothing
|| shift_code_qi != CODE_FOR_nothing))
{
- rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
+ rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
if (shift_amt)
{
struct expand_operand ops[3];
@@ -5520,7 +5525,7 @@ expand_vec_perm_const (machine_mode mode
else
v1 = force_reg (mode, v1);
- if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
+ if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))
return target;
}
@@ -5529,7 +5534,7 @@ expand_vec_perm_const (machine_mode mode
rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
if (qimode != VOIDmode)
{
- qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
+ qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode));
target_qi = gen_reg_rtx (qimode);
v0_qi = gen_lowpart (qimode, v0);
v1_qi = gen_lowpart (qimode, v1);
@@ -5556,7 +5561,7 @@ expand_vec_perm_const (machine_mode mode
REQUIRED_SEL_MODE is OK. */
if (sel_mode != required_sel_mode)
{
- if (!selector_fits_mode_p (required_sel_mode, sel))
+ if (!selector_fits_mode_p (required_sel_mode, indices))
{
delete_insns_since (last);
return NULL_RTX;
@@ -5567,7 +5572,7 @@ expand_vec_perm_const (machine_mode mode
insn_code icode = direct_optab_handler (vec_perm_optab, mode);
if (icode != CODE_FOR_nothing)
{
- rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
+ rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices);
rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
if (tmp)
return tmp;
@@ -5642,7 +5647,7 @@ expand_vec_perm_var (machine_mode mode,
gcc_assert (sel != NULL);
/* Broadcast the low byte each element into each of its bytes. */
- vec_perm_builder const_sel (w);
+ vec_perm_builder const_sel (w, w, 1);
for (i = 0; i < w; ++i)
{
int this_e = i / u * u;
@@ -5890,7 +5895,7 @@ expand_mult_highpart (machine_mode mode,
expand_insn (optab_handler (tab2, mode), 3, eops);
m2 = gen_lowpart (mode, eops[0].value);
- auto_vec_perm_indices sel (nunits);
+ vec_perm_builder sel (nunits, nunits, 1);
if (method == 2)
{
for (i = 0; i < nunits; ++i)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c 2018-01-02 17:01:27.761670346 +0000
+++ gcc/optabs-query.c 2018-01-02 17:01:28.746627393 +0000
@@ -516,12 +516,13 @@ can_mult_highpart_p (machine_mode mode,
op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
if (optab_handler (op, mode) != CODE_FOR_nothing)
{
- auto_vec_perm_indices sel (nunits);
+ vec_perm_builder sel (nunits, nunits, 1);
for (i = 0; i < nunits; ++i)
sel.quick_push (!BYTES_BIG_ENDIAN
+ (i & ~1)
+ ((i & 1) ? nunits : 0));
- if (can_vec_perm_const_p (mode, sel))
+ vec_perm_indices indices (sel, 2, nunits);
+ if (can_vec_perm_const_p (mode, indices))
return 2;
}
}
@@ -532,10 +533,11 @@ can_mult_highpart_p (machine_mode mode,
op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
if (optab_handler (op, mode) != CODE_FOR_nothing)
{
- auto_vec_perm_indices sel (nunits);
+ vec_perm_builder sel (nunits, nunits, 1);
for (i = 0; i < nunits; ++i)
sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
- if (can_vec_perm_const_p (mode, sel))
+ vec_perm_indices indices (sel, 2, nunits);
+ if (can_vec_perm_const_p (mode, indices))
return 3;
}
}
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c 2018-01-02 17:01:26.628719897 +0000
+++ gcc/fold-const.c 2018-01-02 17:01:28.746627393 +0000
@@ -11373,7 +11373,7 @@ fold_ternary_loc (location_t loc, enum t
{
unsigned int nelts = VECTOR_CST_NELTS (arg0), i;
gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
- auto_vec_perm_indices sel (nelts);
+ vec_perm_builder sel (nelts, nelts, 1);
for (i = 0; i < nelts; i++)
{
tree val = VECTOR_CST_ELT (arg0, i);
@@ -11384,7 +11384,8 @@ fold_ternary_loc (location_t loc, enum t
else /* Currently unreachable. */
return NULL_TREE;
}
- tree t = fold_vec_perm (type, arg1, arg2, sel);
+ tree t = fold_vec_perm (type, arg1, arg2,
+ vec_perm_indices (sel, 2, nelts));
if (t != NULL_TREE)
return t;
}
@@ -11716,8 +11717,8 @@ fold_ternary_loc (location_t loc, enum t
mask2 = 2 * nelts - 1;
mask = single_arg ? (nelts - 1) : mask2;
gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
- auto_vec_perm_indices sel (nelts);
- auto_vec_perm_indices sel2 (nelts);
+ vec_perm_builder sel (nelts, nelts, 1);
+ vec_perm_builder sel2 (nelts, nelts, 1);
for (i = 0; i < nelts; i++)
{
tree val = VECTOR_CST_ELT (arg2, i);
@@ -11762,12 +11763,13 @@ fold_ternary_loc (location_t loc, enum t
need_mask_canon = true;
}
+ vec_perm_indices indices (sel, 2, nelts);
if ((TREE_CODE (op0) == VECTOR_CST
|| TREE_CODE (op0) == CONSTRUCTOR)
&& (TREE_CODE (op1) == VECTOR_CST
|| TREE_CODE (op1) == CONSTRUCTOR))
{
- tree t = fold_vec_perm (type, op0, op1, sel);
+ tree t = fold_vec_perm (type, op0, op1, indices);
if (t != NULL_TREE)
return t;
}
@@ -11779,11 +11781,14 @@ fold_ternary_loc (location_t loc, enum t
argument permutation while still allowing an equivalent
2-argument version. */
if (need_mask_canon && arg2 == op2
- && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
- && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
+ && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
+ && can_vec_perm_const_p (TYPE_MODE (type),
+ vec_perm_indices (sel2, 2, nelts),
+ false))
{
need_mask_canon = need_mask_canon2;
- sel = sel2;
+ sel.truncate (0);
+ sel.splice (sel2);
}
if (need_mask_canon && arg2 == op2)
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c 2018-01-02 17:01:26.630719809 +0000
+++ gcc/tree-ssa-forwprop.c 2018-01-02 17:01:28.747627350 +0000
@@ -2018,7 +2018,7 @@ simplify_vector_constructor (gimple_stmt
elem_type = TREE_TYPE (type);
elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
- auto_vec_perm_indices sel (nelts);
+ vec_perm_builder sel (nelts, nelts, 1);
orig = NULL;
conv_code = ERROR_MARK;
maybe_ident = true;
@@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt
{
tree mask_type;
- if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
+ vec_perm_indices indices (sel, 1, nelts);
+ if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
return false;
mask_type
= build_vector_type (build_nonstandard_integer_type (elem_size, 1),
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c 2018-01-02 17:01:26.630719809 +0000
+++ gcc/tree-vect-data-refs.c 2018-01-02 17:01:28.748627306 +0000
@@ -4579,7 +4579,7 @@ vect_grouped_store_supported (tree vecty
if (VECTOR_MODE_P (mode))
{
unsigned int i, nelt = GET_MODE_NUNITS (mode);
- auto_vec_perm_indices sel (nelt);
+ vec_perm_builder sel (nelt, nelt, 1);
sel.quick_grow (nelt);
if (count == 3)
@@ -4587,6 +4587,7 @@ vect_grouped_store_supported (tree vecty
unsigned int j0 = 0, j1 = 0, j2 = 0;
unsigned int i, j;
+ vec_perm_indices indices;
for (j = 0; j < 3; j++)
{
int nelt0 = ((3 - j) * nelt) % 3;
@@ -4601,7 +4602,8 @@ vect_grouped_store_supported (tree vecty
if (3 * i + nelt2 < nelt)
sel[3 * i + nelt2] = 0;
}
- if (!can_vec_perm_const_p (mode, sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (mode, indices))
{
if (dump_enabled_p ())
dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4618,7 +4620,8 @@ vect_grouped_store_supported (tree vecty
if (3 * i + nelt2 < nelt)
sel[3 * i + nelt2] = nelt + j2++;
}
- if (!can_vec_perm_const_p (mode, sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (mode, indices))
{
if (dump_enabled_p ())
dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4638,11 +4641,13 @@ vect_grouped_store_supported (tree vecty
sel[i * 2] = i;
sel[i * 2 + 1] = i + nelt;
}
- if (can_vec_perm_const_p (mode, sel))
+ vec_perm_indices indices (sel, 2, nelt);
+ if (can_vec_perm_const_p (mode, indices))
{
for (i = 0; i < nelt; i++)
sel[i] += nelt / 2;
- if (can_vec_perm_const_p (mode, sel))
+ indices.new_vector (sel, 2, nelt);
+ if (can_vec_perm_const_p (mode, indices))
return true;
}
}
@@ -4744,7 +4749,7 @@ vect_permute_store_chain (vec<tree> dr_c
unsigned int i, n, log_length = exact_log2 (length);
unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
- auto_vec_perm_indices sel (nelt);
+ vec_perm_builder sel (nelt, nelt, 1);
sel.quick_grow (nelt);
result_chain->quick_grow (length);
@@ -4755,6 +4760,7 @@ vect_permute_store_chain (vec<tree> dr_c
{
unsigned int j0 = 0, j1 = 0, j2 = 0;
+ vec_perm_indices indices;
for (j = 0; j < 3; j++)
{
int nelt0 = ((3 - j) * nelt) % 3;
@@ -4770,7 +4776,8 @@ vect_permute_store_chain (vec<tree> dr_c
if (3 * i + nelt2 < nelt)
sel[3 * i + nelt2] = 0;
}
- perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+ indices.new_vector (sel, 2, nelt);
+ perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
for (i = 0; i < nelt; i++)
{
@@ -4781,7 +4788,8 @@ vect_permute_store_chain (vec<tree> dr_c
if (3 * i + nelt2 < nelt)
sel[3 * i + nelt2] = nelt + j2++;
}
- perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+ indices.new_vector (sel, 2, nelt);
+ perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
vect1 = dr_chain[0];
vect2 = dr_chain[1];
@@ -4818,11 +4826,13 @@ vect_permute_store_chain (vec<tree> dr_c
sel[i * 2] = i;
sel[i * 2 + 1] = i + nelt;
}
- perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+ vec_perm_indices indices (sel, 2, nelt);
+ perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
for (i = 0; i < nelt; i++)
sel[i] += nelt / 2;
- perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+ indices.new_vector (sel, 2, nelt);
+ perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
for (i = 0, n = log_length; i < n; i++)
{
@@ -5167,11 +5177,12 @@ vect_grouped_load_supported (tree vectyp
if (VECTOR_MODE_P (mode))
{
unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
- auto_vec_perm_indices sel (nelt);
+ vec_perm_builder sel (nelt, nelt, 1);
sel.quick_grow (nelt);
if (count == 3)
{
+ vec_perm_indices indices;
unsigned int k;
for (k = 0; k < 3; k++)
{
@@ -5180,7 +5191,8 @@ vect_grouped_load_supported (tree vectyp
sel[i] = 3 * i + k;
else
sel[i] = 0;
- if (!can_vec_perm_const_p (mode, sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (mode, indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5193,7 +5205,8 @@ vect_grouped_load_supported (tree vectyp
sel[i] = i;
else
sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
- if (!can_vec_perm_const_p (mode, sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (mode, indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5208,13 +5221,16 @@ vect_grouped_load_supported (tree vectyp
{
/* If length is not equal to 3 then only power of 2 is supported. */
gcc_assert (pow2p_hwi (count));
+
for (i = 0; i < nelt; i++)
sel[i] = i * 2;
- if (can_vec_perm_const_p (mode, sel))
+ vec_perm_indices indices (sel, 2, nelt);
+ if (can_vec_perm_const_p (mode, indices))
{
for (i = 0; i < nelt; i++)
sel[i] = i * 2 + 1;
- if (can_vec_perm_const_p (mode, sel))
+ indices.new_vector (sel, 2, nelt);
+ if (can_vec_perm_const_p (mode, indices))
return true;
}
}
@@ -5329,7 +5345,7 @@ vect_permute_load_chain (vec<tree> dr_ch
unsigned int i, j, log_length = exact_log2 (length);
unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
- auto_vec_perm_indices sel (nelt);
+ vec_perm_builder sel (nelt, nelt, 1);
sel.quick_grow (nelt);
result_chain->quick_grow (length);
@@ -5340,6 +5356,7 @@ vect_permute_load_chain (vec<tree> dr_ch
{
unsigned int k;
+ vec_perm_indices indices;
for (k = 0; k < 3; k++)
{
for (i = 0; i < nelt; i++)
@@ -5347,15 +5364,16 @@ vect_permute_load_chain (vec<tree> dr_ch
sel[i] = 3 * i + k;
else
sel[i] = 0;
- perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+ indices.new_vector (sel, 2, nelt);
+ perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
for (i = 0, j = 0; i < nelt; i++)
if (3 * i + k < 2 * nelt)
sel[i] = i;
else
sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-
- perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+ indices.new_vector (sel, 2, nelt);
+ perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
first_vect = dr_chain[0];
second_vect = dr_chain[1];
@@ -5387,11 +5405,13 @@ vect_permute_load_chain (vec<tree> dr_ch
for (i = 0; i < nelt; ++i)
sel[i] = i * 2;
- perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
+ vec_perm_indices indices (sel, 2, nelt);
+ perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
for (i = 0; i < nelt; ++i)
sel[i] = i * 2 + 1;
- perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+ indices.new_vector (sel, 2, nelt);
+ perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
for (i = 0; i < log_length; i++)
{
@@ -5527,7 +5547,7 @@ vect_shift_permute_load_chain (vec<tree>
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
- auto_vec_perm_indices sel (nelt);
+ vec_perm_builder sel (nelt, nelt, 1);
sel.quick_grow (nelt);
result_chain->quick_grow (length);
@@ -5541,7 +5561,8 @@ vect_shift_permute_load_chain (vec<tree>
sel[i] = i * 2;
for (i = 0; i < nelt / 2; ++i)
sel[nelt / 2 + i] = i * 2 + 1;
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ vec_perm_indices indices (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5549,13 +5570,14 @@ vect_shift_permute_load_chain (vec<tree>
supported by target\n");
return false;
}
- perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel);
+ perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices);
for (i = 0; i < nelt / 2; ++i)
sel[i] = i * 2 + 1;
for (i = 0; i < nelt / 2; ++i)
sel[nelt / 2 + i] = i * 2;
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5563,20 +5585,21 @@ vect_shift_permute_load_chain (vec<tree>
supported by target\n");
return false;
}
- perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel);
+ perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices);
/* Generating permutation constant to shift all elements.
For vector length 8 it is {4 5 6 7 8 9 10 11}. */
for (i = 0; i < nelt; i++)
sel[i] = nelt / 2 + i;
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"shift permutation is not supported by target\n");
return false;
}
- shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+ shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
/* Generating permutation constant to select vector from 2.
For vector length 8 it is {0 1 2 3 12 13 14 15}. */
@@ -5584,14 +5607,15 @@ vect_shift_permute_load_chain (vec<tree>
sel[i] = i;
for (i = nelt / 2; i < nelt; i++)
sel[i] = nelt + i;
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"select is not supported by target\n");
return false;
}
- select_mask = vect_gen_perm_mask_checked (vectype, sel);
+ select_mask = vect_gen_perm_mask_checked (vectype, indices);
for (i = 0; i < log_length; i++)
{
@@ -5647,7 +5671,8 @@ vect_shift_permute_load_chain (vec<tree>
sel[i] = 3 * k + (l % 3);
k++;
}
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ vec_perm_indices indices (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5655,59 +5680,63 @@ vect_shift_permute_load_chain (vec<tree>
supported by target\n");
return false;
}
- perm3_mask = vect_gen_perm_mask_checked (vectype, sel);
+ perm3_mask = vect_gen_perm_mask_checked (vectype, indices);
/* Generating permutation constant to shift all elements.
For vector length 8 it is {6 7 8 9 10 11 12 13}. */
for (i = 0; i < nelt; i++)
sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"shift permutation is not supported by target\n");
return false;
}
- shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+ shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
/* Generating permutation constant to shift all elements.
For vector length 8 it is {5 6 7 8 9 10 11 12}. */
for (i = 0; i < nelt; i++)
sel[i] = 2 * (nelt / 3) + 1 + i;
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"shift permutation is not supported by target\n");
return false;
}
- shift2_mask = vect_gen_perm_mask_checked (vectype, sel);
+ shift2_mask = vect_gen_perm_mask_checked (vectype, indices);
/* Generating permutation constant to shift all elements.
For vector length 8 it is {3 4 5 6 7 8 9 10}. */
for (i = 0; i < nelt; i++)
sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"shift permutation is not supported by target\n");
return false;
}
- shift3_mask = vect_gen_perm_mask_checked (vectype, sel);
+ shift3_mask = vect_gen_perm_mask_checked (vectype, indices);
/* Generating permutation constant to shift all elements.
For vector length 8 it is {5 6 7 8 9 10 11 12}. */
for (i = 0; i < nelt; i++)
sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
if (dump_enabled_p ())
dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
"shift permutation is not supported by target\n");
return false;
}
- shift4_mask = vect_gen_perm_mask_checked (vectype, sel);
+ shift4_mask = vect_gen_perm_mask_checked (vectype, indices);
for (k = 0; k < 3; k++)
{
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c 2018-01-02 17:01:26.632719721 +0000
+++ gcc/tree-vect-slp.c 2018-01-02 17:01:28.749627263 +0000
@@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
&& TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
{
unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);
- auto_vec_perm_indices sel (count);
+ vec_perm_builder sel (count, count, 1);
for (i = 0; i < count; ++i)
{
unsigned int elt = i;
@@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
elt += count;
sel.quick_push (elt);
}
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ vec_perm_indices indices (sel, 2, count);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
{
for (i = 0; i < group_size; ++i)
if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
@@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n
(int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
mask_type = get_vectype_for_scalar_type (mask_element_type);
nunits = TYPE_VECTOR_SUBPARTS (vectype);
- auto_vec_perm_indices mask (nunits);
+ vec_perm_builder mask (nunits, nunits, 1);
mask.quick_grow (nunits);
+ vec_perm_indices indices;
/* Initialize the vect stmts of NODE to properly insert the generated
stmts later. */
@@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n
noop_p = false;
mask[index++] = mask_element;
- if (index == nunits)
+ if (index == nunits && !noop_p)
{
- if (! noop_p
- && ! can_vec_perm_const_p (mode, mask))
+ indices.new_vector (mask, 2, nunits);
+ if (!can_vec_perm_const_p (mode, indices))
{
if (dump_enabled_p ())
{
@@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n
vect_location,
"unsupported vect permute { ");
for (i = 0; i < nunits; ++i)
- dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);
+ dump_printf (MSG_MISSED_OPTIMIZATION,
+ HOST_WIDE_INT_PRINT_DEC " ", mask[i]);
dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
}
gcc_assert (analyze_only);
return false;
}
- if (! noop_p)
- ++*n_perms;
+ ++*n_perms;
+ }
+ if (index == nunits)
+ {
if (!analyze_only)
{
tree mask_vec = NULL_TREE;
@@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod
enum tree_code code0 = gimple_assign_rhs_code (stmt);
enum tree_code ocode = ERROR_MARK;
gimple *ostmt;
- auto_vec_perm_indices mask (group_size);
+ vec_perm_builder mask (group_size, group_size, 1);
FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt)
if (gimple_assign_rhs_code (ostmt) != code0)
{
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c 2018-01-02 17:01:26.632719721 +0000
+++ gcc/tree-vect-stmts.c 2018-01-02 17:01:28.750627219 +0000
@@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype)
nunits = TYPE_VECTOR_SUBPARTS (vectype);
- auto_vec_perm_indices sel (nunits);
+ vec_perm_builder sel (nunits, nunits, 1);
for (i = 0; i < nunits; ++i)
sel.quick_push (nunits - 1 - i);
- if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+ vec_perm_indices indices (sel, 1, nunits);
+ if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
return NULL_TREE;
- return vect_gen_perm_mask_checked (vectype, sel);
+ return vect_gen_perm_mask_checked (vectype, indices);
}
/* A subroutine of get_load_store_type, with a subset of the same
@@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st
{
modifier = WIDEN;
- auto_vec_perm_indices sel (gather_off_nunits);
+ vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
for (i = 0; i < gather_off_nunits; ++i)
sel.quick_push (i | nunits);
- perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+ vec_perm_indices indices (sel, 1, gather_off_nunits);
+ perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+ indices);
}
else if (nunits == gather_off_nunits * 2)
{
modifier = NARROW;
- auto_vec_perm_indices sel (nunits);
+ vec_perm_builder sel (nunits, nunits, 1);
sel.quick_grow (nunits);
for (i = 0; i < nunits; ++i)
sel[i] = i < gather_off_nunits
? i : i + nunits - gather_off_nunits;
+ vec_perm_indices indices (sel, 2, nunits);
+ perm_mask = vect_gen_perm_mask_checked (vectype, indices);
- perm_mask = vect_gen_perm_mask_checked (vectype, sel);
ncopies *= 2;
+
for (i = 0; i < nunits; ++i)
sel[i] = i | gather_off_nunits;
- mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);
+ indices.new_vector (sel, 2, gather_off_nunits);
+ mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices);
}
else
gcc_unreachable ();
@@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple
unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
unsigned word_bytes = num_bytes / nunits;
- auto_vec_perm_indices elts (num_bytes);
+ vec_perm_builder elts (num_bytes, num_bytes, 1);
for (unsigned i = 0; i < nunits; ++i)
for (unsigned j = 0; j < word_bytes; ++j)
elts.quick_push ((i + 1) * word_bytes - j - 1);
- if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
+ vec_perm_indices indices (elts, 1, num_bytes);
+ if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))
return false;
if (! vec_stmt)
@@ -5826,22 +5833,25 @@ vectorizable_store (gimple *stmt, gimple
{
modifier = WIDEN;
- auto_vec_perm_indices sel (scatter_off_nunits);
+ vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1);
for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)
sel.quick_push (i | nunits);
- perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+ vec_perm_indices indices (sel, 1, scatter_off_nunits);
+ perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+ indices);
gcc_assert (perm_mask != NULL_TREE);
}
else if (nunits == (unsigned int) scatter_off_nunits * 2)
{
modifier = NARROW;
- auto_vec_perm_indices sel (nunits);
+ vec_perm_builder sel (nunits, nunits, 1);
for (i = 0; i < (unsigned int) nunits; ++i)
sel.quick_push (i | scatter_off_nunits);
- perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+ vec_perm_indices indices (sel, 2, nunits);
+ perm_mask = vect_gen_perm_mask_checked (vectype, indices);
gcc_assert (perm_mask != NULL_TREE);
ncopies *= 2;
}
@@ -6862,22 +6872,25 @@ vectorizable_load (gimple *stmt, gimple_
{
modifier = WIDEN;
- auto_vec_perm_indices sel (gather_off_nunits);
+ vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
for (i = 0; i < gather_off_nunits; ++i)
sel.quick_push (i | nunits);
- perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+ vec_perm_indices indices (sel, 1, gather_off_nunits);
+ perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+ indices);
}
else if (nunits == gather_off_nunits * 2)
{
modifier = NARROW;
- auto_vec_perm_indices sel (nunits);
+ vec_perm_builder sel (nunits, nunits, 1);
for (i = 0; i < nunits; ++i)
sel.quick_push (i < gather_off_nunits
? i : i + nunits - gather_off_nunits);
- perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+ vec_perm_indices indices (sel, 2, nunits);
+ perm_mask = vect_gen_perm_mask_checked (vectype, indices);
ncopies *= 2;
}
else
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2018-01-02 17:01:26.631719765 +0000
+++ gcc/tree-vect-generic.c 2018-01-02 17:01:28.748627306 +0000
@@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs
mask = gimple_assign_rhs1 (def_stmt);
}
- if (TREE_CODE (mask) == VECTOR_CST)
- {
- auto_vec_perm_indices sel_int (elements);
-
- for (i = 0; i < elements; ++i)
- sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
- & (2 * elements - 1));
+ vec_perm_builder sel_int;
- if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
+ if (TREE_CODE (mask) == VECTOR_CST
+ && tree_to_vec_perm_builder (&sel_int, mask))
+ {
+ vec_perm_indices indices (sel_int, 2, elements);
+ if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices))
{
gimple_assign_set_rhs3 (stmt, mask);
update_stmt (stmt);
@@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs
!= CODE_FOR_nothing
&& TREE_CODE (vec1) == VECTOR_CST
&& initializer_zerop (vec1)
- && sel_int[0]
- && sel_int[0] < elements)
+ && indices[0]
+ && indices[0] < elements)
{
for (i = 1; i < elements; ++i)
{
- unsigned int expected = i + sel_int[0];
+ unsigned int expected = i + indices[0];
/* Indices into the second vector are all equivalent. */
- if (MIN (elements, (unsigned) sel_int[i])
+ if (MIN (elements, (unsigned) indices[i])
!= MIN (elements, expected))
break;
}
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c 2018-01-02 17:01:26.631719765 +0000
+++ gcc/tree-vect-loop.c 2018-01-02 17:01:28.749627263 +0000
@@ -3713,12 +3713,11 @@ vect_estimate_min_profitable_iters (loop
vector elements (not bits) for a vector with NELT elements. */
static void
calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
- vec_perm_indices *sel)
+ vec_perm_builder *sel)
{
- unsigned int i;
-
- for (i = 0; i < nelt; i++)
- sel->quick_push ((i + offset) & (2 * nelt - 1));
+ sel->new_vector (nelt, nelt, 1);
+ for (unsigned int i = 0; i < nelt; i++)
+ sel->quick_push (i + offset);
}
/* Checks whether the target supports whole-vector shifts for vectors of mode
@@ -3731,13 +3730,13 @@ have_whole_vector_shift (machine_mode mo
return true;
unsigned int i, nelt = GET_MODE_NUNITS (mode);
- auto_vec_perm_indices sel (nelt);
-
+ vec_perm_builder sel;
+ vec_perm_indices indices;
for (i = nelt/2; i >= 1; i/=2)
{
- sel.truncate (0);
calc_vec_perm_mask_for_shift (i, nelt, &sel);
- if (!can_vec_perm_const_p (mode, sel, false))
+ indices.new_vector (sel, 2, nelt);
+ if (!can_vec_perm_const_p (mode, indices, false))
return false;
}
return true;
@@ -5055,7 +5054,8 @@ vect_create_epilog_for_reduction (vec<tr
if (reduce_with_shift && !slp_reduc)
{
int nelements = vec_size_in_bits / element_bitsize;
- auto_vec_perm_indices sel (nelements);
+ vec_perm_builder sel;
+ vec_perm_indices indices;
int elt_offset;
@@ -5079,9 +5079,9 @@ vect_create_epilog_for_reduction (vec<tr
elt_offset >= 1;
elt_offset /= 2)
{
- sel.truncate (0);
calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);
- tree mask = vect_gen_perm_mask_any (vectype, sel);
+ indices.new_vector (sel, 2, nelements);
+ tree mask = vect_gen_perm_mask_any (vectype, indices);
epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR,
new_temp, zero_vec, mask);
new_name = make_ssa_name (vec_dest, epilog_stmt);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2018-01-02 17:01:26.602721036 +0000
+++ gcc/config/aarch64/aarch64.c 2018-01-02 17:01:28.736627829 +0000
@@ -13252,7 +13252,7 @@ #define MAX_VECT_LEN 16
struct expand_vec_perm_d
{
rtx target, op0, op1;
- auto_vec_perm_indices perm;
+ vec_perm_indices perm;
machine_mode vmode;
bool one_vector_p;
bool testing_p;
@@ -13642,10 +13642,7 @@ aarch64_expand_vec_perm_const_1 (struct
unsigned int nelt = d->perm.length ();
if (d->perm[0] >= nelt)
{
- gcc_assert (nelt == (nelt & -nelt));
- for (unsigned int i = 0; i < nelt; ++i)
- d->perm[i] ^= nelt; /* Keep the same index, but in the other vector. */
-
+ d->perm.rotate_inputs (1);
std::swap (d->op0, d->op1);
}
@@ -13685,12 +13682,10 @@ aarch64_vectorize_vec_perm_const (machin
/* Calculate whether all elements are in one vector. */
unsigned int nelt = sel.length ();
- d.perm.reserve (nelt);
for (i = which = 0; i < nelt; ++i)
{
unsigned int ei = sel[i] & (2 * nelt - 1);
which |= (ei < nelt ? 1 : 2);
- d.perm.quick_push (ei);
}
switch (which)
@@ -13709,8 +13704,6 @@ aarch64_vectorize_vec_perm_const (machin
input vector. */
/* Fall Through. */
case 2:
- for (i = 0; i < nelt; ++i)
- d.perm[i] &= nelt - 1;
d.op0 = op1;
d.one_vector_p = true;
break;
@@ -13721,6 +13714,8 @@ aarch64_vectorize_vec_perm_const (machin
break;
}
+ d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
if (!d.testing_p)
return aarch64_expand_vec_perm_const_1 (&d);
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c 2018-01-02 17:01:26.604720948 +0000
+++ gcc/config/arm/arm.c 2018-01-02 17:01:28.739627698 +0000
@@ -28854,7 +28854,7 @@ #define MAX_VECT_LEN 16
struct expand_vec_perm_d
{
rtx target, op0, op1;
- auto_vec_perm_indices perm;
+ vec_perm_indices perm;
machine_mode vmode;
bool one_vector_p;
bool testing_p;
@@ -29362,9 +29362,7 @@ arm_expand_vec_perm_const_1 (struct expa
unsigned int nelt = d->perm.length ();
if (d->perm[0] >= nelt)
{
- for (unsigned int i = 0; i < nelt; ++i)
- d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1);
-
+ d->perm.rotate_inputs (1);
std::swap (d->op0, d->op1);
}
@@ -29404,12 +29402,10 @@ arm_vectorize_vec_perm_const (machine_mo
d.testing_p = !target;
nelt = GET_MODE_NUNITS (d.vmode);
- d.perm.reserve (nelt);
for (i = which = 0; i < nelt; ++i)
{
int ei = sel[i] & (2 * nelt - 1);
which |= (ei < nelt ? 1 : 2);
- d.perm.quick_push (ei);
}
switch (which)
@@ -29428,8 +29424,6 @@ arm_vectorize_vec_perm_const (machine_mo
input vector. */
/* FALLTHRU */
case 2:
- for (i = 0; i < nelt; ++i)
- d.perm[i] &= nelt - 1;
d.op0 = op1;
d.one_vector_p = true;
break;
@@ -29440,6 +29434,8 @@ arm_vectorize_vec_perm_const (machine_mo
break;
}
+ d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
if (d.testing_p)
return arm_expand_vec_perm_const_1 (&d);
Index: gcc/config/powerpcspe/powerpcspe.c
===================================================================
--- gcc/config/powerpcspe/powerpcspe.c 2018-01-02 17:01:26.617720379 +0000
+++ gcc/config/powerpcspe/powerpcspe.c 2018-01-02 17:01:28.742627568 +0000
@@ -38782,7 +38782,7 @@ rs6000_expand_extract_even (rtx target,
{
machine_mode vmode = GET_MODE (target);
unsigned i, nelt = GET_MODE_NUNITS (vmode);
- vec_perm_builder perm (nelt);
+ vec_perm_builder perm (nelt, nelt, 1);
for (i = 0; i < nelt; i++)
perm.quick_push (i * 2);
@@ -38797,7 +38797,7 @@ rs6000_expand_interleave (rtx target, rt
{
machine_mode vmode = GET_MODE (target);
unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
- vec_perm_builder perm (nelt);
+ vec_perm_builder perm (nelt, nelt, 1);
high = (highp ? 0 : nelt / 2);
for (i = 0; i < nelt / 2; i++)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c 2018-01-02 17:01:26.621720203 +0000
+++ gcc/config/rs6000/rs6000.c 2018-01-02 17:01:28.745627437 +0000
@@ -36042,7 +36042,7 @@ rs6000_expand_extract_even (rtx target,
{
machine_mode vmode = GET_MODE (target);
unsigned i, nelt = GET_MODE_NUNITS (vmode);
- vec_perm_builder perm (nelt);
+ vec_perm_builder perm (nelt, nelt, 1);
for (i = 0; i < nelt; i++)
perm.quick_push (i * 2);
@@ -36057,7 +36057,7 @@ rs6000_expand_interleave (rtx target, rt
{
machine_mode vmode = GET_MODE (target);
unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
- vec_perm_builder perm (nelt);
+ vec_perm_builder perm (nelt, nelt, 1);
high = (highp ? 0 : nelt / 2);
for (i = 0; i < nelt / 2; i++)