From: Yuri Rumyantsev <ysrumyan@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: Jeff Law <law@redhat.com>, gcc-patches <gcc-patches@gcc.gnu.org>,
Igor Zamyatin <izamyatin@gmail.com>
Subject: Re: [PATCH] Simple optimization for MASK_STORE.
Date: Thu, 05 Nov 2015 15:49:00 -0000 [thread overview]
Message-ID: <CAEoMCqTaEp2z048fq4X_eKg8YNXcTZUBGN3s9Dg1+czJ1xqexg@mail.gmail.com> (raw)
In-Reply-To: <CAEoMCqR=nYnqLibLbdStqXM1WOu1cwn9mVc_Se26ALmYg_ze=g@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 7230 bytes --]
Hi All!

I prepared another patch which inserts an additional test for a zero
mask before masked stores, enabled only if the parameter
PARAM_ZERO_TEST_FOR_MASK_STORE has a non-zero value. My attempt to use
the approach proposed by Richard - the simpler alternative of using a
scalar type for the 256-bit comparison - was not successful, so I
returned to the vector comparison with a scalar boolean result.
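To illustrate what the transformation does (a scalar model for exposition only - the patch itself works on GIMPLE IFN_MASK_STORE calls, and these names are made up):

```c
#include <assert.h>

#define VL 8

/* Scalar model of a masked vector store: lanes with a set mask bit are
   written, the remaining lanes are left untouched.  */
static void
mask_store (int *dst, const int mask[VL], const int val[VL])
{
  for (int i = 0; i < VL; i++)
    if (mask[i])
      dst[i] = val[i];
}

/* The inserted zero-mask test: when the whole mask is zero the guarded
   store block is skipped entirely, so no store is issued and loads
   that follow cannot be blocked by it.  */
static int
mask_all_zero (const int mask[VL])
{
  for (int i = 0; i < VL; i++)
    if (mask[i])
      return 0;
  return 1;
}
```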
ChangeLog:
2015-11-05 Yuri Rumyantsev <ysrumyan@gmail.com>
* config/i386/i386.c (ix86_option_override_internal): Add conditional
initialization of PARAM_ZERO_TEST_FOR_MASK_STORE.
(ix86_expand_branch): Implement vector comparison with boolean result.
* config/i386/i386.h (TARGET_OPTIMIZE_MASK_STORE): New macro.
* config/i386/sse.md (define_expand "cbranch<mode>4"): Add define_expand
for vector comparison with eq/ne only.
* config/i386/x86-tune.def (X86_TUNE_OPTIMIZE_MASK_STORE): New tune flag.
* fold-const.c (fold_relational_const): Add handling of vector
comparison with boolean result.
* params.def (PARAM_ZERO_TEST_FOR_MASK_STORE): New DEFPARAM.
* params.h (ENABLE_ZERO_TEST_FOR_MASK_STORE): New macro.
* tree-cfg.c (verify_gimple_comparison): Add argument CODE, allow
comparison of vector operands with boolean result for EQ/NE only.
(verify_gimple_assign_binary): Adjust call for verify_gimple_comparison.
(verify_gimple_cond): Likewise.
* tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not
combine vector comparison with boolean result and VEC_COND_EXPR that
has vector result.
* tree-vect-loop.c (is_valid_sink): New function.
(optimize_mask_stores): Likewise.
* tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
has_mask_store field of vect_info.
* tree-vectorizer.c (vectorize_loops): Invoke optimize_mask_stores for
vectorized loops having masked stores.
* tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
corresponding macro.
(optimize_mask_stores): Add prototype.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
* gcc.target/i386/avx2-vect-mask-store-move2.c: Likewise.
2015-11-02 18:24 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Hi Richard,
>
> I've come back to this optimization and tried to implement your proposal
> for the comparison:
>> Btw, you didn't try the simpler alternative of
>>
>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype)));
>> build2 (EQ_EXPR, boolean_type_node,
>> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1));
>>
>> ? That is, use the GIMPLE level equivalent of
>> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
>
> using the following code:
>
> vectype = TREE_TYPE (mask);
> ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)),
> MODE_INT, 0);
> ext_type = lang_hooks.types.type_for_mode (ext_mode , 1);
>
> but I got a NULL type for it. Am I missing something?
>
> Any help will be appreciated.
> Yuri.
>
>
> 2015-08-13 14:40 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi Richard,
>>>
>>> Did you have a chance to look at updated patch?
>>
>> Having a quick look now. Btw, you didn't try the simpler alternative of
>>
>> tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype)));
>> build2 (EQ_EXPR, boolean_type_node,
>> build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1));
>>
>> ? That is, use the GIMPLE level equivalent of
>>
>> (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))
>>
>> ? That should be supported by the expander already, though again not sure if
>> the target(s) have compares that match this.
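[The (subreg:TI reg:V4SI) comparison quoted above amounts to comparing the raw vector bytes as one wide integer. A minimal C model of that semantic, assuming a 128-bit vector and byte-wise equality (illustration only, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Model of the whole-vector EQ via a wide-integer view: two V4SI
   vectors are equal iff their 16 bytes are equal, which is what
   (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) tests at RTL
   level.  */
static int
vec128_eq (const uint32_t a[4], const uint32_t b[4])
{
  return memcmp (a, b, 16) == 0;
}
```
]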
>>
>> Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction
>> on EQ/NE_EXPR
>> is missing. Operand type equality is tested anyway.
>>
>> Why do you need to restrict forward_propagate_into_comparison_1?
>>
>> Otherwise this looks better, but can you try with the VIEW_CONVERT as well?
>>
>> Thanks,
>> Richard.
>>
>>
>>> Thanks.
>>> Yuri.
>>>
>>> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>>>> Hi All,
>>>>
>>>> Here is an updated patch which implements Richard's proposal to use a
>>>> vector comparison with boolean result instead of a target hook. Support
>>>> for it was added to ix86_expand_branch.
>>>>
>>>> Any comments will be appreciated.
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>>
>>>> ChangeLog:
>>>> 2015-08-06 Yuri Rumyantsev <ysrumyan@gmail.com>
>>>>
>>>> * config/i386/i386.c (ix86_expand_branch): Implement vector
>>>> comparison with boolean result.
>>>> * config/i386/sse.md (define_expand "cbranch<mode>4"): Add define_expand
>>>> for vector comparison.
>>>> * fold-const.c (fold_relational_const): Add handling of vector
>>>> comparison with boolean result.
>>>> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM.
>>>> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): New macro.
>>>> * tree-cfg.c (verify_gimple_comparison): Add test for vector
>>>> comparison with boolean result.
>>>> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not
>>>> propagate vector comparison with boolean result.
>>>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>>>> has_mask_store field of vect_info.
>>>> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h.
>>>> (is_valid_sink): New function.
>>>> (optimize_mask_stores): New function.
>>>> (vectorize_loops): Invoke optimize_mask_stores for loops having masked
>>>> stores.
>>>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>>>> corresponding macro.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
>>>>
>>>>
>>>> 2015-07-27 11:48 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law <law@redhat.com> wrote:
>>>>>> On 07/24/2015 03:16 AM, Richard Biener wrote:
>>>>>>>>
>>>>>>>> Is there any rationale given anywhere for the transformation into
>>>>>>>> conditional expressions? ie, is there any reason why we can't have a
>>>>>>>> GIMPLE_COND where the expression is a vector condition?
>>>>>>>
>>>>>>>
>>>>>>> No rationale for equality compare which would have the semantic of
>>>>>>> having all elements equal or not equal. But you can't define a sensible
>>>>>>> ordering (that HW implements) for other compare operators and you
>>>>>>> obviously need a single boolean result, not a vector of element comparison
>>>>>>> results.
>>>>>>
>>>>>> Right. EQ/NE only as others just don't have any real meaning.
>>>>>>
>>>>>>
>>>>>>> I've already replied that I'm fine allowing ==/!= whole-vector compares.
>>>>>>> But one needs to check whether expansion does anything sensible
>>>>>>> with them (either expand to integer subreg compares or add optabs
>>>>>>> for the compares).
>>>>>>
>>>>>> Agreed, EQ/NE for whole vector compares only would be fine for me too under
>>>>>> the same conditions.
>>>>>
>>>>> Btw, you can already do this on GIMPLE by doing
>>>>>
>>>>> TImode vec_as_int = VIEW_CONVERT_EXPR <TImode> (vec_2);
>>>>> if (vec_as_int == 0)
>>>>> ...
>>>>>
>>>>> which is what the RTL will look like in the end. So not sure if making this
>>>>> higher-level in GIMPLE is good or required.
>>>>>
>>>>> Richard.
>>>>>
>>>>>> jeff
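[Editorial aside on the EQ/NE-only restriction discussed above: an elementwise ordered compare naturally yields a vector of per-lane results, not a single boolean, so there is no scalar ordering for hardware to implement. A small illustrative model (names made up):

```c
#include <assert.h>

/* Elementwise "<" on two 2-lane vectors: the natural result is a
   vector of per-lane booleans, one per element.  */
static void
vec_lt (const int a[2], const int b[2], int out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = a[i] < b[i];
}
```

For a = {1,5}, b = {2,3} the result is {1,0}: neither "all lanes less" nor "no lane less", so collapsing "<" to one boolean has no sensible meaning, whereas EQ/NE collapse naturally as "all lanes equal" / "some lane differs".]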
[-- Attachment #2: patch.5 --]
[-- Type: application/octet-stream, Size: 19246 bytes --]
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 2a965f6..d41741d 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -5313,6 +5313,12 @@ ix86_option_override_internal (bool main_args_p,
opts->x_param_values,
opts_set->x_param_values);
+ if (TARGET_OPTIMIZE_MASK_STORE)
+ maybe_set_param_value (PARAM_ZERO_TEST_FOR_MASK_STORE,
+ 1,
+ opts->x_param_values,
+ opts_set->x_param_values);
+
/* Enable sw prefetching at -O3 for CPUS that prefetching is helpful. */
if (opts->x_flag_prefetch_loop_arrays < 0
&& HAVE_prefetch
@@ -21590,6 +21596,38 @@ ix86_expand_branch (enum rtx_code code, rtx op0, rtx op1, rtx label)
machine_mode mode = GET_MODE (op0);
rtx tmp;
+ /* Handle special case - vector comparison with boolean result; transform
+ it using the ptest instruction. */
+ if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ {
+ rtx lhs;
+ rtx flag;
+ machine_mode p_mode = GET_MODE_SIZE (mode) == 32 ? V4DImode : V2DImode;
+ gcc_assert (code == EQ || code == NE);
+ if (!REG_P (op0))
+ op0 = force_reg (mode, op0);
+ if (!REG_P (op1))
+ op1 = force_reg (mode, op1);
+ /* Generate a subtraction since we can't check that one operand is a
+ zero vector. */
+ lhs = gen_reg_rtx (mode);
+ emit_insn (gen_rtx_SET (lhs,
+ gen_rtx_MINUS (mode, op0, op1)));
+ lhs = gen_rtx_SUBREG (p_mode, lhs, 0);
+ tmp = gen_rtx_SET (gen_rtx_REG (CCmode, FLAGS_REG),
+ gen_rtx_UNSPEC (CCmode,
+ gen_rtvec (2, lhs, lhs),
+ UNSPEC_PTEST));
+ emit_insn (tmp);
+ flag = gen_rtx_REG (CCZmode, FLAGS_REG);
+ tmp = gen_rtx_fmt_ee (code, VOIDmode, flag, const0_rtx);
+ tmp = gen_rtx_IF_THEN_ELSE (VOIDmode, tmp,
+ gen_rtx_LABEL_REF (VOIDmode, label),
+ pc_rtx);
+ emit_jump_insn (gen_rtx_SET (pc_rtx, tmp));
+ return;
+ }
+
switch (mode)
{
case SFmode:
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index be96c75..cafc58e 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -496,6 +496,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
ix86_tune_features[X86_TUNE_ADJUST_UNROLL]
#define TARGET_AVOID_FALSE_DEP_FOR_BMI \
ix86_tune_features[X86_TUNE_AVOID_FALSE_DEP_FOR_BMI]
+#define TARGET_OPTIMIZE_MASK_STORE \
+ ix86_tune_features[X86_TUNE_OPTIMIZE_MASK_STORE]
/* Feature tests against the various architecture variations. */
enum ix86_arch_indices {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 43dcc6a..0f5fd39 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -17999,6 +17999,25 @@
UNSPEC_MASKMOV))]
"TARGET_AVX")
+(define_expand "cbranch<mode>4"
+ [(set (reg:CC FLAGS_REG)
+ (compare:CC (match_operand:V48_AVX2 1 "nonimmediate_operand")
+ (match_operand:V48_AVX2 2 "register_operand")))
+ (set (pc) (if_then_else
+ (match_operator 0 "bt_comparison_operator"
+ [(reg:CC FLAGS_REG) (const_int 0)])
+ (label_ref (match_operand 3))
+ (pc)))]
+ "TARGET_AVX2"
+{
+ if (MEM_P (operands[1]) && MEM_P (operands[2]))
+ operands[1] = force_reg (<MODE>mode, operands[1]);
+ ix86_expand_branch (GET_CODE (operands[0]),
+ operands[1], operands[2], operands[3]);
+ DONE;
+})
+
+
(define_insn_and_split "avx_<castmode><avxsizesuffix>_<castmode>"
[(set (match_operand:AVX256MODE2P 0 "nonimmediate_operand" "=x,m")
(unspec:AVX256MODE2P
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index b2d3921..c5e5b63 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -527,6 +527,10 @@ DEF_TUNE (X86_TUNE_AVOID_VECTOR_DECODE, "avoid_vector_decode",
DEF_TUNE (X86_TUNE_AVOID_FALSE_DEP_FOR_BMI, "avoid_false_dep_for_bmi",
m_SANDYBRIDGE | m_HASWELL | m_GENERIC)
+/* X86_TUNE_OPTIMIZE_MASK_STORE: Perform a masked store only if its mask is not
+ equal to zero. */
+DEF_TUNE (X86_TUNE_OPTIMIZE_MASK_STORE, "optimize_mask_store", m_HASWELL)
+
/*****************************************************************************/
/* This never worked well before. */
/*****************************************************************************/
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index ee9b349..0ff3307 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -13883,6 +13883,25 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
if (TREE_CODE (op0) == VECTOR_CST && TREE_CODE (op1) == VECTOR_CST)
{
+ if (INTEGRAL_TYPE_P (type)
+ && (TREE_CODE (type) == BOOLEAN_TYPE
+ || TYPE_PRECISION (type) == 1))
+ {
+ /* Have vector comparison with scalar boolean result. */
+ bool result = true;
+ gcc_assert (code == EQ_EXPR || code == NE_EXPR);
+ gcc_assert (VECTOR_CST_NELTS (op0) == VECTOR_CST_NELTS (op1));
+ for (unsigned i = 0; i < VECTOR_CST_NELTS (op0); i++)
+ {
+ tree elem0 = VECTOR_CST_ELT (op0, i);
+ tree elem1 = VECTOR_CST_ELT (op1, i);
+ tree tmp = fold_relational_const (code, type, elem0, elem1);
+ result &= integer_onep (tmp);
+ }
+ if (code == NE_EXPR)
+ result = !result;
+ return constant_boolean_node (result, type);
+ }
unsigned count = VECTOR_CST_NELTS (op0);
tree *elts = XALLOCAVEC (tree, count);
gcc_assert (VECTOR_CST_NELTS (op1) == count
diff --git a/gcc/params.def b/gcc/params.def
index c5d96e7..58bfaed 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1043,6 +1043,12 @@ DEFPARAM (PARAM_MAX_STORES_TO_SINK,
"Maximum number of conditional store pairs that can be sunk.",
2, 0, 0)
+/* Enable insertion of a test for a zero mask before masked stores if non-zero. */
+DEFPARAM (PARAM_ZERO_TEST_FOR_MASK_STORE,
+ "zero-test-for-mask-store",
+ "Enable insertion of a test for a zero mask before masked stores",
+ 0, 0, 1)
+
/* Override CASE_VALUES_THRESHOLD of when to switch from doing switch
statements via if statements to using a table jump operation. If the value
is 0, the default CASE_VALUES_THRESHOLD will be used. */
diff --git a/gcc/params.h b/gcc/params.h
index 1090d00..2037c73 100644
--- a/gcc/params.h
+++ b/gcc/params.h
@@ -221,6 +221,8 @@ extern void init_param_values (int *params);
PARAM_VALUE (PARAM_MIN_NONDEBUG_INSN_UID)
#define MAX_STORES_TO_SINK \
PARAM_VALUE (PARAM_MAX_STORES_TO_SINK)
+#define ENABLE_ZERO_TEST_FOR_MASK_STORE \
+ PARAM_VALUE (PARAM_ZERO_TEST_FOR_MASK_STORE)
#define ALLOW_LOAD_DATA_RACES \
PARAM_VALUE (PARAM_ALLOW_LOAD_DATA_RACES)
#define ALLOW_STORE_DATA_RACES \
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
new file mode 100644
index 0000000..d926823
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move1.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-march=core-avx2 -O3 -fdump-tree-vect-details" } */
+
+extern int *p1, *p2, *p3;
+int c[256];
+void foo (int n)
+{
+ int i;
+ for (i = 0; i < n; i++)
+ if (c[i])
+ {
+ p1[i] += 1;
+ p2[i] = p3[i] +2;
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 2 "vect" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move2.c b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move2.c
new file mode 100644
index 0000000..41d0596
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx2-vect-mask-store-move2.c
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target avx2 } */
+/* { dg-options "-march=core-avx2 -O3 -fdump-tree-vect-details" } */
+
+extern int *p1, *p2, *p3;
+int c[256];
+/* All masked load/stores must be put into one new bb. */
+
+void foo (int n)
+{
+ int i;
+ for (i = 0; i < n; i++)
+ if (c[i])
+ {
+ p1[i] -= 1;
+ p2[i] = p3[i];
+ }
+}
+
+/* { dg-final { scan-tree-dump-times "Move MASK_STORE to new bb" 1 "vect" } } */
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index cfed3c2..6a9dc27 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -3408,10 +3408,10 @@ verify_gimple_call (gcall *stmt)
}
/* Verifies the gimple comparison with the result type TYPE and
- the operands OP0 and OP1. */
+ the operands OP0 and OP1, and the comparison code CODE. */
static bool
-verify_gimple_comparison (tree type, tree op0, tree op1)
+verify_gimple_comparison (tree type, tree op0, tree op1, enum tree_code code)
{
tree op0_type = TREE_TYPE (op0);
tree op1_type = TREE_TYPE (op1);
@@ -3448,10 +3448,20 @@ verify_gimple_comparison (tree type, tree op0, tree op1)
if (TREE_CODE (op0_type) == VECTOR_TYPE
|| TREE_CODE (op1_type) == VECTOR_TYPE)
{
- error ("vector comparison returning a boolean");
- debug_generic_expr (op0_type);
- debug_generic_expr (op1_type);
- return true;
+ /* Allow vector comparison returning boolean if operand types
+ are equal and CODE is EQ/NE. */
+ if ((code != EQ_EXPR && code != NE_EXPR)
+ || TREE_CODE (op0_type) != TREE_CODE (op1_type)
+ || TYPE_VECTOR_SUBPARTS (op0_type)
+ != TYPE_VECTOR_SUBPARTS (op1_type)
+ || GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op0_type)))
+ != GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op1_type))))
+ {
+ error ("type mismatch for vector comparison returning a boolean");
+ debug_generic_expr (op0_type);
+ debug_generic_expr (op1_type);
+ return true;
+ }
}
}
/* Or a boolean vector type with the same element count
@@ -3832,7 +3842,7 @@ verify_gimple_assign_binary (gassign *stmt)
case LTGT_EXPR:
/* Comparisons are also binary, but the result type is not
connected to the operand types. */
- return verify_gimple_comparison (lhs_type, rhs1, rhs2);
+ return verify_gimple_comparison (lhs_type, rhs1, rhs2, rhs_code);
case WIDEN_MULT_EXPR:
if (TREE_CODE (lhs_type) != INTEGER_TYPE)
@@ -4541,7 +4551,8 @@ verify_gimple_cond (gcond *stmt)
return verify_gimple_comparison (boolean_type_node,
gimple_cond_lhs (stmt),
- gimple_cond_rhs (stmt));
+ gimple_cond_rhs (stmt),
+ gimple_cond_code (stmt));
}
/* Verify the GIMPLE statement STMT. Returns true if there is an
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index b82ae3c..07c1fa7 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -422,6 +422,15 @@ forward_propagate_into_comparison_1 (gimple *stmt,
enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
bool invariant_only_p = !single_use0_p;
+ /* Can't combine vector comparison with scalar boolean type of
+ the result and VEC_COND_EXPR having vector type of comparison. */
+ if (TREE_CODE (TREE_TYPE (op0)) == VECTOR_TYPE
+ && INTEGRAL_TYPE_P (type)
+ && (TREE_CODE (type) == BOOLEAN_TYPE
+ || TYPE_PRECISION (type) == 1)
+ && def_code == VEC_COND_EXPR)
+ return NULL_TREE;
+
rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
/* Always combine comparisons or conversions from booleans. */
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 43ada18..f202d98 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -6566,3 +6566,170 @@ vect_transform_loop (loop_vec_info loop_vinfo)
dump_printf (MSG_NOTE, "\n");
}
}
+
+/* Helper for optimize_mask_stores: returns true if sinking STMT to the end
+ of BB is valid and false otherwise. */
+
+static bool
+is_valid_sink (gimple *stmt, gimple *last_store)
+{
+ tree vdef;
+ imm_use_iterator imm_it;
+ use_operand_p use_p;
+ basic_block bb = gimple_bb (stmt);
+
+ if (is_gimple_call (stmt)
+ && !gimple_call_internal_p (stmt))
+ /* Do not consider non-internal call as valid to sink. */
+ return false;
+
+ if ((vdef = gimple_vdef (stmt)))
+ {
+ /* Check that there are no store vuses in the current bb. */
+ FOR_EACH_IMM_USE_FAST (use_p, imm_it, vdef)
+ if (gimple_bb (USE_STMT (use_p)) == bb)
+ return false;
+ return true;
+ }
+ else if (gimple_vuse (stmt) == NULL_TREE)
+ return true;
+ else if (gimple_vuse (stmt) == gimple_vuse (last_store))
+ return true;
+ return false;
+}
+
+/* The code below performs a simple optimization - do not execute a masked
+ store statement if its mask is a zero vector, since loads that follow
+ a masked store can be blocked. It puts all masked stores with the same
+ mask vector into a new bb guarded by a test for a zero mask. */
+
+void
+optimize_mask_stores (struct loop *loop)
+{
+ basic_block bb = loop->header;
+ gimple_stmt_iterator gsi;
+ gimple *stmt;
+ auto_vec<gimple *> worklist;
+
+ if (ENABLE_ZERO_TEST_FOR_MASK_STORE == 0)
+ return;
+
+ /* Pick up all masked stores in loop if any. */
+ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+ {
+ stmt = gsi_stmt (gsi);
+ if (is_gimple_call (stmt)
+ && gimple_call_internal_p (stmt)
+ && gimple_call_internal_fn (stmt) == IFN_MASK_STORE)
+ worklist.safe_push (stmt);
+ }
+
+ if (worklist.is_empty ())
+ return;
+
+ /* Loop has masked stores. */
+ while (!worklist.is_empty ())
+ {
+ gimple *last, *def_stmt, *last_store;
+ edge e, efalse;
+ tree mask;
+ basic_block store_bb, join_bb;
+ gimple_stmt_iterator gsi_to;
+ tree arg3;
+ tree vdef, new_vdef;
+ gphi *phi;
+ bool first_dump;
+ tree vectype;
+ tree zero;
+
+ last = worklist.pop ();
+ mask = gimple_call_arg (last, 2);
+ /* Create new bb. */
+ e = split_block (bb, last);
+ join_bb = e->dest;
+ store_bb = create_empty_bb (bb);
+ add_bb_to_loop (store_bb, loop);
+ e->flags = EDGE_TRUE_VALUE;
+ efalse = make_edge (bb, store_bb, EDGE_FALSE_VALUE);
+ /* Put STORE_BB on the unlikely path. */
+ efalse->probability = PROB_UNLIKELY;
+ store_bb->frequency = PROB_ALWAYS - EDGE_FREQUENCY (efalse);
+ make_edge (store_bb, join_bb, EDGE_FALLTHRU);
+ if (dom_info_available_p (CDI_DOMINATORS))
+ set_immediate_dominator (CDI_DOMINATORS, store_bb, bb);
+ /* Create vector comparison with boolean result. */
+ vectype = TREE_TYPE (mask);
+ zero = build_zero_cst (TREE_TYPE (vectype));
+ zero = build_vector_from_val (vectype, zero);
+ stmt = gimple_build_cond (EQ_EXPR, mask, zero, NULL_TREE, NULL_TREE);
+ gsi = gsi_last_bb (bb);
+ gsi_insert_after (&gsi, stmt, GSI_SAME_STMT);
+ /* Create new PHI node for vdef of the last masked store:
+ .MEM_2 = VDEF <.MEM_1>
+ will be converted to
+ .MEM.3 = VDEF <.MEM_1>
+ and new PHI node will be created in join bb
+ .MEM_2 = PHI <.MEM_1, .MEM_3>
+ */
+ vdef = gimple_vdef (last);
+ gcc_assert (vdef && TREE_CODE (vdef) == SSA_NAME);
+ new_vdef = make_ssa_name (gimple_vop (cfun), last);
+ phi = create_phi_node (vdef, join_bb);
+ add_phi_arg (phi, new_vdef, EDGE_SUCC (store_bb, 0), UNKNOWN_LOCATION);
+ gimple_set_vdef (last, new_vdef);
+ first_dump = true;
+
+ /* Put all masked stores with the same mask to STORE_BB if possible. */
+ while (true)
+ {
+ /* Move masked store to STORE_BB. */
+ last_store = last;
+ gsi = gsi_for_stmt (last);
+ gsi_to = gsi_start_bb (store_bb);
+ gsi_move_before (&gsi, &gsi_to);
+ update_stmt (last);
+ if (dump_enabled_p ())
+ {
+ /* Issue different messages depending on FIRST_DUMP. */
+ if (first_dump)
+ {
+ dump_printf (MSG_NOTE, "Move MASK_STORE to new bb#%d\n",
+ store_bb->index);
+ first_dump = false;
+ }
+ else
+ dump_printf (MSG_NOTE, "Move MASK_STORE to created bb\n");
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, last, 0);
+ }
+ /* Put definition statement of stored value in STORE_BB
+ if possible. */
+ arg3 = gimple_call_arg (last, 3);
+ if (TREE_CODE (arg3) == SSA_NAME && has_single_use (arg3))
+ {
+ def_stmt = SSA_NAME_DEF_STMT (arg3);
+ /* Move def_stmt to STORE_BB if it is in the same bb and
+ it is legal. */
+ if (gimple_bb (def_stmt) == bb
+ && is_valid_sink (def_stmt, last_store))
+ {
+ if (dump_enabled_p ())
+ {
+ dump_printf (MSG_NOTE, "Move stmt to created bb\n");
+ dump_gimple_stmt (MSG_NOTE, TDF_SLIM, def_stmt, 0);
+ }
+ gsi = gsi_for_stmt (def_stmt);
+ gsi_to = gsi_start_bb (store_bb);
+ gsi_move_before (&gsi, &gsi_to);
+ update_stmt (def_stmt);
+ }
+ }
+ /* Put other masked stores with the same mask to STORE_BB. */
+ if (worklist.is_empty ()
+ || gimple_call_arg (worklist.last (), 2) != mask
+ || !is_valid_sink (worklist.last (), last_store))
+ break;
+ last = worklist.pop ();
+ }
+ add_phi_arg (phi, gimple_vuse (last_store), e, UNKNOWN_LOCATION);
+ }
+}
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index ae14075..f8c1e6d 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1968,6 +1968,7 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
{
tree vec_rhs = NULL_TREE, vec_mask = NULL_TREE;
prev_stmt_info = NULL;
+ LOOP_VINFO_HAS_MASK_STORE (loop_vinfo) = true;
for (i = 0; i < ncopies; i++)
{
unsigned align, misalign;
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 7b3d9a3..383b01b 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -556,12 +556,18 @@ vectorize_loops (void)
for (i = 1; i < vect_loops_num; i++)
{
loop_vec_info loop_vinfo;
+ bool has_mask_store;
loop = get_loop (cfun, i);
if (!loop)
continue;
loop_vinfo = (loop_vec_info) loop->aux;
+ has_mask_store = false;
+ if (loop_vinfo)
+ has_mask_store = LOOP_VINFO_HAS_MASK_STORE (loop_vinfo);
destroy_loop_vec_info (loop_vinfo, true);
+ if (has_mask_store)
+ optimize_mask_stores (loop);
loop->aux = NULL;
}
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index f77a4eb..9d06752 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -328,6 +328,9 @@ typedef struct _loop_vec_info : public vec_info {
loop version without if-conversion. */
struct loop *scalar_loop;
+ /* Mark loops having masked stores. */
+ bool has_mask_store;
+
} *loop_vec_info;
/* Access Functions. */
@@ -365,6 +368,7 @@ typedef struct _loop_vec_info : public vec_info {
#define LOOP_VINFO_SCALAR_LOOP(L) (L)->scalar_loop
#define LOOP_VINFO_SCALAR_ITERATION_COST(L) (L)->scalar_cost_vec
#define LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST(L) (L)->single_scalar_iteration_cost
+#define LOOP_VINFO_HAS_MASK_STORE(L) (L)->has_mask_store
#define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
((L)->may_misalign_stmts.length () > 0)
@@ -994,6 +998,7 @@ extern void vect_get_vec_defs (tree, tree, gimple *, vec<tree> *,
vec<tree> *, slp_tree, int);
extern tree vect_gen_perm_mask_any (tree, const unsigned char *);
extern tree vect_gen_perm_mask_checked (tree, const unsigned char *);
+extern void optimize_mask_stores (struct loop *);
/* In tree-vect-data-refs.c. */
extern bool vect_can_force_dr_alignment_p (const_tree, unsigned int);
Thread overview: 33+ messages
2015-05-06 14:04 Yuri Rumyantsev
2015-05-08 9:27 ` Richard Biener
2015-05-08 18:43 ` Jeff Law
2015-05-08 19:16 ` Richard Biener
2015-05-20 14:10 ` Yuri Rumyantsev
2015-05-29 14:28 ` Yuri Rumyantsev
2015-06-09 12:15 ` Richard Biener
2015-06-18 15:41 ` Yuri Rumyantsev
2015-07-07 13:55 ` Yuri Rumyantsev
2015-07-10 5:51 ` Jeff Law
2015-07-20 15:26 ` Yuri Rumyantsev
2015-07-21 13:59 ` Richard Biener
2015-07-23 20:32 ` Jeff Law
2015-07-24 9:04 ` Yuri Rumyantsev
2015-07-24 9:24 ` Richard Biener
2015-07-24 19:26 ` Jeff Law
2015-07-27 9:04 ` Richard Biener
2015-08-06 11:07 ` Yuri Rumyantsev
2015-08-13 11:40 ` Yuri Rumyantsev
2015-08-13 11:46 ` Richard Biener
2015-11-02 15:24 ` Yuri Rumyantsev
2015-11-05 15:49 ` Yuri Rumyantsev [this message]
2015-11-06 12:56 ` Richard Biener
2015-11-06 13:29 ` Yuri Rumyantsev
2015-11-10 12:33 ` Richard Biener
2015-11-10 12:48 ` Ilya Enkovich
2015-11-10 14:46 ` Richard Biener
2015-11-10 14:56 ` Ilya Enkovich
2015-11-10 17:02 ` Mike Stump
2015-11-11 9:18 ` Richard Biener
2015-11-11 13:13 ` Yuri Rumyantsev
2015-11-12 13:59 ` Richard Biener
2015-11-19 15:20 ` Yuri Rumyantsev