From: Ilya Enkovich
Date: Mon, 11 Jul 2016 13:39:00 -0000
Subject: Re: [PATCH, vec-tails 05/10] Check if loop can be masked
To: Jeff Law
Cc: gcc-patches, Yuri Rumyantsev, Igor Zamyatin
In-Reply-To: <20160623095422.GC30064@msticlxl57.ims.intel.com>
References: <20160519194208.GF40563@msticlxl57.ims.intel.com> <8c811442-df35-986a-d02d-b9c2669876d2@redhat.com> <20160623095422.GC30064@msticlxl57.ims.intel.com>

Ping

2016-06-23 12:54 GMT+03:00 Ilya Enkovich:
> On 22 Jun 11:42, Jeff Law wrote:
>> On 06/22/2016 10:09 AM, Ilya Enkovich wrote:
>> >> Given the common structure & duplication I can't help but wonder if
>> >> a single function should be used for widening/narrowing.  Ultimately
>> >> can't you swap mask_elems/req_elems and always go narrower to wider
>> >> (using a different optab for the two different cases)?
>> >
>> > I think we can't always go in the narrower-to-wider direction because
>> > widening uses two optabs and also because of the way insn_data is
>> > checked.
>> OK.  Thanks for considering.
>>
>> >> I'm guessing Richi's comment about what tree type you're looking at
>> >> refers to this and similar instances.  Doesn't this give you the
>> >> type of the number of iterations rather than the type of the
>> >> iteration variable itself?
>> >
>> > Since I build the vector IV myself and use it to compare with NITERS,
>> > I feel it's safe to use the type of NITERS.  Do you expect NITERS and
>> > IV types to differ?
>> Since you're comparing to NITERS, it sounds like you've got it right
>> and that Richi and I have it wrong.
>>
>> It's less a question of whether or not we expect NITERS and IV to have
>> different types, and more a realization that there's nothing that
>> inherently says they have to be the same.  They probably are the same
>> most of the time, but I don't think that's something we can or should
>> necessarily depend on.
>>
>> >>
>> >>> @@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
>> >>>        && !useless_type_conversion_p (vectype, rhs_vectype)))
>> >>>      return false;
>> >>>
>> >>> +  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
>> >>> +    {
>> >>> +      /* Check that mask conjunction is supported.  */
>> >>> +      optab tab;
>> >>> +      tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
>> >>> +      if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
>> >>> +        {
>> >>> +          if (dump_enabled_p ())
>> >>> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >>> +                             "cannot be masked: unsupported mask operation\n");
>> >>> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>> >>> +        }
>> >>> +    }
>> >>
>> >> Should the optab querying be in optab-query.c?
>> >
>> > We always directly call optab_handler for simple operations.  There
>> > are dozens of such calls in the vectorizer.
>> OK.  I would look favorably on a change to move those queries out into
>> optabs-query as a separate patch.
>>
>> >
>> > We don't embed masking capabilities into the vectorizer.
>> >
>> > Actually we don't depend on masking capabilities that much.  We have
>> > to mask loads and stores and use can_mask_load_store for that, which
>> > uses the existing optab query.  We also require masking for
>> > reductions and use VEC_COND for that (and use the existing
>> > expand_vec_cond_expr_p).  The other checks test whether we can build
>> > the required masks.  So we don't actually expose any new processor
>> > masking capabilities to GIMPLE, i.e. all of this works on targets
>> > with no rich masking capabilities; e.g. we can mask loops for quite
>> > old SSE targets.
>> OK.  I think the key here is that load/store masking already exists
>> and the others are either VEC_COND or checking whether we can build
>> the mask rather than whether the operation can be masked.  Thanks for
>> clarifying.
>> jeff
>
> Here is an updated version with fewer typos and more comments.
>
> Thanks,
> Ilya
> --
> gcc/
>
> 2016-05-23  Ilya Enkovich
>
>         * tree-vect-loop.c: Include insn-config.h and recog.h.
>         (vect_check_required_masks_widening): New.
>         (vect_check_required_masks_narrowing): New.
>         (vect_get_masking_iv_elems): New.
>         (vect_get_masking_iv_type): New.
>         (vect_get_extreme_masks): New.
>         (vect_check_required_masks): New.
>         (vect_analyze_loop_operations): Add vect_check_required_masks
>         call to compute LOOP_VINFO_CAN_BE_MASKED.
>         (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
>         LOOP_VINFO_NEED_MASKING before starting over.
>         (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
>         masking cost.
>         * tree-vect-stmts.c (can_mask_load_store): New.
>         (vect_model_load_masking_cost): New.
>         (vect_model_store_masking_cost): New.
>         (vect_model_simple_masking_cost): New.
>         (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
>         and masking cost.
>         (vectorizable_simd_clone_call): Likewise.
>         (vectorizable_store): Likewise.
>         (vectorizable_load): Likewise.
>         (vect_stmt_should_be_masked_for_epilogue): New.
>         (vect_add_required_mask_for_stmt): New.
>         (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
>         * tree-vectorizer.h (vect_model_load_masking_cost): New.
>         (vect_model_store_masking_cost): New.
>         (vect_model_simple_masking_cost): New.
>
>
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index c75d234..3b50168 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-pass.h"
>  #include "ssa.h"
>  #include "optabs-tree.h"
> +#include "insn-config.h"
> +#include "recog.h"              /* FIXME: for insn_data */
>  #include "diagnostic-core.h"
>  #include "fold-const.h"
>  #include "stor-layout.h"
> @@ -1603,6 +1605,270 @@ vect_update_vf_for_slp (loop_vec_info loop_vinfo)
>                          vectorization_factor);
>  }
>
> +/* Function vect_check_required_masks_widening.
> +
> +   Return true if a vector mask of type MASK_TYPE can be widened
> +   to a type having REQ_ELEMS elements in a single vector.  */
> +
> +static bool
> +vect_check_required_masks_widening (loop_vec_info loop_vinfo,
> +                                    tree mask_type, unsigned req_elems)
> +{
> +  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
> +
> +  gcc_assert (mask_elems > req_elems);
> +
> +  /* Don't convert if it requires too many intermediate steps.  */
> +  int steps = exact_log2 (mask_elems / req_elems);
> +  if (steps > MAX_INTERM_CVT_STEPS + 1)
> +    return false;
> +
> +  /* Check we have conversion support for the given mask mode.  */
> +  machine_mode mode = TYPE_MODE (mask_type);
> +  insn_code icode = optab_handler (vec_unpacks_lo_optab, mode);
> +  if (icode == CODE_FOR_nothing
> +      || optab_handler (vec_unpacks_hi_optab, mode) == CODE_FOR_nothing)
> +    return false;
> +
> +  /* Make a recursive call for multi-step conversion.  */
> +  if (steps > 1)
> +    {
> +      mask_elems = mask_elems >> 1;
> +      mask_type = build_truth_vector_type (mask_elems, current_vector_size);
> +      if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
> +        return false;
> +
> +      if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
> +                                               req_elems))
> +        return false;
> +    }
> +  else
> +    {
> +      mask_type = build_truth_vector_type (req_elems, current_vector_size);
> +      if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Function vect_check_required_masks_narrowing.
> +
> +   Return true if a vector mask of type MASK_TYPE can be narrowed
> +   to a type having REQ_ELEMS elements in a single vector.  */
> +
> +static bool
> +vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
> +                                     tree mask_type, unsigned req_elems)
> +{
> +  unsigned mask_elems = TYPE_VECTOR_SUBPARTS (mask_type);
> +
> +  gcc_assert (req_elems > mask_elems);
> +
> +  /* Don't convert if it requires too many intermediate steps.  */
> +  int steps = exact_log2 (req_elems / mask_elems);
> +  if (steps > MAX_INTERM_CVT_STEPS + 1)
> +    return false;
> +
> +  /* Check we have conversion support for the given mask mode.  */
> +  machine_mode mode = TYPE_MODE (mask_type);
> +  insn_code icode = optab_handler (vec_pack_trunc_optab, mode);
> +  if (icode == CODE_FOR_nothing)
> +    return false;
> +
> +  /* Make a recursive call for multi-step conversion.  */
> +  if (steps > 1)
> +    {
> +      mask_elems = mask_elems << 1;
> +      mask_type = build_truth_vector_type (mask_elems, current_vector_size);
> +      if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
> +        return false;
> +
> +      if (!vect_check_required_masks_narrowing (loop_vinfo, mask_type,
> +                                                req_elems))
> +        return false;
> +    }
> +  else
> +    {
> +      mask_type = build_truth_vector_type (req_elems, current_vector_size);
> +      if (TYPE_MODE (mask_type) != insn_data[icode].operand[0].mode)
> +        return false;
> +    }
> +
> +  return true;
> +}
> +
> +/* Function vect_get_masking_iv_elems.
> +
> +   Return the number of elements in the IV used for loop masking.  */
> +
> +static int
> +vect_get_masking_iv_elems (loop_vec_info loop_vinfo)
> +{
> +  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
> +  tree iv_vectype = get_vectype_for_scalar_type (iv_type);
> +
> +  /* We extend the IV type in case it is not big enough to
> +     fill a full vector.  */
> +  return MIN ((int) TYPE_VECTOR_SUBPARTS (iv_vectype),
> +              LOOP_VINFO_VECT_FACTOR (loop_vinfo));
> +}
> +
> +/* Function vect_get_masking_iv_type.
> +
> +   Return the type of the IV used for loop masking.  */
> +
> +static tree
> +vect_get_masking_iv_type (loop_vec_info loop_vinfo)
> +{
> +  /* The masking IV is to be compared to a vector of NITERS and
> +     therefore the type of NITERS is used as the base type for the IV.
> +     FIXME: This can be improved by using a smaller size when possible
> +     for more efficient mask computation.  */
> +  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
> +  tree iv_vectype = get_vectype_for_scalar_type (iv_type);
> +  unsigned vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +
> +  if (TYPE_VECTOR_SUBPARTS (iv_vectype) <= vf)
> +    return iv_vectype;
> +
> +  unsigned elem_size = current_vector_size * BITS_PER_UNIT / vf;
> +  iv_type = build_nonstandard_integer_type (elem_size,
> +                                            TYPE_UNSIGNED (iv_type));
> +
> +  return get_vectype_for_scalar_type (iv_type);
> +}
> +
> +/* Function vect_get_extreme_masks.
> +
> +   Determine the minimum and maximum number of elements in masks
> +   required for masking a loop described by LOOP_VINFO.
> +   Computed values are returned in MIN_MASK_ELEMS and
> +   MAX_MASK_ELEMS.  */
> +
> +static void
> +vect_get_extreme_masks (loop_vec_info loop_vinfo,
> +                        unsigned *min_mask_elems,
> +                        unsigned *max_mask_elems)
> +{
> +  unsigned required_masks = LOOP_VINFO_REQUIRED_MASKS (loop_vinfo);
> +  unsigned elems = 1;
> +
> +  *min_mask_elems = *max_mask_elems = vect_get_masking_iv_elems (loop_vinfo);
> +
> +  while (required_masks)
> +    {
> +      if (required_masks & 1)
> +        {
> +          if (elems < *min_mask_elems)
> +            *min_mask_elems = elems;
> +          if (elems > *max_mask_elems)
> +            *max_mask_elems = elems;
> +        }
> +      elems = elems << 1;
> +      required_masks = required_masks >> 1;
> +    }
> +}
> +
> +/* Function vect_check_required_masks.
> +
> +   For the given LOOP_VINFO check that all required masks can be
> +   computed and add their computation cost to the loop cost data.  */
> +
> +static void
> +vect_check_required_masks (loop_vec_info loop_vinfo)
> +{
> +  if (!LOOP_VINFO_REQUIRED_MASKS (loop_vinfo))
> +    return;
> +
> +  /* First check we have a proper comparison to get
> +     an initial mask.  */
> +  tree iv_vectype = vect_get_masking_iv_type (loop_vinfo);
> +  unsigned iv_elems = TYPE_VECTOR_SUBPARTS (iv_vectype);
> +
> +  tree mask_type = build_same_sized_truth_vector_type (iv_vectype);
> +
> +  if (!expand_vec_cmp_expr_p (iv_vectype, mask_type))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "cannot be masked: required vector comparison "
> +                         "is not supported.\n");
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +      return;
> +    }
> +
> +  int cmp_copies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / iv_elems;
> +  /* Add the cost of creating the initial IV values.  */
> +  add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), cmp_copies,
> +                 scalar_to_vec, NULL, 0, vect_masking_prologue);
> +  /* Add the cost of creating the upper bound and step values.  It is
> +     the same for all copies.  */
> +  add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), 2,
> +                 scalar_to_vec, NULL, 0, vect_masking_prologue);
> +  /* Add the cost of the vector comparisons.  */
> +  add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), cmp_copies,
> +                 vector_stmt, NULL, 0, vect_masking_body);
> +  /* Add the cost of the IV increment.  */
> +  add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo), cmp_copies,
> +                 vector_stmt, NULL, 0, vect_masking_body);
> +
> +  /* Now check the widest and the narrowest masks.
> +     All intermediate values are obtained while
> +     computing the extreme values.  */
> +  unsigned min_mask_elems = 0;
> +  unsigned max_mask_elems = 0;
> +
> +  vect_get_extreme_masks (loop_vinfo, &min_mask_elems, &max_mask_elems);
> +
> +  if (min_mask_elems < iv_elems)
> +    {
> +      /* Check mask widening is available.  */
> +      if (!vect_check_required_masks_widening (loop_vinfo, mask_type,
> +                                               min_mask_elems))
> +        {
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: required mask widening "
> +                             "is not supported.\n");
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +          return;
> +        }
> +
> +      /* Add the widening cost.  In total we need to widen (2^N - 1)
> +         vectors per original vector, where N is the number of
> +         conversion steps.  Each widening requires two extracts.  */
> +      int steps = exact_log2 (iv_elems / min_mask_elems);
> +      int conversions = cmp_copies * 2 * ((1 << steps) - 1);
> +      add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> +                     conversions, vec_promote_demote,
> +                     NULL, 0, vect_masking_body);
> +    }
> +
> +  if (max_mask_elems > iv_elems)
> +    {
> +      if (!vect_check_required_masks_narrowing (loop_vinfo, mask_type,
> +                                                max_mask_elems))
> +        {
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: required mask narrowing "
> +                             "is not supported.\n");
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +          return;
> +        }
> +
> +      /* Add the narrowing cost.  In total we need (2^N - 1) vector
> +         narrowings per resulting vector, where N is the number of
> +         conversion steps.  */
> +      int steps = exact_log2 (max_mask_elems / iv_elems);
> +      int results = cmp_copies * iv_elems / max_mask_elems;
> +      int conversions = results * ((1 << steps) - 1);
> +      add_stmt_cost (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo),
> +                     conversions, vec_promote_demote,
> +                     NULL, 0, vect_masking_body);
> +    }
> +}
> +
>  /* Function vect_analyze_loop_operations.
>
>     Scan the loop stmts and make sure they are all vectorizable.  */
> @@ -1761,6 +2027,12 @@ vect_analyze_loop_operations (loop_vec_info loop_vinfo)
>        return false;
>      }
>
> +  /* If all statements can be masked then we also need
> +     to check that we can compute the required masks and
> +     compute their cost.  */
> +  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +    vect_check_required_masks (loop_vinfo);
> +
>    return true;
>  }
>
> @@ -2236,6 +2508,8 @@ again:
>    LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
>    LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = false;
>    LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = 0;
> +  LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = true;
> +  LOOP_VINFO_NEED_MASKING (loop_vinfo) = false;
>
>    goto start_over;
>  }
> @@ -5428,6 +5702,7 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
>        outer_loop = loop;
>        loop = loop->inner;
>        nested_cycle = true;
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>      }
>
>    /* 1. Is vectorizable reduction?  */
> @@ -5627,6 +5902,18 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
>
>    gcc_assert (ncopies >= 1);
>
> +  if (slp_node || PURE_SLP_STMT (stmt_info) || code == COND_EXPR
> +      || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info) == COND_REDUCTION
> +      || STMT_VINFO_VEC_REDUCTION_TYPE (stmt_info)
> +         == INTEGER_INDUC_COND_REDUCTION)
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "cannot be masked: unsupported conditional "
> +                         "reduction\n");
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +    }
> +
>    vec_mode = TYPE_MODE (vectype_in);
>
>    if (code == COND_EXPR)
> @@ -5904,6 +6191,19 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
>           return false;
>         }
>      }
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +    {
> +      /* Check that masking of the reduction is supported.  */
> +      tree mask_vtype = build_same_sized_truth_vector_type (vectype_out);
> +      if (!expand_vec_cond_expr_p (vectype_out, mask_vtype))
> +        {
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: required vector conditional "
> +                             "expression is not supported.\n");
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +        }
> +    }
>
>    if (!vec_stmt) /* transformation not required.  */
>      {
> @@ -5912,6 +6212,10 @@ vectorizable_reduction (gimple *stmt, gimple_stmt_iterator *gsi,
>                                  reduc_index))
>         return false;
>        STMT_VINFO_TYPE (stmt_info) = reduc_vec_info_type;
> +
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        vect_model_simple_masking_cost (stmt_info, ncopies);
> +
>        return true;
>      }
>
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index d2e16d0..b42e133 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-vectorizer.h"
>  #include "builtins.h"
>  #include "internal-fn.h"
> +#include "tree-ssa-loop-ivopts.h"
>
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> @@ -535,6 +536,38 @@ process_use (gimple *stmt, tree use, loop_vec_info loop_vinfo, bool live_p,
>    return true;
>  }
>
> +/* Return true if STMT can be converted to masked form.  */
> +
> +static bool
> +can_mask_load_store (gimple *stmt)
> +{
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +  tree vectype, mask_vectype;
> +  tree lhs, ref;
> +
> +  if (!stmt_info)
> +    return false;
> +  lhs = gimple_assign_lhs (stmt);
> +  ref = (TREE_CODE (lhs) == SSA_NAME) ? gimple_assign_rhs1 (stmt) : lhs;
> +  if (may_be_nonaddressable_p (ref))
> +    return false;
> +  vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  mask_vectype = build_same_sized_truth_vector_type (vectype);
> +  if (!can_vec_mask_load_store_p (TYPE_MODE (vectype),
> +                                  TYPE_MODE (mask_vectype),
> +                                  gimple_assign_load_p (stmt)))
> +    {
> +      if (dump_enabled_p ())
> +        {
> +          dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                           "Statement can't be masked.\n");
> +          dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
> +        }
> +
> +      return false;
> +    }
> +  return true;
> +}
>
>  /* Function vect_mark_stmts_to_be_vectorized.
>
> @@ -1193,6 +1226,56 @@ vect_get_load_cost (struct data_reference *dr, int ncopies,
>      }
>  }
>
> +/* Function vect_model_load_masking_cost.
> +
> +   Model the cost of masking a memory load.  */
> +
> +void
> +vect_model_load_masking_cost (stmt_vec_info stmt_info, int ncopies)
> +{
> +  /* MASK_LOAD case.  */
> +  if (gimple_code (stmt_info->stmt) == GIMPLE_CALL)
> +    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
> +                           ncopies, vector_mask_load, stmt_info, false,
> +                           vect_masking_body);
> +  /* Other loads.  */
> +  else
> +    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
> +                           ncopies, vector_load, stmt_info, false,
> +                           vect_masking_body);
> +}
> +
> +/* Function vect_model_store_masking_cost.
> +
> +   Model the cost of masking a memory store.  */
> +
> +void
> +vect_model_store_masking_cost (stmt_vec_info stmt_info, int ncopies)
> +{
> +  /* MASK_STORE case.  */
> +  if (gimple_code (stmt_info->stmt) == GIMPLE_CALL)
> +    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
> +                           ncopies, vector_mask_store, stmt_info, false,
> +                           vect_masking_body);
> +  /* Other stores.  */
> +  else
> +    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
> +                           ncopies, vector_store, stmt_info, false,
> +                           vect_masking_body);
> +}
> +
> +/* Function vect_model_simple_masking_cost.
> +
> +   Model the cost of masking a statement.  */
> +
> +void
> +vect_model_simple_masking_cost (stmt_vec_info stmt_info, int ncopies)
> +{
> +  add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
> +                         ncopies, vector_stmt, stmt_info, false,
> +                         vect_masking_body);
> +}
> +
>  /* Insert the new stmt NEW_STMT at *GSI or at the appropriate place in
>     the loop preheader for the vectorized stmt STMT.  */
>
> @@ -1798,6 +1881,20 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
>        && !useless_type_conversion_p (vectype, rhs_vectype)))
>      return false;
>
> +  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +    {
> +      /* Check that mask conjunction is supported.  */
> +      optab tab;
> +      tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
> +      if (!tab || optab_handler (tab, TYPE_MODE (vectype)) == CODE_FOR_nothing)
> +        {
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: unsupported mask operation\n");
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +        }
> +    }
> +
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
> @@ -1806,6 +1903,15 @@ vectorizable_mask_load_store (gimple *stmt, gimple_stmt_iterator *gsi,
>                               NULL, NULL, NULL);
>    else
>      vect_model_load_cost (stmt_info, ncopies, false, NULL, NULL, NULL);
> +
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +    {
> +      if (is_store)
> +        vect_model_store_masking_cost (stmt_info, ncopies);
> +      else
> +        vect_model_load_masking_cost (stmt_info, ncopies);
> +    }
> +
>    return true;
>  }
>
> @@ -2802,6 +2908,18 @@ vectorizable_simd_clone_call (gimple *stmt, gimple_stmt_iterator *gsi,
>    if (slp_node)
>      return false;
>
> +  /* Masked clones are not yet supported, but we allow calls
> +     which may simply be called with no mask.  */
> +  if (!(gimple_call_flags (stmt) & ECF_CONST)
> +      || (gimple_call_flags (stmt) & ECF_LOOPING_CONST_OR_PURE))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "cannot be masked: non-const call "
> +                         "(masked calls are not supported)\n");
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +    }
> +
>    /* Process function arguments.  */
>    nargs = gimple_call_num_args (stmt);
>
> @@ -5340,6 +5458,14 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>                             "negative step and reversing not supported.\n");
>           return false;
>         }
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        {
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: negative step"
> +                             " is not supported.\n");
> +        }
>      }
>  }
>
> @@ -5348,6 +5474,16 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>        grouped_store = true;
>        first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
>        group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
> +
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        {
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: grouped access"
> +                             " is not supported.\n");
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +        }
> +
>        if (!slp && !STMT_VINFO_STRIDED_P (stmt_info))
>         {
>           if (vect_store_lanes_supported (vectype, group_size))
> @@ -5401,6 +5537,44 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>                             "scatter index use not simple.");
>           return false;
>         }
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        {
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: gather/scatter is"
> +                             " not supported.\n");
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +        }
> +    }
> +
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +      && STMT_VINFO_STRIDED_P (stmt_info))
> +    {
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "cannot be masked: strided store is not"
> +                         " supported.\n");
> +    }
> +
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +      && integer_zerop (nested_in_vect_loop_p (loop, stmt)
> +                        ? STMT_VINFO_DR_STEP (stmt_info)
> +                        : DR_STEP (dr)))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "cannot be masked: invariant store.\n");
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +    }
> +
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +      && !can_mask_load_store (stmt))
> +    {
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "cannot be masked: unsupported mask store.\n");
> +    }
>
>    if (!vec_stmt) /* transformation not required.  */
> @@ -5410,6 +5584,9 @@ vectorizable_store (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>        if (!PURE_SLP_STMT (stmt_info))
>         vect_model_store_cost (stmt_info, ncopies, store_lanes_p, dt,
>                                NULL, NULL, NULL);
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        vect_model_store_masking_cost (stmt_info, ncopies);
> +
>        return true;
>      }
>
> @@ -6315,6 +6492,15 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>        grouped_load = true;
>        /* FORNOW */
>        gcc_assert (!nested_in_vect_loop && !STMT_VINFO_GATHER_SCATTER_P (stmt_info));
> +      /* Not yet supported.  */
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        {
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: grouped access is not"
> +                             " supported.\n");
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +        }
>
>        first_stmt = GROUP_FIRST_ELEMENT (stmt_info);
>        group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
> @@ -6368,6 +6554,7 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>         }
>
>        LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) = true;
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>      }
>
>    if (slp && SLP_TREE_LOAD_PERMUTATION (slp_node).exists ())
> @@ -6421,6 +6608,16 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>        gather_decl = vect_check_gather_scatter (stmt, loop_vinfo, &gather_base,
>                                                 &gather_off, &gather_scale);
>        gcc_assert (gather_decl);
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        {
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: gather/scatter is not"
> +                             " supported.\n");
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +        }
> +
>        if (!vect_is_simple_use (gather_off, vinfo, &def_stmt, &gather_dt,
>                                 &gather_off_vectype))
>         {
> @@ -6432,6 +6629,15 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>      }
>    else if (STMT_VINFO_STRIDED_P (stmt_info))
>      {
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        {
> +          LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +          if (dump_enabled_p ())
> +            dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                             "cannot be masked: strided load is not"
> +                             " supported.\n");
> +        }
> +
>        if (grouped_load
>           && slp
>           && (group_size > nunits
> @@ -6483,9 +6689,35 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>                              "\n");
>               return false;
>             }
> +          if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +            {
> +              if (dump_enabled_p ())
> +                dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                                 "cannot be masked: negative step "
> +                                 "for masking.\n");
> +              LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +            }
>         }
>      }
>
> +  if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +      && integer_zerop (nested_in_vect_loop
> +                        ? STMT_VINFO_DR_STEP (stmt_info)
> +                        : DR_STEP (dr)))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_NOTE, vect_location,
> +                         "allow invariant load for masked loop.\n");
> +    }
> +  else if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +           && !can_mask_load_store (stmt))
> +    {
> +      if (dump_enabled_p ())
> +        dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +                         "cannot be masked: unsupported masked load.\n");
> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +    }
> +
>    if (!vec_stmt) /* transformation not required.  */
>      {
>        STMT_VINFO_TYPE (stmt_info) = load_vec_info_type;
> @@ -6493,6 +6725,9 @@ vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>        if (!PURE_SLP_STMT (stmt_info))
>         vect_model_load_cost (stmt_info, ncopies, load_lanes_p,
>                               NULL, NULL, NULL);
> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
> +        vect_model_load_masking_cost (stmt_info, ncopies);
> +
>        return true;
>      }
>
> @@ -7889,6 +8124,43 @@ vectorizable_comparison (gimple *stmt, gimple_stmt_iterator *gsi,
>    return true;
>  }
>
> +/* Return true if the vector version of STMT should be masked
> +   in a vectorized loop epilogue (considering usage of the
> +   same VF as for the main loop).  */
> +
> +static bool
> +vect_stmt_should_be_masked_for_epilogue (gimple *stmt)
> +{
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +
> +  /* We should mask all statements accessing memory.  */
> +  if (STMT_VINFO_DATA_REF (stmt_info))
> +    return true;
> +
> +  /* We should also mask all reductions.  */
> +  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def
> +      || STMT_VINFO_DEF_TYPE (stmt_info) == vect_double_reduction_def)
> +    return true;
> +
> +  return false;
> +}
> +
> +/* Add the mask required to mask STMT to LOOP_VINFO_REQUIRED_MASKS.  */
> +
> +static void
> +vect_add_required_mask_for_stmt (gimple *stmt)
> +{
> +  stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> +  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  unsigned HOST_WIDE_INT nelems = TYPE_VECTOR_SUBPARTS (vectype);
> +  int bit_no = exact_log2 (nelems);
> +
> +  gcc_assert (bit_no >= 0);
> +
> +  LOOP_VINFO_REQUIRED_MASKS (loop_vinfo) |= (1 << bit_no);
> +}
> +
>  /* Make sure the statement is vectorizable.  */
>
>  bool
> @@ -7896,6 +8168,7 @@ vect_analyze_stmt (gimple *stmt, bool *need_to_vectorize, slp_tree node)
>  {
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>    bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>    enum vect_relevant relevance = STMT_VINFO_RELEVANT (stmt_info);
>    bool ok;
>    tree scalar_type, vectype;
> @@ -8062,6 +8335,10 @@ vect_analyze_stmt (gimple *stmt, bool *need_to_vectorize, slp_tree node)
>        STMT_VINFO_VECTYPE (stmt_info) = vectype;
>      }
>
> +  /* Masking is not supported for SLP yet.  */
> +  if (loop_vinfo && node)
> +    LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
> +
>    if (STMT_VINFO_RELEVANT_P (stmt_info))
>      {
>        gcc_assert (!VECTOR_MODE_P (TYPE_MODE (gimple_expr_type (stmt))));
> @@ -8121,6 +8398,11 @@ vect_analyze_stmt (gimple *stmt, bool *need_to_vectorize, slp_tree node)
>        return false;
>      }
>
> +  if (loop_vinfo
> +      && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
> +      && vect_stmt_should_be_masked_for_epilogue (stmt))
> +    vect_add_required_mask_for_stmt (stmt);
> +
>    if (bb_vinfo)
>      return true;
>
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 8a61690..4d13c41 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -1033,6 +1033,9 @@ extern void vect_model_store_cost (stmt_vec_info, int, bool,
>  extern void vect_model_load_cost (stmt_vec_info, int, bool, slp_tree,
>                                    stmt_vector_for_cost *,
>                                    stmt_vector_for_cost *);
> +extern void vect_model_load_masking_cost (stmt_vec_info, int);
> +extern void vect_model_store_masking_cost (stmt_vec_info, int);
> +extern void vect_model_simple_masking_cost (stmt_vec_info, int);
>  extern unsigned record_stmt_cost (stmt_vector_for_cost *, int,
>                                    enum vect_cost_for_stmt, stmt_vec_info,
>                                    int, enum vect_cost_model_location);
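
To ease review while this waits: the mask the patch builds is just a
lane-wise comparison of the masking IV against NITERS, as set up in
vect_check_required_masks.  Below is a scalar C sketch of that
semantics only; it is plain C, not vectorizer code, and the names VF
and compute_iteration_mask are made up for the illustration.

#include <stdio.h>

#define VF 8  /* vectorization factor assumed for this example */

/* Scalar emulation of the loop mask: the masking IV holds
   {iv, iv+1, ..., iv+VF-1} and lane I of the mask is set iff
   iv + I < niters.  In the masked body, lanes with mask[i] == 0
   are suppressed.  */
static void
compute_iteration_mask (unsigned long iv, unsigned long niters,
                        unsigned char mask[VF])
{
  for (int i = 0; i < VF; i++)
    mask[i] = (iv + i < niters);
}

int
main (void)
{
  unsigned char mask[VF];

  /* Last iteration of a masked loop with NITERS == 13 and VF == 8:
     lanes 0..4 (scalar iterations 8..12) are active, lanes 5..7 are
     masked out.  */
  compute_iteration_mask (8, 13, mask);
  for (int i = 0; i < VF; i++)
    printf ("%d", mask[i]);
  printf ("\n");  /* prints 11111000 */
  return 0;
}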
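The LOOP_VINFO_REQUIRED_MASKS bookkeeping is a bitmap over
power-of-two mask widths: bit N set means a mask with 2^N elements is
needed, and vect_get_extreme_masks scans it for the widest and
narrowest widths.  The standalone sketch below mirrors that encoding;
add_required_mask and widening_steps are made-up helpers modeled on
vect_add_required_mask_for_stmt and
vect_check_required_masks_widening, with __builtin_ctz standing in for
exact_log2 (they agree for powers of two).

#include <assert.h>
#include <stdio.h>

/* Record that a mask with NELEMS elements is required: bit N of the
   bitmap corresponds to masks with 2^N elements.  */
static unsigned
add_required_mask (unsigned required_masks, unsigned nelems)
{
  int bit_no = __builtin_ctz (nelems);
  assert (nelems == (1u << bit_no));  /* element counts are powers of two */
  return required_masks | (1u << bit_no);
}

/* Number of unpack steps needed to widen a MASK_ELEMS-element mask
   into REQ_ELEMS-element masks; each vec_unpacks_{lo,hi} step halves
   the element count.  */
static int
widening_steps (unsigned mask_elems, unsigned req_elems)
{
  assert (mask_elems > req_elems);
  return __builtin_ctz (mask_elems / req_elems);
}

int
main (void)
{
  unsigned required = 0;

  required = add_required_mask (required, 4);   /* e.g. a 4-lane access  */
  required = add_required_mask (required, 16);  /* e.g. a 16-lane access */

  /* With an 8-element masking IV, the 4-element mask needs widening
     and the 16-element mask needs narrowing; these are exactly the
     extremes the bitmap scan finds.  */
  printf ("required mask bitmap: 0x%x\n", required);               /* 0x14 */
  printf ("unpack steps for 8 -> 4: %d\n", widening_steps (8, 4));   /* 1 */
  printf ("unpack steps for 16 -> 4: %d\n", widening_steps (16, 4)); /* 2 */
  return 0;
}

For two steps (16 -> 4), one original mask is split at three unpack
positions (16 -> 8, then 8 -> 4 twice), each producing a lo and a hi
half, which is where the patch's 2 * ((1 << steps) - 1) cost term
comes from.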