From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-430343-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 93141 invoked by alias); 22 Jun 2016 16:09:25 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 92001 invoked by uid 89); 22 Jun 2016 16:09:24 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-2.3 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=ilya.enkovich@intel.com, ilyaenkovichintelcom, grouped, acces
X-HELO: mail-vk0-f54.google.com
Received: from mail-vk0-f54.google.com (HELO mail-vk0-f54.google.com) (209.85.213.54) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Wed, 22 Jun 2016 16:09:14 +0000
Received: by mail-vk0-f54.google.com with SMTP id u64so68455061vkf.3        for <gcc-patches@gcc.gnu.org>; Wed, 22 Jun 2016 09:09:14 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20130820;        h=x-gm-message-state:mime-version:in-reply-to:references:from:date         :message-id:subject:to:cc;        bh=Gc/5qv1DZQFelSFGEjE8w9ndmYiG9X8uv4JrZwXU27M=;        b=SVYnGrmKUCCreLlRfQ30VhoObQZep2YO9TfHtWw1AasIx1zeH0etzy0WKtjh3aCR0+         szmjDkAe7LzPx7YtOQuzcnflE9xLbdgIGg8YJ77/8wnOmmWKdHdvZMETdn4ypgZ6fHdG         rmk9BrdWZeIaqOQVXgmdnY5Cq+WuI/0oaTVP9UBgvBeR8XSSosLmnpJHQxp3hlxc2yo9         PPqV/vZHfaJxoK5eNtRjCdHUHbmfNnh5s13cn/QG2IbEjEAxnIta6dPUqVZ844+x6tcM         T6Gqz+BGuuZcvc5lWYed+XrUfGZb/hUuu8eKk0F9npR8Y4t20PtwPBsszEGVQvMnUuEg         kMeQ==
X-Gm-Message-State: ALyK8tKPSQuWz5aPPec8O8XMdDN20Rm2holpenOYbrqXfIiK7gWGhoZzb6+zhpTaRdYxOp7rFdU/br0MGFmiaQ==
X-Received: by 10.176.3.133 with SMTP id 5mr12815085uau.93.1466611751119; Wed, 22 Jun 2016 09:09:11 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.176.5.201 with HTTP; Wed, 22 Jun 2016 09:09:10 -0700 (PDT)
In-Reply-To: <8c811442-df35-986a-d02d-b9c2669876d2@redhat.com>
References: <20160519194208.GF40563@msticlxl57.ims.intel.com> <8c811442-df35-986a-d02d-b9c2669876d2@redhat.com>
From: Ilya Enkovich <enkovich.gnu@gmail.com>
Date: Wed, 22 Jun 2016 16:09:00 -0000
Message-ID: <CAMbmDYa9vvFq9eKSfWvhWXNyk3s8Or6EXB394fV3yMwWmYbwXQ@mail.gmail.com>
Subject: Re: [PATCH, vec-tails 05/10] Check if loop can be masked
To: Jeff Law <law@redhat.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset=UTF-8
X-IsSubscribed: yes
X-SW-Source: 2016-06/txt/msg01616.txt.bz2

2016-06-16 10:08 GMT+03:00 Jeff Law <law@redhat.com>:
> On 05/19/2016 01:42 PM, Ilya Enkovich wrote:
>>
>> Hi,
>>
>> This patch introduces analysis to determine if loop can be masked
>> (compute LOOP_VINFO_CAN_BE_MASKED and LOOP_VINFO_REQUIRED_MASKS)
>> and compute how much masking costs.
>>
>> Thanks,
>> Ilya
>> --
>> gcc/
>>
>> 2016-05-19  Ilya Enkovich  <ilya.enkovich@intel.com>
>>
>>         * tree-vect-loop.c: Include insn-config.h and recog.h.
>>         (vect_check_required_masks_widening): New.
>>         (vect_check_required_masks_narrowing): New.
>>         (vect_get_masking_iv_elems): New.
>>         (vect_get_masking_iv_type): New.
>>         (vect_get_extreme_masks): New.
>>         (vect_check_required_masks): New.
>>         (vect_analyze_loop_operations): Add vect_check_required_masks
>>         call to compute LOOP_VINFO_CAN_BE_MASKED.
>>         (vect_analyze_loop_2): Initialize LOOP_VINFO_CAN_BE_MASKED and
>>         LOOP_VINFO_NEED_MASKING before starting over.
>>         (vectorizable_reduction): Compute LOOP_VINFO_CAN_BE_MASKED and
>>         masking cost.
>>         * tree-vect-stmts.c (can_mask_load_store): New.
>>         (vect_model_load_masking_cost): New.
>>         (vect_model_store_masking_cost): New.
>>         (vect_model_simple_masking_cost): New.
>>         (vectorizable_mask_load_store): Compute LOOP_VINFO_CAN_BE_MASKED
>>         and masking cost.
>>         (vectorizable_simd_clone_call): Likewise.
>>         (vectorizable_store): Likewise.
>>         (vectorizable_load): Likewise.
>>         (vect_stmt_should_be_masked_for_epilogue): New.
>>         (vect_add_required_mask_for_stmt): New.
>>         (vect_analyze_stmt): Compute LOOP_VINFO_CAN_BE_MASKED.
>>         * tree-vectorizer.h (vect_model_load_masking_cost): New.
>>         (vect_model_store_masking_cost): New.
>>         (vect_model_simple_masking_cost): New.
>>
>>
>> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
>> index e25a0ce..31360d3 100644
>> --- a/gcc/tree-vect-loop.c
>> +++ b/gcc/tree-vect-loop.c
>> @@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-pass.h"
>>  #include "ssa.h"
>>  #include "optabs-tree.h"
>> +#include "insn-config.h"
>> +#include "recog.h"             /* FIXME: for insn_data */
>
> Ick :(
>
>
>> +
>> +/* Function vect_check_required_masks_narowing.
>
> narrowing
>
>
>> +
>> +   Return 1 if vector mask of type MASK_TYPE can be narrowed
>> +   to a type having REQ_ELEMS elements in a single vector.  */
>> +
>> +static bool
>> +vect_check_required_masks_narrowing (loop_vec_info loop_vinfo,
>> +                                    tree mask_type, unsigned req_elems)
>
> Given the common structure & duplication I can't help but wonder if a single
> function should be used for widening/narrowing.  Ultimately can't you swap
> mask_elems/req_elems and always go narrower to wider (using a different
> optab for the two different cases)?

I think we can't always go in narrower to wider direction because widening
uses two optabs wand also because the way insn_data is checked.

>
>
>
>
>> +
>> +/* Function vect_get_masking_iv_elems.
>> +
>> +   Return a number of elements in IV used for loop masking.  */
>> +static int
>> +vect_get_masking_iv_elems (loop_vec_info loop_vinfo)
>> +{
>> +  tree iv_type = TREE_TYPE (LOOP_VINFO_NITERS (loop_vinfo));
>
> I'm guessing Richi's comment about what tree type you're looking at refers
> to this and similar instances.  Doesn't this give you the type of the number
> of iterations rather than the type of the iteration variable itself?
>
>

Since I build vector IV by myself and use to compare with NITERS I
feel it's safe to
use type of NITERS.  Do you expect NITERS and IV types differ?

>
>
>  +
>>
>> +  if (!expand_vec_cmp_expr_p (iv_vectype, mask_type))
>> +    {
>> +      if (dump_enabled_p ())
>> +       dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                        "cannot be masked: required vector comparison "
>> +                        "is not supported.\n");
>> +      LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>> +      return;
>> +    }
>
> On a totally unrelated topic, I was speaking with David Malcolm earlier this
> week about how to turn this kind of missed optimization information we
> currently emit into dumps into something more easily consumed by users.
>
> The general issue is that we've got customers that want to understand why
> certain optimizations fire or do not fire.  They're by far more interested
> in the vectorizer than anything else.
>
> We have a sense that much of the information those customers desire is
> sitting in the dump files, but it's buried in there with other stuff that
> isn't generally useful to users.
>
> So we're pondering what it might take to take these glorified fprintf calls
> and turn them into a first class diagnostic that could be emitted to stderr
> or into the dump file depending (of course) on the options passed to GCC.
>
> The reason I bring this up is the hope that your team might have some
> insights based on what ICC has done the in the past for its customers.
>
> Anyway, back to the code...
>
>
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index 9ab4af4..91ebe5a 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-vectorizer.h"
>>  #include "builtins.h"
>>  #include "internal-fn.h"
>> +#include "tree-ssa-loop-ivopts.h"
>>
>>  /* For lang_hooks.types.type_for_mode.  */
>>  #include "langhooks.h"
>> @@ -535,6 +536,38 @@ process_use (gimple *stmt, tree use, loop_vec_info
>> loop_vinfo, bool live_p,
>>    return true;
>>  }
>>
>> +/* Return ture if STMT can be converted to masked form.  */
>
> s/ture/true/
>
>
>> @@ -1193,6 +1226,52 @@ vect_get_load_cost (struct data_reference *dr, int
>> ncopies,
>>      }
>>  }
>>
>> +/* Function vect_model_load_masking_cost.
>> +
>> +   Models cost for memory load masking.  */
>> +
>> +void
>> +vect_model_load_masking_cost (stmt_vec_info stmt_info, int ncopies)
>> +{
>> +  if (gimple_code (stmt_info->stmt) == GIMPLE_CALL)
>> +    add_stmt_masking_cost (stmt_info->vinfo->target_cost_data,
>> +                          ncopies, vector_mask_load, stmt_info, false,
>> +                          vect_masking_body);
>
> What GIMPLE_CALLs are going to appear here and in
> vect_model_store_masking_cost?  Seems like there ought to be a comment of
> some kind addressing that question.

This checks for MASK_LOAD case which is internal function call.  I'll add
a comment here.

>
>
>> @@ -1791,6 +1870,20 @@ vectorizable_mask_load_store (gimple *stmt,
>> gimple_stmt_iterator *gsi,
>>                && !useless_type_conversion_p (vectype, rhs_vectype)))
>>      return false;
>>
>> +  if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
>> +    {
>> +      /* Check that mask conjuction is supported.  */
>> +      optab tab;
>> +      tab = optab_for_tree_code (BIT_AND_EXPR, vectype, optab_default);
>> +      if (!tab || optab_handler (tab, TYPE_MODE (vectype)) ==
>> CODE_FOR_nothing)
>> +       {
>> +         if (dump_enabled_p ())
>> +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                            "cannot be masked: unsupported mask
>> operation\n");
>> +         LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) = false;
>> +       }
>> +    }
>
> Should the optab querying be in optab-query.c?

We always directly call optab_handler for simple operations.  There are dozens
of such calls in vectorizer.

>
> General question, it looks like we're baking into various places what things
> can or can not be masked.  Isn't that a function of the masking capabilities
> of the processor?  And if so, shouldn't we be querying optabs rather than
> just declaring and open-coding knowledge that certain things can't be
> masked?
>
>
>> @@ -6312,6 +6484,15 @@ vectorizable_load (gimple *stmt,
>> gimple_stmt_iterator *gsi, gimple **vec_stmt,
>>        grouped_load = true;
>>        /* FORNOW */
>>        gcc_assert (!nested_in_vect_loop && !STMT_VINFO_GATHER_SCATTER_P
>> (stmt_info));
>> +      /* Not yet supported.  */
>> +      if (loop_vinfo && LOOP_VINFO_CAN_BE_MASKED (loop_vinfo))
>> +       {
>> +         if (dump_enabled_p ())
>> +           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                            "cannot be masked: grouped acces is not"
>
> s/acces/access/
>
> So it feels a bit like this needs some more work, particularly WRT what
> capabilities can and can not be masked.  If I'm missing something on that,
> definitely speak up, but it feels like we're embedding knowledge of the
> masking capabilities somewhere where it doesn't belong.  I'd also like to
> know more about what GIMPLE_CALLS show up in
> vect_model_{load,store}_masking_cost.

We don't embed masking capabilities into vectorizer.

Actually we don't depend on masking capabilities so much.  We have to mask
loads and stores and use can_mask_load_store for that which uses existing optab
query.  We also require masking for reductions and use VEC_COND for that
(and use existing expand_vec_cond_expr_p).  Other checks are to check if we
can build required masks.  So we actually don't expose any new processor
masking capabilities to GIMPLE.  I.e. all this works on targets with no
rich masking capabilities.  E.g. we can mask loops for quite old SSE targets.

Thanks,
Ilya

>
>
>
> Jeff
>