From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-440837-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 27898 invoked by alias); 9 Nov 2016 12:12:06 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
MIME-Version: 1.0
Received: by 10.176.1.17 with HTTP; Wed, 9 Nov 2016 04:12:02 -0800 (PST)
In-Reply-To: <CAHFci2_PA2umv2fB8kM5eMiSUGG=ci+2crzJwtqaVJnR-gM34Q@mail.gmail.com>
References: <CAEoMCqQa3Ebjq3K38dZ+PMyDUPBYL3gF8vyJLuq4ev04DUsjeA@mail.gmail.com> <CAHFci29N2375ucqhC-q1Srwn1Y5uBP9mCVJDpMVPWaw3U4tskQ@mail.gmail.com> <CAEoMCqQdzimw2+kx16hDCfNmpUj7pqu_59Ce84gY0nq6ohyAvQ@mail.gmail.com> <CAHFci2_PA2umv2fB8kM5eMiSUGG=ci+2crzJwtqaVJnR-gM34Q@mail.gmail.com>
From: Yuri Rumyantsev <ysrumyan@gmail.com>
Date: Wed, 09 Nov 2016 12:12:00 -0000
Message-ID: <CAEoMCqRjhmEOrdqdfC1sk4q04+T=qT7xAraDfjEB9MJhJYBAPg@mail.gmail.com>
Subject: Re: [PATCH, vec-tails] Support loop epilogue vectorization
To: "Bin.Cheng" <amker.cheng@gmail.com>
Cc: Richard Biener <rguenther@suse.de>, Jeff Law <law@redhat.com>, 	gcc-patches <gcc-patches@gcc.gnu.org>, Ilya Enkovich <enkovich.gnu@gmail.com>
Content-Type: text/plain; charset=UTF-8
X-SW-Source: 2016-11/txt/msg00823.txt.bz2

I am familiar with the SVE extension and understand that the implemented
approach might not be suitable for ARM. The main point is that only
load/store instructions are masked; all other calculations are not
(we added a special conversion for reduction statements to implement
merging-predication semantics). For SVE, peeling for niters is not
required, but that is not true for x86, where we must determine which
vectorization scheme is more profitable: loop combining (the only
one essential for SVE), or separate epilogue vectorization using either
masking or a smaller vectorization factor. So I'd like to have the full
list of required changes to our implementation so that I can try to
address them.
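
To make the three schemes concrete, here is a minimal scalar C model of
a sum reduction. It is only a sketch, not code from the patch set: VF,
vec_step, hsum and the sum_* function names are made up for
illustration, and each "vector" iteration is modelled as a loop over
lanes rather than with real intrinsics. The masked step only reads
memory for active lanes, and inactive lanes keep their old accumulator
value, which is the merging behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

#define VF 8 /* hypothetical vectorization factor of the main loop */

/* Scalar reference: sum of a[0..n). */
static int sum_scalar(const int *a, size_t n)
{
    int s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Model of one vector iteration of `width` lanes starting at a[base].
   Lanes below `active` have their mask bit set.  Masked-off lanes issue
   no load and leave acc[lane] unchanged (merging predication). */
static void vec_step(int *acc, const int *a, size_t base,
                     size_t width, size_t active)
{
    assert(active <= width);
    for (size_t lane = 0; lane < width; lane++)
        if (lane < active)
            acc[lane] += a[base + lane];
}

/* Final horizontal reduction of the per-lane accumulators. */
static int hsum(const int *acc, size_t width)
{
    int s = 0;
    for (size_t lane = 0; lane < width; lane++)
        s += acc[lane];
    return s;
}

/* Scheme 1: unmasked main loop, then one masked epilogue iteration. */
static int sum_masked_epilogue(const int *a, size_t n)
{
    int acc[VF] = {0};
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vec_step(acc, a, i, VF, VF);
    if (i < n)
        vec_step(acc, a, i, VF, n - i);
    return hsum(acc, VF);
}

/* Scheme 2: unmasked main loop, epilogue vectorized with VF/2,
   remaining iterations handled by a scalar tail. */
static int sum_small_vf_epilogue(const int *a, size_t n)
{
    int acc[VF] = {0};
    int acc_half[VF / 2] = {0};
    int tail = 0;
    size_t i = 0;
    for (; i + VF <= n; i += VF)
        vec_step(acc, a, i, VF, VF);
    for (; i + VF / 2 <= n; i += VF / 2)
        vec_step(acc_half, a, i, VF / 2, VF / 2);
    for (; i < n; i++)
        tail += a[i];
    return hsum(acc, VF) + hsum(acc_half, VF / 2) + tail;
}

/* Scheme 3: epilogue combined into the main loop; every iteration is
   masked, so no peeling for niters is needed. */
static int sum_combined(const int *a, size_t n)
{
    int acc[VF] = {0};
    for (size_t i = 0; i < n; i += VF) {
        size_t left = n - i;
        vec_step(acc, a, i, VF, left < VF ? left : VF);
    }
    return hsum(acc, VF);
}
```

All three variants compute the same result as sum_scalar for any n; they
differ only in how the last n % VF iterations are executed, which is
exactly what the cost model has to choose between on x86.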

Thanks.
Yuri.

2016-11-09 14:46 GMT+03:00 Bin.Cheng <amker.cheng@gmail.com>:
> On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks Richard for your comments.
>> You proposed handling the epilogue loop just like a normal short-trip
>> loop, but this is not exactly true since, e.g., the epilogue must not
>> be peeled for alignment.
> Yes, there must be some differences; my motivation is to minimize them
> so we don't need to special-case normal/epilogue loops in too many
> places.
> Of course, that's just my impression from going through the patch set,
> and I could be wrong.
>
> Thanks,
> bin
>>
>> It is not clear to me what my next steps are. Should I redesign the
>> patch completely, or simply split the whole patch into separate
>> parts? But that means we must restart the review process from the
>> beginning, while the release is close to its end.
>> Note also that I work for Intel only until the end of the year and
>> have no idea who will continue working on this project.
>>
>> Any help will be appreciated.
>>
>> Thanks.
>> Yuri.
>>
>> 2016-11-09 13:37 GMT+03:00 Bin.Cheng <amker.cheng@gmail.com>:
>>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> I am re-sending all the patches Ilya sent earlier for review; they
>>>> add support for vectorization of loop epilogues and loops with low
>>>> trip count. We assume that only one patch,
>>>> vec-tails-07-combine-tail.patch, was not approved by Jeff.
>>>>
>>>> I rebased all patches and ran bootstrap and regression testing,
>>>> which did not show any new failures. All changes affected by the
>>>> new vect_do_peeling algorithm have been updated accordingly.
>>>>
>>>> Is it OK for trunk?
>>>
>>> Hi,
>>> I can't approve patches, but I have some comments after going through
>>> the implementation.
>>>
>>> One confusing part is the cost model change, as well as the way it's
>>> used to decide how the epilogue loop should be vectorized.  Given that
>>> vect-tail is disabled at the moment and the cost change needs further
>>> tuning, is it reasonable to split this part out and get the
>>> vectorization part reviewed/submitted independently?  For example, let
>>> user-specified parameters make the decision for now.  Cost and
>>> target-dependent changes should go in last; this could make the
>>> patch easier to read.
>>>
>>> The implementation computes and shares quite a lot of information
>>> between main loop and epilogue loop vectorization.  Furthermore, the
>>> variables/fields holding that information are named in a somewhat
>>> misleading way.  For example, LOOP_VINFO_MASK_EPILOGUE gives me the
>>> impression that it is the flag controlling whether the epilogue loop
>>> should be vectorized with masking.  However, that is actually
>>> controlled by exactly the same flag as whether the epilogue loop
>>> should be combined into the main loop with masking:
>>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo)
>>>
>>>    slpeel_make_loop_iterate_ntimes (loop, niters_vector);
>>>
>>> +  if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo))
>>> +    vect_combine_loop_epilogue (loop_vinfo);
>>> +
>>>    /* Reduce loop iterations by the vectorization factor.  */
>>>    scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf),
>>>                expected_iterations / vf);
>>>
>>> IMHO, we should decouple main loop vectorization and epilogue
>>> vectorization as much as possible by sharing as little information as
>>> we can.  The general idea is to handle the epilogue loop just like a
>>> normal short-trip loop.  For example, we can rename
>>> LOOP_VINFO_COMBINE_EPILOGUE to LOOP_VINFO_VECT_MASK (or something
>>> else) and not differentiate its meaning between the main loop and the
>>> epilogue (short-trip) loop.  It only indicates that the current loop
>>> should be vectorized with masking, no matter whether it's a main loop
>>> or an epilogue loop, and it works just like the current implementation.
>>>
>>> After this change, we can refine vectorization and make it more
>>> general for normal loops and epilogue (short-trip) loops.  For example,
>>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for the epilogue
>>> loop and uses it to control how it should be vectorized:
>>> +  if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo))
>>> +    {
>>> +      LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false;
>>> +      LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false;
>>> +    }
>>> +  else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo)
>>> +       && min_profitable_combine_iters >= 0)
>>> +    {
>>>
>>> This works, but it is not great for understanding or maintenance.
>>>
>>> Thanks,
>>> bin