From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 27898 invoked by alias); 9 Nov 2016 12:12:06 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 27882 invoked by uid 89); 9 Nov 2016 12:12:06 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.5 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,RCVD_IN_SORBS_SPAM,SPF_PASS autolearn=no version=3.3.2 spammy=feeling, essential, ysrumyangmailcom, ysrumyan@gmail.com X-HELO: mail-ua0-f196.google.com Received: from mail-ua0-f196.google.com (HELO mail-ua0-f196.google.com) (209.85.217.196) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 09 Nov 2016 12:12:04 +0000 Received: by mail-ua0-f196.google.com with SMTP id 50so9222560uae.2 for ; Wed, 09 Nov 2016 04:12:04 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=ayPAcC7zZyBQ++ZA3Lbi1EZFmhfDgEtVe7CnHXKlx1Y=; b=jpWR56MT+B9ViUhZ6WIzWoMLnOzOi6nBqL/yAAIJTpkMhHR+s99uJ1L6IzZFi8q7sp LBDc4hGsYR+xLrJA8Topa68ynDM+hNK4WqudkgtNMNo0dGX6gE26EjRaOghlpcIz7zOR LWEAHOwAbMpdNTlzR/S9w7ytJqPdBEIRrgUB2MDu8BSBINgAA0JMc2hI1awU8oqkrV3A Yu1sEJ0eHes16PH4UcyrWeKdhmulx7tNYn+6nRvXmHUMIZuwUOM4MJPDUcsEp2SA66Dm nx730JOYakUMnlt/vBnS2adKlTZVgO61RoGvtWpDrb0OWtgFFtsa5Phg18jskRsMF8cr GqMA== X-Gm-Message-State: ABUngvcQQM5aB41+wlOIYX4XE7yO21x8UkE294iyAbSZAKDLD7rKoaPMiLPcaDhWyJ/sT3roL3ZCoMXXQGqUxw== X-Received: by 10.159.34.208 with SMTP id 74mr9674340uan.99.1478693522717; Wed, 09 Nov 2016 04:12:02 -0800 (PST) MIME-Version: 1.0 Received: by 10.176.1.17 with HTTP; Wed, 9 Nov 2016 04:12:02 -0800 (PST) In-Reply-To: References: From: Yuri Rumyantsev Date: Wed, 09 Nov 2016 12:12:00 -0000 Message-ID: Subject: Re: [PATCH, vec-tails] Support loop epilogue vectorization To: "Bin.Cheng" Cc: Richard Biener , Jeff Law , gcc-patches , Ilya Enkovich Content-Type: text/plain; charset=UTF-8 X-SW-Source: 2016-11/txt/msg00823.txt.bz2 I am familiar with SVE extension and understand that implemented approach might be not suitable for ARM. The main point is that only load/store instructions are masked but all other calculations are not (we did special conversion for reduction statements to implement merging predication semantic). For SVE peeling for niters is not required but it is not true for x86 - we must determine what vectorization scheme is more profitable: loop combining (the only essential for SVE) or separate epilogue vectorization using masking or less vectorization factor. So I'd like to have the full list of required changes to our implementation to try to remove them. Thanks. Yuri. 2016-11-09 14:46 GMT+03:00 Bin.Cheng : > On Wed, Nov 9, 2016 at 11:28 AM, Yuri Rumyantsev wrote: >> Thanks Richard for your comments. >> Your proposed to handle epilogue loop just like normal short-trip loop >> but this is not exactly truth since e.g. epilogue must not be peeled >> for alignment. > Yes there must be some differences, my motivation is to minimize that > so we don't need to specially check normal/epilogue loops at too many > places. > Of course it's just my feeling when going through the patch set, and > could be wrong. > > Thanks, > bin >> >> It is not clear for me what are my next steps? Should I re-design the >> patch completely or simply decompose the whole patch to different >> parts? But it means that we must start review process from beginning >> but release is closed to its end. >> Note also that i work for Intel till the end of year and have not idea >> who will continue working on this project. >> >> Any help will be appreciated. >> >> Thanks. >> Yuri. >> >> 2016-11-09 13:37 GMT+03:00 Bin.Cheng : >>> On Tue, Nov 1, 2016 at 12:38 PM, Yuri Rumyantsev wrote: >>>> Hi All, >>>> >>>> I re-send all patches sent by Ilya earlier for review which support >>>> vectorization of loop epilogues and loops with low trip count. We >>>> assume that the only patch - vec-tails-07-combine-tail.patch - was not >>>> approved by Jeff. >>>> >>>> I did re-base of all patches and performed bootstrapping and >>>> regression testing that did not show any new failures. Also all >>>> changes related to new vect_do_peeling algorithm have been changed >>>> accordingly. >>>> >>>> Is it OK for trunk? >>> >>> Hi, >>> I can't approve patches, but had some comments after going through the >>> implementation. >>> >>> One confusing part is cost model change, as well as the way it's used >>> to decide how epilogue loop should be vectorized. Given vect-tail is >>> disabled at the moment and the cost change needs further tuning, is it >>> reasonable to split this part out and get vectorization part >>> reviewed/submitted independently? For example, let user specified >>> parameters make the decision for now. Cost and target dependent >>> changes should go in at last, this could make the patch easier to >>> read. >>> >>> The implementation computes/shares quite amount information from main >>> loop to epilogue loop vectorization. Furthermore, variables/fields >>> for such information are somehow named in a misleading way. For >>> example. LOOP_VINFO_MASK_EPILOGUE gives me the impression this is the >>> flag controlling whether epilogue loop should be vectorized with >>> masking. However, it's actually controlled by exactly the same flag >>> as whether epilogue loop should be combined into the main loop with >>> masking: >>> @@ -7338,6 +8013,9 @@ vect_transform_loop (loop_vec_info loop_vinfo) >>> >>> slpeel_make_loop_iterate_ntimes (loop, niters_vector); >>> >>> + if (LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo)) >>> + vect_combine_loop_epilogue (loop_vinfo); >>> + >>> /* Reduce loop iterations by the vectorization factor. */ >>> scale_loop_profile (loop, GCOV_COMPUTE_SCALE (1, vf), >>> expected_iterations / vf); >>> >>> IMHO, we should decouple main loop vectorization and epilogue >>> vectorization as much as possible by sharing as few information as we >>> can. The general idea is to handle epilogue loop just like normal >>> short-trip loop. For example, we can rename >>> LOOP_VINFO_COMBINE_EPILOGUE into LOOP_VINFO_VECT_MASK (or something >>> else), and we don't differentiate its meaning between main and >>> epilogue(short-trip) loop. It only indicates the current loop should >>> be vectorized with masking no matter it's a main loop or epilogue >>> loop, and it works just like the current implementation. >>> >>> After this change, we can refine vectorization and make it more >>> general for normal loop and epilogue(short trip) loop. For example, >>> this implementation sets LOOP_VINFO_PEELING_FOR_NITER for epilogue >>> loop and use it to control how it should be vectorized: >>> + if (!LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)) >>> + { >>> + LOOP_VINFO_MASK_EPILOGUE (loop_vinfo) = false; >>> + LOOP_VINFO_COMBINE_EPILOGUE (loop_vinfo) = false; >>> + } >>> + else if (LOOP_VINFO_CAN_BE_MASKED (loop_vinfo) >>> + && min_profitable_combine_iters >= 0) >>> + { >>> >>> This works, but not that good for understanding or maintaining. >>> >>> Thanks, >>> bin