From: Ilya Enkovich
Date: Thu, 16 Jun 2016 16:03:00 -0000
Subject: Re: [PATCH, vec-tails 07/10] Support loop epilogue combining
To: Jeff Law
Cc: Richard Biener, GCC Patches

2016-06-16 18:51 GMT+03:00 Jeff Law:
> On 06/16/2016 09:41 AM, Ilya Enkovich wrote:
>>
>> 2016-06-15 14:44 GMT+03:00 Richard Biener:
>>>
>>> On Thu, May 19, 2016 at 9:44 PM, Ilya Enkovich wrote:
>>>>
>>>> Hi,
>>>>
>>>> This patch introduces support for loop epilogue combining. This includes
>>>> support in cost estimation and all required changes required to mask
>>>> vectorized loop.
>>>
>>>
>>> I wonder why you compute a minimum number of iterations to make masking
>>> of the vectorized body profitable rather than a maximum number of
>>> iterations.
>>>
>>> I'd say masking the vectorized loop is profitable if niter/vf *
>>> masking-overhead < epilogue-cost.
>>> Masking the epilogue is profitable if vectorizing the epilogue with
>>> masking is profitable.
>>>
>>> Am I missing something?
>>
>>
>> We don't have two versions of vectorized loop. The choice is between vector
>> and scalar loop and in this case minimum number of iterations is what we
>> need.
>> Generating two vectorized loop versions would be something new to
>> vectorizer.
>
> What I think Richi is saying is that we have to multiply the cost of the
> masking overhead by the number of iterations of vectorized loop to determine
> the cost of masking -- the more loop iterations we have, the greater the
> cost of masking in the loop becomes and those costs may be higher than the
> normal epilogue sequence.

Right. But we compute that dynamically. And what do we do when we see
that the overall masking cost has become greater than the scalar epilogue
cost? The only case where this check is useful is when we have a
vectorized non-combined version of the loop.
The original idea of combining (patches sent by Yuri last year) was to
use it only when the masking cost is small enough (we expect the
scheduler to hide cheap masking computations under heavier instructions,
so we don't lose performance even for high iteration counts).
Dynamically choosing between combined and non-combined versions is
another story.

Thanks,
Ilya

>
> Jeff
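
For readers following the thread, the criterion Richard proposes (niter/vf * masking-overhead < epilogue-cost) can be written out as a small sketch. This is only an illustration of the inequality, not the cost model in the patch or in GCC: the function names are hypothetical, costs are abstract positive integer units, and the number of vector iterations is assumed to be niters/vf rounded up because the combined loop also covers the tail.

```python
def masking_cost(niters, vf, mask_overhead):
    """Total masking overhead of a combined (masked) vector loop.

    Assumption: every vector iteration pays `mask_overhead` once, and the
    combined loop runs ceil(niters / vf) iterations.
    """
    vector_iters = (niters + vf - 1) // vf  # integer ceil(niters / vf)
    return vector_iters * mask_overhead


def combining_is_profitable(niters, vf, mask_overhead, epilogue_cost):
    """Richard's check: masking pays off while its total overhead stays
    below the cost of the scalar epilogue it replaces."""
    return masking_cost(niters, vf, mask_overhead) < epilogue_cost


def max_profitable_niters(vf, mask_overhead, epilogue_cost):
    """Largest iteration count for which combining is still profitable.

    Returns None when masking is free (no upper bound exists).
    Assumes epilogue_cost >= 1 and integer costs.
    """
    if mask_overhead == 0:
        return None
    # Largest k with k * mask_overhead < epilogue_cost.
    max_vector_iters = (epilogue_cost - 1) // mask_overhead
    return max_vector_iters * vf


if __name__ == "__main__":
    # Hypothetical numbers: vf = 4, one cost unit of masking per vector
    # iteration, scalar epilogue costing 6 units.
    print(combining_is_profitable(20, 4, 1, 6))
    print(combining_is_profitable(21, 4, 1, 6))
    print(max_profitable_niters(4, 1, 6))
```

Under these assumptions the bound is an upper limit on the iteration count. That limit is only useful if a non-combined vectorized version exists to fall back to, which is the point made in the reply above.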