From: Richard Biener
Date: Thu, 09 Mar 2017 08:19:00 -0000
Subject: Re: SPEC 456.hmmer vectorization question
To: Jakub Jelinek
Cc: Steve Ellcey, Michael Matz, GCC Development, Jeff Law

On Thu, Mar 9, 2017 at 9:12 AM, Jakub Jelinek wrote:
> On Thu, Mar 09, 2017 at 09:02:38AM +0100, Richard Biener wrote:
>> It would need to be done before graphite, and yes, the question is when
>> to do this (given the non-trivial text size and runtime cost). One option is
>> to do something similar to what we do with IFN_LOOP_VECTORIZED, that is, after
>> follow-up transforms decide whether the specialized version received any
>> important optimization. Another option is to add value profile counters
>> for aliasing and only do this with FDO when we know at runtime there
>> is no aliasing.
>
> It doesn't have to be either/or. If we have FDO, we can do it
> unconditionally if we have gathered info that there is likely no aliasing,
> and optimize the other loop (for the case of aliasing) for size.
> If we don't have FDO, we could do the IFN_LOOP_VERSIONED way.
> For IFN_LOOP_VERSIONED, if we check all aliasing cases we could then either
> use the OpenMP/Cilk/ivdep pragma loop properties (loop->safelen etc.),
> or even have something stronger (that would say that there aren't
> any inter-iteration memory dependencies).

We can use MR_DEPENDENCE_* to partition the dependences properly as well.
For loop distribution we can also check profitability before adding any
dependence-related edges and version the loop according to them. Of course
that needs a meaningful cost model...

Similarly, you can run the ISL optimizer as if there were no dependences
and compare the resulting code to the original one with a cost model.
This is what the vectorizer does before doing the versioning.
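As a rough illustration of what "versioning" a loop on aliasing means at the source level, here is a minimal hand-written sketch in C, assuming a flat address space. It is not how GCC implements the transform internally; the names `no_overlap` and `add_one` are hypothetical. A runtime overlap check selects between a specialized copy, annotated with `#pragma GCC ivdep` (one of the loop properties mentioned above, available since GCC 4.9) to state that there are no inter-iteration memory dependences, and an unmodified fallback copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical runtime alias check: nonzero if the int ranges
   [a, a+n) and [b, b+n) do not overlap.  Assumes a flat address
   space and that neither range wraps around the top of memory. */
int no_overlap(const int *a, const int *b, size_t n)
{
    assert(n <= SIZE_MAX / sizeof(int)); /* byte count must not overflow */
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;
    uintptr_t bytes = (uintptr_t)(n * sizeof(int));
    return pa + bytes <= pb || pb + bytes <= pa;
}

/* Hypothetical example loop dst[i] = src[i] + 1, versioned by hand. */
void add_one(int *dst, const int *src, size_t n)
{
    if (no_overlap(dst, src, n)) {
        /* Specialized copy: the check above rules out any overlap, so
           tell GCC it may ignore assumed loop-carried dependences. */
#pragma GCC ivdep
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1;
    } else {
        /* Fallback copy with the original semantics; e.g. with
           dst == src + 1 each iteration reads the previous store. */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i] + 1;
    }
}
```

With disjoint arrays the specialized copy runs; with `int c[6] = {10}` and `add_one(c + 1, c, 5)` the check fails and the fallback keeps the sequential result `c = {10, 11, 12, 13, 14, 15}`. Duplicating the loop body this way is the text-size cost discussed in the thread, which is why the fallback copy is a natural candidate for size optimization when a profile says aliasing is unlikely.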
For enablement transforms, cost modeling is of course hard unless you can
chain the analysis parts of multiple passes (basically integrate the loop
passes into "one"). Of course this also breaks down in complexity once you
consider not disambiguating all unknown dependences but only a few (in case
the transform can still handle some of those cases; the vectorizer, for
example, cannot deal with any unknown dependences).

Richard.

>
> Jakub