From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 22875 invoked by alias); 8 Nov 2018 12:09:55 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 22862 invoked by uid 89); 8 Nov 2018 12:09:54 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-2.7 required=5.0 tests=AWL,BAYES_00,FREEMAIL_FROM,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=32b X-HELO: mail-lf1-f68.google.com Received: from mail-lf1-f68.google.com (HELO mail-lf1-f68.google.com) (209.85.167.68) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 08 Nov 2018 12:09:53 +0000 Received: by mail-lf1-f68.google.com with SMTP id c16so13984237lfj.8 for ; Thu, 08 Nov 2018 04:09:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=K8qOCjfiQm5bfZy7jlFyJdFdV9sQ8Y8NY7OOx/v8j8o=; b=CwD1UKO247Ghiv2ssgx0ETa38Rh2DZImMWpw9HrmpUky1D5wxjL9yUFJckJN5+0lvr Vzy50dbWfzAoWbkT2bsbP2Oe4QnaUDiEJxq2BHovWSArUDb4No22gIb+IdCfcULYqVnn az4AUk5+Zg+p/HgvKjQB/oEx3un66FQud0lFwULKYD3ty9G1XUjaQc6T5Rj7OgHP/9pO pmQh+DY3r3ZUZ1fLqiYxSFB4nkZljfPiYaiPeoyyvnT/o+0h2HxJddABX/OtDGX59U8x Tp/FtBenOUwRdopGDxvsmz5IoikIuvGEQJZym8nOZsiIzajLFCAYgeRklvDgimkgCtm0 eQEQ== MIME-Version: 1.0 References: In-Reply-To: From: Richard Biener Date: Thu, 08 Nov 2018 12:09:00 -0000 Message-ID: Subject: Re: [RFC][PATCH]Merge VEC_COND_EXPR into MASK_STORE after loop vectorization To: renlin.li@foss.arm.com Cc: GCC Patches , Richard Sandiford , Ramana Radhakrishnan , James Greenhalgh Content-Type: text/plain; charset="UTF-8" X-IsSubscribed: yes X-SW-Source: 2018-11/txt/msg00548.txt.bz2 On Thu, Nov 8, 2018 at 12:02 PM Renlin Li wrote: > > Hi all, > > When allow-store-data-races is enabled, ifcvt would prefer to generated > conditional select and unconditional store to convert certain if statement > into: > > _ifc_1 = val > _ifc_2 = A[i] > val = cond? _ifc_1 : _ifc_2 > A[i] = val > > When the loop got vectorized, this pattern will be turned into > MASK_LOAD, VEC_COND_EXPR and MASK_STORE. This could be improved. I'm somewhat confused - the vectorizer doesn't generate a masked store when if-conversion didn't create one in the first place. In particular with allow-store-data-races=1 (what your testcase uses) there are no masked loads/stores generated at all. So at least you need a better testcase to motivate (one that doesn't load from array[i] so that we know the conditional stores might trap). So what I see (with store data races not allowed) from ifcvt is [local count: 1006632961]: # i_20 = PHI # ivtmp_18 = PHI a_10 = array[i_20]; _1 = a_10 & 1; _2 = a_10 + 1; _32 = _1 != 0; _33 = &array[i_20]; .MASK_STORE (_33, 32B, _32, _2); prephitmp_8 = _1 != 0 ? _2 : a_10; _4 = a_10 + 2; _34 = prephitmp_8 > 10; .MASK_STORE (_33, 32B, _34, _4); i_13 = i_20 + 1; ivtmp_5 = ivtmp_18 - 1; if (ivtmp_5 != 0) and what you want to do is merge prephitmp_8 = _1 != 0 ? _2 : a_10; _34 = prephitmp_8 > 10; somehow? But your patch mentions that _4 should be prephitmp_8 so it wouldn't do anything here? > The change here add a post processing function to combine the VEC_COND_EXPR > expression into MASK_STORE, and delete related dead code. > > I am a little bit conservative here. > I didn't change the default behavior of ifcvt to always generate MASK_STORE, > although it should be safe in all cases (allow or dis-allow store data race). > > MASK_STORE might not well handled in vectorization pass compared with > conditional select. It might be too early and aggressive to do that in ifcvt. > And the performance of MASK_STORE might not good for some platforms. > (We could add --param or target hook to differentiate this ifcvt behavior > on different platforms) > > Another reason I did not do that in ifcvt is the data reference > analysis. To create a MASK_STORE, a pointer is created as the first > argument to the internal function call. If the pointer is created out of > array references, e.g. x = &A[i], data reference analysis could not properly > analysis the relationship between MEM_REF (x) and ARRAY_REF (A, i). This > will create a versioned loop beside the vectorized one. Actually for your testcase it won't vectorize because there's compile-time aliasing (somehow we miss handling of dependence distance zero?!) > (I have hacks to look through the MEM_REF, and restore the reference back to > ARRAY_REF (A, i). Maybe we could do analysis on lowered address expression? > I saw we have gimple laddress pass to aid the vectorizer) An old idea of mine is to have dependence analysis fall back to lowered address form when it fails to match two references. This would require re-analysis and eventually storing an alternate "inner reference" in the data-ref structure. > The approach here comes a little bit late, on the condition that vector > MASK_STORE is generated by loop vectorizer. In this case, it is definitely > beneficial to do the code transformation. > > Any thoughts on the best way to fix the issue? > > > This patch has been tested with aarch64-none-elf, no regressions. > > Regards, > Renlin > > gcc/ChangeLog: > > 2018-11-08 Renlin Li > > * tree-vectorizer.h (combine_sel_mask_store): Declare new function. > * tree-vect-loop.c (combine_sel_mask_store): Define new function. > * tree-vectorizer.c (vectorize_loops): Call it. > > gcc/testsuite/ChangeLog: > > 2018-11-08 Renlin Li > > * gcc.target/aarch64/sve/combine_vcond_mask_store_1.c: New. >