From: Richard Biener
Date: Wed, 23 Oct 2019 12:07:00 -0000
Subject: Re: RFC/A: Add a targetm.vectorize.related_mode hook
To: Richard Sandiford
Cc: GCC Patches
X-SW-Source: 2019-10/txt/msg01646.txt.bz2

On Wed, Oct 23, 2019 at 1:51 PM Richard Sandiford wrote:
>
> Richard Biener writes:
> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
> > wrote:
> >>
> >> This patch is the first of a series that
> >> tries to remove two assumptions:
> >>
> >> (1) that all vectors involved in vectorisation must be the same size
> >>
> >> (2) that there is only one vector mode for a given element mode and
> >>     number of elements
> >>
> >> Relaxing (1) helps with targets that support multiple vector sizes or
> >> that require the number of elements to stay the same. E.g. if we're
> >> vectorising code that operates on narrow and wide elements, and the
> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> >> for the wide elements.
> >>
> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> >> fixed-length code for SVE. It also allows unpacked/half-size SVE
> >> vectors to work with -msve-vector-bits=256.
> >>
> >> The patch adds a new hook that targets can use to control how we
> >> move from one vector mode to another. The hook takes a starting vector
> >> mode, a new element mode, and (optionally) a new number of elements.
> >> The flexibility needed for (1) comes in when the number of elements
> >> isn't specified.
> >>
> >> All callers in this patch specify the number of elements, but a later
> >> vectoriser patch doesn't. I won't be posting the vectoriser patch
> >> for a few days, hence the RFC/A tag.
> >>
> >> Tested individually on aarch64-linux-gnu and as a series on
> >> x86_64-linux-gnu. OK to install? Or if not yet, does the idea
> >> look OK?
> >
> > In isolation the idea looks good but maybe a bit limited? I see
> > how it works for the same-size case but if you consider x86
> > where we have SSE, AVX256 and AVX512 what would it return
> > for related_vector_mode (V4SImode, SImode, 0)? Or is this
> > kind of query not intended (where the component modes match
> > but nunits is zero)?
>
> In that case we'd normally get V4SImode back. It's an allowed
> combination, but not very useful.
>
> > How do you get from SVE fixed 128bit to NEON fixed 128bit then? Or is
> > it just used to stay in the same register set for different component
> > modes?
>
> Yeah, the idea is to use the original vector mode as essentially
> a base architecture.
>
> The follow-on patches replace vec_info::vector_size with
> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
> with targetm.vectorize.autovectorize_vector_modes. These are the
> starting modes that would be passed to the hook in the nunits==0 case.
>
> E.g. for Advanced SIMD on AArch64, it would make more sense for
> related_mode (V4HImode, SImode, 0) to be V4SImode rather than V2SImode.
> I think things would work in a similar way for the x86_64 vector archs.
>
> For SVE we'd add both VNx16QImode (the SVE mode) and V16QImode (the
> Advanced SIMD mode) to autovectorize_vector_modes, even though they
> happen to be the same size for 128-bit SVE. We can then compare
> 128-bit SVE with 128-bit Advanced SIMD, with related_mode ensuring
> that we consistently use all-SVE modes or all-Advanced SIMD modes
> for each attempt.
>
> The plan for SVE is to add 4(!) modes to autovectorize_vector_modes:
>
> - VNx16QImode (full vector)
> - VNx8QImode (half vector)
> - VNx4QImode (quarter vector)
> - VNx2QImode (eighth vector)
>
> and then pick the one with the lowest cost. related_mode would
> keep the number of units the same for nunits==0, within the limit
> of the vector size.
> E.g.:
>
> - related_mode (VNx16QImode, HImode, 0) == VNx8HImode (full vector)
> - related_mode (VNx8QImode, HImode, 0) == VNx8HImode (full vector)
> - related_mode (VNx4QImode, HImode, 0) == VNx4HImode (half vector)
> - related_mode (VNx2QImode, HImode, 0) == VNx2HImode (quarter vector)
>
> and:
>
> - related_mode (VNx16QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx8QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx4QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx2QImode, SImode, 0) == VNx2SImode (half vector)
>
> So when operating on multiple element sizes, the tradeoff is between
> trying to make full use of the vector size (higher base nunits) vs.
> trying to remove packs and unpacks between multiple vector copies
> (lower base nunits). The latter is useful because extending within
> a vector is an in-lane rather than cross-lane operation and truncating
> within a vector is a no-op.
>
> With a couple of tweaks, we seem to do a good job of guessing which
> version has the lowest cost, at least for the simple cases I've tried
> so far.
>
> Obviously there's going to be a bit of a compile-time cost
> for SVE targets, but I think it's worth paying for.

I would guess that immediate benefit could be seen with
basic-block vectorization which simply fails when conversions
are involved. x86_64 should now always support V4SImode
and V2SImode so eventually a testcase can be crafted for that
as well.

> > As said, it looks good but I'd like to see the followups.
> >
> > Note I delayed thinking about relaxing the single-vector-size
> > constraint in the vectorizer until after we're SLP only because
> > that looked more easily done there. I also remember patches
> > relaxing this a bit from RISCV folks.
>
> That side seemed easier than I'd expected TBH, at least after the
> mostly mechanical changes above.
> The main missing thing was support
> for extension and truncation between integer vector modes with the same
> number of elements but different element sizes. But that's really just
> a natural extension of the scalar extend and truncate optabs.

Ah, I see - that's good to hear.

Richard.

> Thanks,
> Richard
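
For readers following the thread, the nunits==0 rule quoted above for SVE
(keep the number of units the same, within the limit of the vector size)
can be sketched as a toy model. This is only an illustration under stated
assumptions: ToyMode, toy_related_mode and toy_mode_name are hypothetical
names, not GCC's machine_mode API and not the hook from the patch, and the
model assumes that a mode VNx<N> can be described by N elements per 128-bit
block.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Toy stand-in for an SVE vector mode (hypothetical, NOT GCC's machine_mode).
struct ToyMode {
  unsigned elem_bits;  // element width in bits: QI=8, HI=16, SI=32, DI=64
  unsigned nunits;     // elements per 128-bit block, i.e. the N in VNx<N>
};

// Assumption of this sketch: a full SVE vector holds 128 bits per block.
constexpr unsigned kBlockBits = 128;

// Sketch of the rule described in the thread.  When nunits is 0, keep the
// base mode's number of units, but never exceed what a full vector can hold
// for the new element size.  A non-zero nunits is taken as given.
ToyMode toy_related_mode(ToyMode base, unsigned new_elem_bits,
                         unsigned nunits) {
  if (nunits == 0)
    nunits = std::min(base.nunits, kBlockBits / new_elem_bits);
  return {new_elem_bits, nunits};
}

// Format a toy mode in the VNx<N><elem> style used in the message.
std::string toy_mode_name(ToyMode m) {
  const char *elem = m.elem_bits == 8    ? "QI"
                     : m.elem_bits == 16 ? "HI"
                     : m.elem_bits == 32 ? "SI"
                                         : "DI";
  return "VNx" + std::to_string(m.nunits) + elem;
}
```

With this model, toy_related_mode ({8, 4}, 16, 0) names VNx4HI (a half
vector) and toy_related_mode ({8, 2}, 32, 0) names VNx2SI, matching the
two lists in the quoted message. The real hook also has to handle
Advanced SIMD and fixed-length modes, which this sketch ignores.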