From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <mptv9sfengd.fsf@arm.com> <CAFiYyc0Jh9wstAvAPbeQdziBuPhuYshgH0nSn17LfQ8rLeKN6A@mail.gmail.com> <mptr233d6if.fsf@arm.com>
In-Reply-To: <mptr233d6if.fsf@arm.com>
From: Richard Biener <richard.guenther@gmail.com>
Date: Wed, 23 Oct 2019 12:07:00 -0000
Message-ID: <CAFiYyc2hVJhi+C5=WHz6wkGa9hHWieqEJ2D23=bxdGwHLhDU+w@mail.gmail.com>
Subject: Re: RFC/A: Add a targetm.vectorize.related_mode hook
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Content-Type: text/plain; charset="UTF-8"

On Wed, Oct 23, 2019 at 1:51 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> This patch is the first of a series that tries to remove two
> >> assumptions:
> >>
> >> (1) that all vectors involved in vectorisation must be the same size
> >>
> >> (2) that there is only one vector mode for a given element mode and
> >>     number of elements
> >>
> >> Relaxing (1) helps with targets that support multiple vector sizes or
> >> that require the number of elements to stay the same.  E.g. if we're
> >> vectorising code that operates on narrow and wide elements, and the
> >> narrow elements use 64-bit vectors, then on AArch64 it would normally
> >> be better to use 128-bit vectors rather than pairs of 64-bit vectors
> >> for the wide elements.
> >>
> >> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
> >> fixed-length code for SVE.  It also allows unpacked/half-size SVE
> >> vectors to work with -msve-vector-bits=256.
> >>
> >> The patch adds a new hook that targets can use to control how we
> >> move from one vector mode to another.  The hook takes a starting vector
> >> mode, a new element mode, and (optionally) a new number of elements.
> >> The flexibility needed for (1) comes in when the number of elements
> >> isn't specified.
> >>
> >> All callers in this patch specify the number of elements, but a later
> >> vectoriser patch doesn't.  I won't be posting the vectoriser patch
> >> for a few days, hence the RFC/A tag.
> >>
> >> Tested individually on aarch64-linux-gnu and as a series on
> >> x86_64-linux-gnu.  OK to install?  Or if not yet, does the idea
> >> look OK?
> >
> > In isolation the idea looks good but maybe a bit limited?  I see
> > how it works for the same-size case but if you consider x86
> > where we have SSE, AVX256 and AVX512 what would it return
> > for related_vector_mode (V4SImode, SImode, 0)?  Or is this
> > kind of query not intended (where the component modes match
> > but nunits is zero)?
>
> In that case we'd normally get V4SImode back.  It's an allowed
> combination, but not very useful.
>
> > How do you get from SVE fixed 128bit to NEON fixed 128bit then?  Or is
> > it just used to stay in the same register set for different component
> > modes?
>
> Yeah, the idea is to use the original vector mode as essentially
> a base architecture.
>
> The follow-on patches replace vec_info::vector_size with
> vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
> with targetm.vectorize.autovectorize_vector_modes.  These are the
> starting modes that would be passed to the hook in the nunits==0 case.
>
> E.g. for Advanced SIMD on AArch64, it would make more sense for
> related_mode (V4HImode, SImode, 0) to be V4SImode rather than V2SImode.
> I think things would work in a similar way for the x86_64 vector archs.
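
[Illustrative aside, not part of the patch.  A minimal Python sketch of
the two answers contrasted above for the nunits==0 case: keeping the
vector size versus keeping the number of elements.  The function names,
the string-based mode handling and the 128-bit cap are assumptions made
for illustration only; they are not GCC's implementation of the hook.]

```python
import re

# Element sizes in bits for the integer element modes named in the thread.
ELEMENT_BITS = {"QImode": 8, "HImode": 16, "SImode": 32, "DImode": 64}


def parse_vector_mode(mode):
    """Split a name such as 'V4HImode' into (nunits, element_mode)."""
    match = re.fullmatch(r"V(\d+)(QI|HI|SI|DI)mode", mode)
    if match is None:
        raise ValueError(f"unrecognised vector mode: {mode}")
    return int(match.group(1)), match.group(2) + "mode"


def related_same_size(vector_mode, element_mode):
    """Hypothetical policy: keep the vector size, let nunits change."""
    nunits, old_element = parse_vector_mode(vector_mode)
    total_bits = nunits * ELEMENT_BITS[old_element]
    return f"V{total_bits // ELEMENT_BITS[element_mode]}{element_mode}"


def related_same_nunits(vector_mode, element_mode, max_bits=128):
    """Hypothetical policy: keep nunits, capped at a max_bits vector."""
    nunits, _ = parse_vector_mode(vector_mode)
    new_nunits = min(nunits, max_bits // ELEMENT_BITS[element_mode])
    return f"V{new_nunits}{element_mode}"


# The Advanced SIMD case from the thread: V4HImode with SImode elements.
print(related_same_size("V4HImode", "SImode"))
print(related_same_nunits("V4HImode", "SImode"))
# Matching element modes: the starting mode comes straight back.
print(related_same_size("V4SImode", "SImode"))
```

[Under these assumptions the same-size policy gives V2SImode for the
V4HImode case, while the same-nunits policy gives V4SImode, which is
the result preferred above.]
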
>
> For SVE we'd add both VNx16QImode (the SVE mode) and V16QImode (the
> Advanced SIMD mode) to autovectorize_vector_modes, even though they
> happen to be the same size for 128-bit SVE.  We can then compare
> 128-bit SVE with 128-bit Advanced SIMD, with related_mode ensuring
> that we consistently use all-SVE modes or all-Advanced SIMD modes
> for each attempt.
>
> The plan for SVE is to add 4(!) modes to autovectorize_vector_modes:
>
> - VNx16QImode (full vector)
> - VNx8QImode (half vector)
> - VNx4QImode (quarter vector)
> - VNx2QImode (eighth vector)
>
> and then pick the one with the lowest cost.  related_mode would
> keep the number of units the same for nunits==0, within the limit
> of the vector size.  E.g.:
>
> - related_mode (VNx16QImode, HImode, 0) == VNx8HImode (full vector)
> - related_mode (VNx8QImode, HImode, 0) == VNx8HImode (full vector)
> - related_mode (VNx4QImode, HImode, 0) == VNx4HImode (half vector)
> - related_mode (VNx2QImode, HImode, 0) == VNx2HImode (quarter vector)
>
> and:
>
> - related_mode (VNx16QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx8QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx4QImode, SImode, 0) == VNx4SImode (full vector)
> - related_mode (VNx2QImode, SImode, 0) == VNx2SImode (half vector)
>
> So when operating on multiple element sizes, the tradeoff is between
> trying to make full use of the vector size (higher base nunits) vs.
> trying to remove packs and unpacks between multiple vector copies
> (lower base nunits).  The latter is useful because extending within
> a vector is an in-lane rather than cross-lane operation and truncating
> within a vector is a no-op.
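
[Illustrative aside, not part of the patch.  The two SVE tables above
follow one rule: keep the base mode's element count, but never exceed
what fits in a full vector.  The Python sketch below reproduces the
tables from that rule.  The function name and the string-based mode
handling are assumptions for illustration, not the AArch64 hook itself.]

```python
import re

# Element sizes in bits for the integer element modes named in the thread.
ELEMENT_BITS = {"QImode": 8, "HImode": 16, "SImode": 32, "DImode": 64}
# The count in a VNx<N> mode name is the number of elements per 128 bits.
GRANULE_BITS = 128
FRACTIONS = {1: "full", 2: "half", 4: "quarter", 8: "eighth"}


def sve_related_mode(base_mode, element_mode):
    """Model of the nunits==0 case: keep nunits, within the vector size.

    Returns (mode_name, fraction_of_vector_used).
    """
    match = re.fullmatch(r"VNx(\d+)(QI|HI|SI|DI)mode", base_mode)
    if match is None:
        raise ValueError(f"unrecognised SVE mode: {base_mode}")
    base_nunits = int(match.group(1))
    max_nunits = GRANULE_BITS // ELEMENT_BITS[element_mode]
    nunits = min(base_nunits, max_nunits)
    return f"VNx{nunits}{element_mode}", FRACTIONS[max_nunits // nunits]


# The four base modes planned for autovectorize_vector_modes.
for base in ("VNx16QImode", "VNx8QImode", "VNx4QImode", "VNx2QImode"):
    for element in ("HImode", "SImode"):
        mode, fraction = sve_related_mode(base, element)
        print(f"related_mode ({base}, {element}, 0) == {mode} ({fraction} vector)")
```

[A lower base nunits leaves part of the vector unused for the wider
elements but keeps every element size at the same element count, which
is the packs-and-unpacks tradeoff described above.]
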
>
> With a couple of tweaks, we seem to do a good job of guessing which
> version has the lowest cost, at least for the simple cases I've tried
> so far.
>
> Obviously there's going to be a bit of a compile-time cost
> for SVE targets, but I think it's worth paying for.

I would guess that an immediate benefit could be seen with
basic-block vectorization, which currently simply fails when
conversions are involved.  x86_64 should now always support both
V4SImode and V2SImode, so eventually a testcase can be crafted for
that as well.

> > As said, it looks good but I'd like to see the followups.
> >
> > Note I delayed thinking about relaxing the single-vector-size
> > constraint in the vectorizer until after we're SLP only because
> > that looked more easily done there.  I also remember patches
> > relaxing this a bit from RISCV folks.
>
> That side seemed easier than I'd expected TBH, at least after the
> mostly mechanical changes above.  The main missing thing was support
> for extension and truncation between integer vector modes with the same
> number of elements but different element sizes.  But that's really just
> a natural extension of the scalar extend and truncate optabs.
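
[Illustrative aside, not part of the patch.  A small Python model of
what extension and truncation between vectors with the same number of
elements mean: the scalar operation applied to each lane, with the lane
count unchanged.  The helper names are hypothetical and this is not how
the optabs are implemented.]

```python
def sign_extend(value, from_bits):
    """Interpret the low from_bits of value as a signed integer."""
    value &= (1 << from_bits) - 1
    sign_bit = 1 << (from_bits - 1)
    return (value ^ sign_bit) - sign_bit


def truncate(value, to_bits):
    """Keep only the low to_bits of value."""
    return value & ((1 << to_bits) - 1)


def vec_extend(lanes, from_bits):
    """Sign-extend every lane; the number of lanes stays the same."""
    return [sign_extend(lane, from_bits) for lane in lanes]


def vec_truncate(lanes, to_bits):
    """Truncate every lane; the number of lanes stays the same."""
    return [truncate(lane, to_bits) for lane in lanes]


# Four 16-bit lanes widened to 32-bit values, then narrowed again.
narrow = [0x0001, 0x7FFF, 0x8000, 0xFFFF]
wide = vec_extend(narrow, 16)
print(wide)
print(vec_truncate(wide, 16) == narrow)
```
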

Ah, I see - that's good to hear.

Richard.

> Thanks,
> Richard