From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Sender: gcc-patches-owner@gcc.gnu.org
From: Richard Sandiford
To: Richard Biener
Mail-Followup-To: Richard Biener, GCC Patches, richard.sandiford@arm.com
Cc: GCC Patches
Subject: Re: RFC/A: Add a targetm.vectorize.related_mode hook
Date: Wed, 23 Oct 2019 12:00:00 -0000
In-Reply-To: (Richard Biener's message of "Wed, 23 Oct 2019 13:15:50 +0200")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-SW-Source: 2019-10/txt/msg01645.txt.bz2

Richard Biener writes:
> On Wed, Oct 23, 2019 at 1:00 PM Richard Sandiford wrote:
>>
>> This patch is the first of a series that tries to remove two
>> assumptions:
>>
>> (1) that all vectors involved in vectorisation must be the same size
>>
>> (2) that there is only one vector mode for a given element mode and
>>     number of elements
>>
>> Relaxing (1) helps with targets that support multiple
>> vector sizes or that require the number of elements to stay the same.
>> E.g. if we're vectorising code that operates on narrow and wide
>> elements, and the narrow elements use 64-bit vectors, then on AArch64
>> it would normally be better to use 128-bit vectors rather than pairs
>> of 64-bit vectors for the wide elements.
>>
>> Relaxing (2) makes it possible for -msve-vector-bits=128 to produce
>> fixed-length code for SVE. It also allows unpacked/half-size SVE
>> vectors to work with -msve-vector-bits=256.
>>
>> The patch adds a new hook that targets can use to control how we
>> move from one vector mode to another. The hook takes a starting vector
>> mode, a new element mode, and (optionally) a new number of elements.
>> The flexibility needed for (1) comes in when the number of elements
>> isn't specified.
>>
>> All callers in this patch specify the number of elements, but a later
>> vectoriser patch doesn't. I won't be posting the vectoriser patch
>> for a few days, hence the RFC/A tag.
>>
>> Tested individually on aarch64-linux-gnu and as a series on
>> x86_64-linux-gnu. OK to install? Or if not yet, does the idea
>> look OK?
>
> In isolation the idea looks good but maybe a bit limited? I see
> how it works for the same-size case but if you consider x86
> where we have SSE, AVX256 and AVX512 what would it return
> for related_vector_mode (V4SImode, SImode, 0)? Or is this
> kind of query not intended (where the component modes match
> but nunits is zero)?

In that case we'd normally get V4SImode back. It's an allowed
combination, but not very useful.

> How do you get from SVE fixed 128bit to NEON fixed 128bit then? Or is
> it just used to stay in the same register set for different component
> modes?

Yeah, the idea is to use the original vector mode as essentially
a base architecture.
The follow-on patches replace vec_info::vector_size with
vec_info::vector_mode and targetm.vectorize.autovectorize_vector_sizes
with targetm.vectorize.autovectorize_vector_modes. These are the
starting modes that would be passed to the hook in the nunits==0 case.

E.g. for Advanced SIMD on AArch64, it would make more sense for
related_mode (V4HImode, SImode, 0) to be V4SImode rather than V2SImode.
I think things would work in a similar way for the x86_64 vector archs.

For SVE we'd add both VNx16QImode (the SVE mode) and V16QImode (the
Advanced SIMD mode) to autovectorize_vector_modes, even though they
happen to be the same size for 128-bit SVE. We can then compare
128-bit SVE with 128-bit Advanced SIMD, with related_mode ensuring
that we consistently use all-SVE modes or all-Advanced SIMD modes
for each attempt.

The plan for SVE is to add 4(!) modes to autovectorize_vector_modes:

- VNx16QImode (full vector)
- VNx8QImode (half vector)
- VNx4QImode (quarter vector)
- VNx2QImode (eighth vector)

and then pick the one with the lowest cost. related_mode would
keep the number of units the same for nunits==0, within the limit
of the vector size. E.g.:

- related_mode (VNx16QImode, HImode, 0) == VNx8HImode (full vector)
- related_mode (VNx8QImode, HImode, 0) == VNx8HImode (full vector)
- related_mode (VNx4QImode, HImode, 0) == VNx4HImode (half vector)
- related_mode (VNx2QImode, HImode, 0) == VNx2HImode (quarter vector)

and:

- related_mode (VNx16QImode, SImode, 0) == VNx4SImode (full vector)
- related_mode (VNx8QImode, SImode, 0) == VNx4SImode (full vector)
- related_mode (VNx4QImode, SImode, 0) == VNx4SImode (full vector)
- related_mode (VNx2QImode, SImode, 0) == VNx2SImode (half vector)

So when operating on multiple element sizes, the tradeoff is between
trying to make full use of the vector size (higher base nunits) vs.
trying to remove packs and unpacks between multiple vector copies
(lower base nunits).
The latter is useful because extending within a vector is an in-lane
rather than cross-lane operation and truncating within a vector is
a no-op.

With a couple of tweaks, we seem to do a good job of guessing which
version has the lowest cost, at least for the simple cases I've tried
so far. Obviously there's going to be a bit of a compile-time cost
for SVE targets, but I think it's worth paying for.

> As said, it looks good but I'd like to see the followups.
>
> Note I delayed thinking about relaxing the single-vector-size
> constraint in the vectorizer until after we're SLP only because
> that looked more easily done there. I also remember patches
> relaxing this a bit from RISCV folks.

That side seemed easier than I'd expected TBH, at least after the
mostly mechanical changes above. The main missing thing was
support for extension and truncation between integer vector modes
with the same number of elements but different element sizes.
But that's really just a natural extension of the scalar extend and
truncate optabs.

Thanks,
Richard
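
The nunits==0 rule described above ("keep the number of units the
same, within the limit of the vector size") can be modelled in a few
lines. What follows is only a toy Python sketch, not GCC code: the
names related_nunits, sve_related_mode and vectors_per_base are made
up for illustration, and it assumes that the VNx counts are units per
128 bits of vector. It reproduces the eight SVE related_mode results
listed in the message and also counts how many vectors of the new
element type correspond to one base vector, which is roughly where
the packs and unpacks come from.

```python
# Toy model of the nunits == 0 case discussed in the message.
# Everything here is hypothetical illustration, not the GCC hook itself.

# Element sizes in bits for the scalar integer modes used in the examples.
ELEMENT_BITS = {"QI": 8, "HI": 16, "SI": 32, "DI": 64}


def related_nunits(base_nunits, new_element, vector_bits=128):
    """Keep base_nunits unless the new elements no longer fit in one vector.

    vector_bits is the (assumed) size that the unit counts are relative
    to: 128 for Advanced SIMD Q registers and for each 128-bit block of
    an SVE vector.
    """
    max_units = vector_bits // ELEMENT_BITS[new_element]
    return min(base_nunits, max_units)


def sve_related_mode(base_nunits, new_element):
    """Name the SVE mode reached from VNx<base_nunits>QI for new_element."""
    return f"VNx{related_nunits(base_nunits, new_element)}{new_element}"


def vectors_per_base(base_nunits, new_element):
    """Vectors of new_element needed to cover the lanes of one base vector.

    A result above 1 means the wide data is spread over several vectors,
    so converting to or from the narrow type needs unpacks or packs.
    """
    return base_nunits // related_nunits(base_nunits, new_element)


if __name__ == "__main__":
    for elem in ("HI", "SI"):
        for base in (16, 8, 4, 2):
            print(f"related_mode (VNx{base}QImode, {elem}mode, 0) -> "
                  f"{sve_related_mode(base, elem)}mode, "
                  f"{vectors_per_base(base, elem)} vector(s) per base vector")
```

In this model a base of VNx16QImode needs four VNx4SImode vectors for
every vector of bytes, while a base of VNx4QImode needs only one, at
the price of using a quarter of each byte vector. The same
related_nunits function with a base of 4 units and SImode gives 4,
matching the V4HImode to V4SImode example for Advanced SIMD.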