From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 98369 invoked by alias); 6 Sep 2017 16:56:04 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 98348 invoked by uid 89); 6 Sep 2017 16:56:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-6.9 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy= X-HELO: mail-wm0-f42.google.com Received: from mail-wm0-f42.google.com (HELO mail-wm0-f42.google.com) (74.125.82.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 06 Sep 2017 16:56:01 +0000 Received: by mail-wm0-f42.google.com with SMTP id f145so17728017wme.0 for ; Wed, 06 Sep 2017 09:56:00 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:mail-followup-to:cc:subject:references :date:in-reply-to:message-id:user-agent:mime-version; bh=qM5YqvghoSTDP7Fr1ZAgJYQk1kSpFP0jdTDcV7xIIyk=; b=XnOzVsrtxedKvCv0nXeteGZulu6R/iyD+Cu26VV8mBqV9WPxNIf4VlyvbECh4q4luC iCahKCHbtyZ4zzvlJ8wQIyg4cinxA3bj3yyihgXMacWx7xSzb9Q9VeJFK1LWZDYP/cI6 VG4yzGqd0PfSTed0KnvD+4VzEZcJUqvLWN581sei26fxwgMxckL7o3sUye7hB1EW8KfD ugJ8xO7SifEqoZTEECgdItHRW8K/woXv+UyZnOV7SRTVBaUyV/a+Nnx+UMmEsa3zcuE9 L9m9SfzVTxmPlcY6iuSSlSRt+9POoW5s6sHv+4MU4BkdpIbp2S7E/heBmNRE2gKUMTCD 999Q== X-Gm-Message-State: AHPjjUij5khjDHz4C/emr+nq/Zc1amEY3XI5jCU7kdkNzD4PkCQsDP44 NyI9C6tjNoMTXwI7 X-Google-Smtp-Source: ADKCNb6ZkZiSJ+7PASDagTY2Ni96r1u+GhSclyjvlyYisTkq/phQf9AUom9DZIsJ+Lw6woTuYhpy7A== X-Received: by 10.28.195.132 with SMTP id t126mr301380wmf.0.1504716959066; Wed, 06 Sep 2017 09:55:59 -0700 (PDT) Received: from localhost ([217.140.96.141]) by smtp.gmail.com with ESMTPSA id o82sm1690377wmg.36.2017.09.06.09.55.58 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 06 Sep 2017 09:55:58 -0700 (PDT) From: Richard Sandiford To: Michael Collison Mail-Followup-To: Michael Collison ,Richard Biener , Richard Kenner , GCC Patches , nd , Andrew Pinski , richard.sandiford@linaro.org Cc: Richard Biener , Richard Kenner , GCC Patches , nd , Andrew Pinski Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts References: <20170807210159.86DD633CA8@vlsi1.gnat.com> <20170808015616.1386833CA8@vlsi1.gnat.com> <20170808121311.5B7D033CAD@vlsi1.gnat.com> <20170808195158.352B333ED7@vlsi1.gnat.com> <20170808200402.3B4BE33EE9@vlsi1.gnat.com> <20170808202024.8AC9533F0E@vlsi1.gnat.com> <87valgddb2.fsf@linaro.org> <87r2w4cb6w.fsf@linaro.org> <87mv6sc98y.fsf@linaro.org> <87bmmot8wu.fsf@linaro.org> <8760cvu5to.fsf@linaro.org> Date: Wed, 06 Sep 2017 16:56:00 -0000 In-Reply-To: <8760cvu5to.fsf@linaro.org> (Richard Sandiford's message of "Wed, 06 Sep 2017 17:53:23 +0100") Message-ID: <87tw0fsr4y.fsf@linaro.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain X-SW-Source: 2017-09/txt/msg00397.txt.bz2 Richard Sandiford writes: > Michael Collison writes: >> Richard, >> >> The problem with this approach for Aarch64 is that >> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is >> normally 0 as it based on the TARGET_SIMD flag. > > Maybe I'm wrong, but that seems like a missed optimisation in itself. > Like you say, the definition is: > > static unsigned HOST_WIDE_INT > aarch64_shift_truncation_mask (machine_mode mode) > { > return > (!SHIFT_COUNT_TRUNCATED > || aarch64_vector_mode_supported_p (mode) > || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE > || (mode) - 1); > } er, aarch64_shift_truncation_mask (machine_mode mode) { return (!SHIFT_COUNT_TRUNCATED || aarch64_vector_mode_supported_p (mode) || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1); } > SHIFT_COUNT_TRUNCATED is: > > #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD) > > and aarch64_vector_mode_supported_p always returns false for > !TARGET_SIMD: > > static bool > aarch64_vector_mode_supported_p (machine_mode mode) > { > if (TARGET_SIMD > && (mode == V4SImode || mode == V8HImode > || mode == V16QImode || mode == V2DImode > || mode == V2SImode || mode == V4HImode > || mode == V8QImode || mode == V2SFmode > || mode == V4SFmode || mode == V2DFmode > || mode == V4HFmode || mode == V8HFmode > || mode == V1DFmode)) > return true; > > return false; > } > > So when does the second || condition fire? > > I'm surprised the aarch64_vect_struct_mode_p part is needed, since > this hook describes the shift optabs, and AArch64 don't provide any > shift optabs for OI, CI or XI. > > Thanks, > Richard > >> -----Original Message----- >> From: Richard Sandiford [mailto:richard.sandiford@linaro.org] >> Sent: Wednesday, September 6, 2017 11:32 AM >> To: Michael Collison >> Cc: Richard Biener ; Richard Kenner >> ; GCC Patches ; nd >> ; Andrew Pinski >> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts >> >> Michael Collison writes: >>> Richard Sandiford do you have any objections to the patch as it stands? >>> It doesn't appear as if anything is going to change in the mid-end >>> anytime soon. >> >> I think one of the suggestions was to do it in expand, taking >> advantage of range info and TARGET_SHIFT_TRUNCATION_MASK. This would >> be like the current FMA_EXPR handling in expand_expr_real_2. >> >> I know there was talk about cleaner approaches, but at least doing the >> above seems cleaner than doing in the backend. It should also be a >> nicely-contained piece of work. >> >> Thanks, >> Richard >> >>> -----Original Message----- >>> From: Richard Sandiford [mailto:richard.sandiford@linaro.org] >>> Sent: Tuesday, August 22, 2017 9:11 AM >>> To: Richard Biener >>> Cc: Richard Kenner ; Michael Collison >>> ; GCC Patches ; nd >>> ; Andrew Pinski >>> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts >>> >>> Richard Biener writes: >>>> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford >>>> wrote: >>>>> Richard Biener writes: >>>>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford >>>>>> wrote: >>>>>>>Richard Biener writes: >>>>>>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner >>>>>>>> wrote: >>>>>>>>>> Correct. It is truncated for integer shift, but not simd shift >>>>>>>>>> instructions. We generate a pattern in the split that only >>>>>>>generates >>>>>>>>>> the integer shift instructions. >>>>>>>>> >>>>>>>>> That's unfortunate, because it would be nice to do this in >>>>>>>simplify_rtx, >>>>>>>>> since it's machine-independent, but that has to be conditioned >>>>>>>>> on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it. >>>>>>>> >>>>>>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in >>>>>>>> the patterns, like for example with >>>>>>>> >>>>>>>> (define_insn ashlSI3 >>>>>>>> [(set (match_operand 0 "") >>>>>>>> (ashl:SI (match_operand ... ) >>>>>>>> (subreg:QI (match_operand:SI ...)))] >>>>>>>> >>>>>>>> or an explicit and:SI and combine / simplify_rtx should apply the >>>>>>>magic >>>>>>>> optimization we expect. >>>>>>> >>>>>>>The problem with the explicit AND is that you'd end up with either >>>>>>>an AND of two constants for constant shifts, or with two separate >>>>>>>patterns, one for constant shifts and one for variable shifts. >>>>>>>(And the problem in theory with two patterns is that it reduces the >>>>>>>RA's freedom, although in practice I guess we'd always want a >>>>>>>constant shift where possible for cost reasons, and so the RA would >>>>>>>never need to replace pseudos with constants itself.) >>>>>>> >>>>>>>I think all useful instances of this optimisation will be exposed >>>>>>>by the gimple optimisers, so maybe expand could to do it based on >>>>>>>TARGET_SHIFT_TRUNCATION_MASK? That describes the optab rather than >>>>>>>the rtx code and it does take the mode into account. >>>>>> >>>>>> Sure, that could work as well and also take into account range info. >>>>>> But we'd then need named expanders and the result would still have >>>>>> the explicit and or need to be an unspec or a different RTL operation. >>>>> >>>>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have >>>>> target-dependent rather than undefined behaviour, so it's OK for a >>>>> target to use shift codes with out-of-range values. >>>> >>>> Hmm, but that means simplify-rtx can't do anything with them because >>>> we need to preserve target dependent behavior. >>> >>> Yeah, it needs to punt. In practice that shouldn't matter much. >>> >>>> I think the RTL IL should be always well-defined and its semantics >>>> shouldn't have any target dependences (ideally, and if, then they >>>> should be well specified via extra target hooks/macros). >>> >>> That would be nice :-) I think the problem has traditionally been that >>>> shifts can be used in quite a few define_insn patterns besides those >>>> for shift instructions. So if your target defines shifts to have >>>> 256-bit precision (say) then you need to make sure that every >>>> define_insn with a shift rtx will honour that. >>> >>> It's more natural for target guarantees to apply to instructions than >>> to >>>> rtx codes. >>> >>>>> And >>>>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about >>>>> how the normal shift optabs behave, so I don't think we'd need new >>>>> optabs or new unspecs. >>>>> >>>>> E.g. it already works this way when expanding double-word shifts, >>>>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added. There >>>>> it's possible to use a shorter sequence if you know that the shift >>>>> optab truncates the count, so we can do that even if >>>>> SHIFT_COUNT_TRUNCATED isn't defined. >>>> >>>> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK >>>> applies to the instructions generated by the named shift patterns but >>>> _not_ general shift RTXen. But the generated pattern contains shift >>>> RTXen and how can we figure whether they were generated by the named >>>> expanders or by other means? Don't define_expand also serve as >>>> define_insn for things like combine? >>> >>> Yeah, you can't (and aren't supposed to try to) reverse-engineer the >>>> expander from the generated instructions. >>>> TARGET_SHIFT_TRUNCATION_MASK should only be used if you're expanding >>>> a shift optab. >>> >>>> That said, from looking at the docs and your description above it >>>> seems that semantics are not fully reflected in the RTL instruction stream? >>> >>> Yeah, the semantics go from being well-defined in the optab interface >>> to being target-dependent in the rtl stream. >>> >>> Thanks, >>> Richard >>> >>>> >>>> Richard. >>>> >>>>> Thanks, >>>>> Richard >>>>> >>>>>> >>>>>> Richard. >>>>>> >>>>>>>Thanks, >>>>>>>Richard