From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-461635-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 98369 invoked by alias); 6 Sep 2017 16:56:04 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 98348 invoked by uid 89); 6 Sep 2017 16:56:04 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-6.9 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_2,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=
X-HELO: mail-wm0-f42.google.com
Received: from mail-wm0-f42.google.com (HELO mail-wm0-f42.google.com) (74.125.82.42) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 06 Sep 2017 16:56:01 +0000
Received: by mail-wm0-f42.google.com with SMTP id f145so17728017wme.0        for <gcc-patches@gcc.gnu.org>; Wed, 06 Sep 2017 09:56:00 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;        d=1e100.net; s=20161025;        h=x-gm-message-state:from:to:mail-followup-to:cc:subject:references         :date:in-reply-to:message-id:user-agent:mime-version;        bh=qM5YqvghoSTDP7Fr1ZAgJYQk1kSpFP0jdTDcV7xIIyk=;        b=XnOzVsrtxedKvCv0nXeteGZulu6R/iyD+Cu26VV8mBqV9WPxNIf4VlyvbECh4q4luC         iCahKCHbtyZ4zzvlJ8wQIyg4cinxA3bj3yyihgXMacWx7xSzb9Q9VeJFK1LWZDYP/cI6         VG4yzGqd0PfSTed0KnvD+4VzEZcJUqvLWN581sei26fxwgMxckL7o3sUye7hB1EW8KfD         ugJ8xO7SifEqoZTEECgdItHRW8K/woXv+UyZnOV7SRTVBaUyV/a+Nnx+UMmEsa3zcuE9         L9m9SfzVTxmPlcY6iuSSlSRt+9POoW5s6sHv+4MU4BkdpIbp2S7E/heBmNRE2gKUMTCD         999Q==
X-Gm-Message-State: AHPjjUij5khjDHz4C/emr+nq/Zc1amEY3XI5jCU7kdkNzD4PkCQsDP44	NyI9C6tjNoMTXwI7
X-Google-Smtp-Source: ADKCNb6ZkZiSJ+7PASDagTY2Ni96r1u+GhSclyjvlyYisTkq/phQf9AUom9DZIsJ+Lw6woTuYhpy7A==
X-Received: by 10.28.195.132 with SMTP id t126mr301380wmf.0.1504716959066;        Wed, 06 Sep 2017 09:55:59 -0700 (PDT)
Received: from localhost ([217.140.96.141])        by smtp.gmail.com with ESMTPSA id o82sm1690377wmg.36.2017.09.06.09.55.58        (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256);        Wed, 06 Sep 2017 09:55:58 -0700 (PDT)
From: Richard Sandiford <richard.sandiford@linaro.org>
To: Michael Collison <Michael.Collison@arm.com>
Mail-Followup-To: Michael Collison <Michael.Collison@arm.com>,Richard Biener <richard.guenther@gmail.com>,  Richard Kenner <kenner@vlsi1.ultra.nyu.edu>,  GCC Patches <gcc-patches@gcc.gnu.org>,  nd <nd@arm.com>,  Andrew Pinski <pinskia@gmail.com>, richard.sandiford@linaro.org
Cc: Richard Biener <richard.guenther@gmail.com>,  Richard Kenner <kenner@vlsi1.ultra.nyu.edu>,  GCC Patches <gcc-patches@gcc.gnu.org>,  nd <nd@arm.com>,  Andrew Pinski <pinskia@gmail.com>
Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
References: <HE1PR0802MB2377266815DA3FBD9F6EF21D95B50@HE1PR0802MB2377.eurprd08.prod.outlook.com>	<CA+=Sn1mV0mYmmY63OhvGv3LaP64oUt_vM1wFrTQ957JceG+jTA@mail.gmail.com>	<20170807210159.86DD633CA8@vlsi1.gnat.com>	<HE1PR0802MB2377C331B4FC7BDA5C2C429795B50@HE1PR0802MB2377.eurprd08.prod.outlook.com>	<20170808015616.1386833CA8@vlsi1.gnat.com>	<HE1PR0802MB237709FE48AFAC7C1DA43361958A0@HE1PR0802MB2377.eurprd08.prod.outlook.com>	<20170808121311.5B7D033CAD@vlsi1.gnat.com>	<HE1PR0802MB2377EFC4CDECEAA44AF20F26958A0@HE1PR0802MB2377.eurprd08.prod.outlook.com>	<20170808195158.352B333ED7@vlsi1.gnat.com>	<HE1PR0802MB2377DA25B2D643A902227E32958A0@HE1PR0802MB2377.eurprd08.prod.outlook.com>	<20170808200402.3B4BE33EE9@vlsi1.gnat.com>	<20170808202024.8AC9533F0E@vlsi1.gnat.com> <87valgddb2.fsf@linaro.org>	<E171E75C-6C49-46BF-8B91-626FEAA0E7EF@gmail.com>	<87r2w4cb6w.fsf@linaro.org>	<CAFiYyc03jHmdOKEOq-5W-OShLwTW9pFsj6nA7rLB2Tb5wDau3w@mail.gmail.com>	<87mv6sc98y.fsf@linaro.org>	<HE1PR0802MB2377FA167C45D2F5A3EC18DC95970@HE1PR0802MB2377.eurprd08.prod.outlook.com>	<87bmmot8wu.fsf@linaro.org>	<HE1PR0802MB237763E2263323F17908C03295970@HE1PR0802MB2377.eurprd08.prod.outlook.com>	<8760cvu5to.fsf@linaro.org>
Date: Wed, 06 Sep 2017 16:56:00 -0000
In-Reply-To: <8760cvu5to.fsf@linaro.org> (Richard Sandiford's message of "Wed,	06 Sep 2017 17:53:23 +0100")
Message-ID: <87tw0fsr4y.fsf@linaro.org>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/25.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-SW-Source: 2017-09/txt/msg00397.txt.bz2

Richard Sandiford <richard.sandiford@linaro.org> writes:
> Michael Collison <Michael.Collison@arm.com> writes:
>> Richard,
>>
>> The problem with this approach for Aarch64 is that
>> TARGET_SHIFT_TRUNCATION_MASK is based on SHIFT_COUNT_TRUNCATED which is
>> normally 0 as it based on the TARGET_SIMD flag.
>
> Maybe I'm wrong, but that seems like a missed optimisation in itself.
> Like you say, the definition is:
>
>   static unsigned HOST_WIDE_INT
>   aarch64_shift_truncation_mask (machine_mode mode)
>   {
>     return
>       (!SHIFT_COUNT_TRUNCATED
>        || aarch64_vector_mode_supported_p (mode)
>        || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE
>        || (mode) - 1);
>   }

er,

aarch64_shift_truncation_mask (machine_mode mode)
{
  return
    (!SHIFT_COUNT_TRUNCATED
     || aarch64_vector_mode_supported_p (mode)
     || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
}

> SHIFT_COUNT_TRUNCATED is:
>
>   #define SHIFT_COUNT_TRUNCATED (!TARGET_SIMD)
>
> and aarch64_vector_mode_supported_p always returns false for
> !TARGET_SIMD:
>
>   static bool
>   aarch64_vector_mode_supported_p (machine_mode mode)
>   {
>     if (TARGET_SIMD
>         && (mode == V4SImode  || mode == V8HImode
>             || mode == V16QImode || mode == V2DImode
>             || mode == V2SImode  || mode == V4HImode
>             || mode == V8QImode || mode == V2SFmode
>             || mode == V4SFmode || mode == V2DFmode
>             || mode == V4HFmode || mode == V8HFmode
>             || mode == V1DFmode))
>       return true;
>
>     return false;
>   }
>
> So when does the second || condition fire?
>
> I'm surprised the aarch64_vect_struct_mode_p part is needed, since
> this hook describes the shift optabs, and AArch64 don't provide any
> shift optabs for OI, CI or XI.
>
> Thanks,
> Richard
>
>> -----Original Message-----
>> From: Richard Sandiford [mailto:richard.sandiford@linaro.org] 
>> Sent: Wednesday, September 6, 2017 11:32 AM
>> To: Michael Collison <Michael.Collison@arm.com>
>> Cc: Richard Biener <richard.guenther@gmail.com>; Richard Kenner
>> <kenner@vlsi1.ultra.nyu.edu>; GCC Patches <gcc-patches@gcc.gnu.org>; nd
>> <nd@arm.com>; Andrew Pinski <pinskia@gmail.com>
>> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
>>
>> Michael Collison <Michael.Collison@arm.com> writes:
>>> Richard Sandiford do you have any objections to the patch as it stands?
>>> It doesn't appear as if anything is going to change in the mid-end 
>>> anytime soon.
>>
>> I think one of the suggestions was to do it in expand, taking
>> advantage of range info and TARGET_SHIFT_TRUNCATION_MASK.  This would
>> be like the current FMA_EXPR handling in expand_expr_real_2.
>>
>> I know there was talk about cleaner approaches, but at least doing the
>> above seems cleaner than doing in the backend.  It should also be a
>> nicely-contained piece of work.
>>
>> Thanks,
>> Richard
>>
>>> -----Original Message-----
>>> From: Richard Sandiford [mailto:richard.sandiford@linaro.org]
>>> Sent: Tuesday, August 22, 2017 9:11 AM
>>> To: Richard Biener <richard.guenther@gmail.com>
>>> Cc: Richard Kenner <kenner@vlsi1.ultra.nyu.edu>; Michael Collison 
>>> <Michael.Collison@arm.com>; GCC Patches <gcc-patches@gcc.gnu.org>; nd 
>>> <nd@arm.com>; Andrew Pinski <pinskia@gmail.com>
>>> Subject: Re: [PATCH] [Aarch64] Optimize subtract in shift counts
>>>
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Tue, Aug 22, 2017 at 9:29 AM, Richard Sandiford 
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>> On August 21, 2017 7:46:09 PM GMT+02:00, Richard Sandiford 
>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>>Richard Biener <richard.guenther@gmail.com> writes:
>>>>>>>> On Tue, Aug 8, 2017 at 10:20 PM, Richard Kenner 
>>>>>>>> <kenner@vlsi1.ultra.nyu.edu> wrote:
>>>>>>>>>> Correct. It is truncated for integer shift, but not simd shift 
>>>>>>>>>> instructions. We generate a pattern in the split that only
>>>>>>>generates
>>>>>>>>>> the integer shift instructions.
>>>>>>>>>
>>>>>>>>> That's unfortunate, because it would be nice to do this in
>>>>>>>simplify_rtx,
>>>>>>>>> since it's machine-independent, but that has to be conditioned 
>>>>>>>>> on SHIFT_COUNT_TRUNCATED, so you wouldn't get the benefit of it.
>>>>>>>>
>>>>>>>> SHIFT_COUNT_TRUNCATED should go ... you should express this in 
>>>>>>>> the patterns, like for example with
>>>>>>>>
>>>>>>>> (define_insn ashlSI3
>>>>>>>>   [(set (match_operand 0 "")
>>>>>>>>          (ashl:SI (match_operand ... )
>>>>>>>>                      (subreg:QI (match_operand:SI ...)))]
>>>>>>>>
>>>>>>>> or an explicit and:SI and combine / simplify_rtx should apply the
>>>>>>>magic
>>>>>>>> optimization we expect.
>>>>>>>
>>>>>>>The problem with the explicit AND is that you'd end up with either 
>>>>>>>an AND of two constants for constant shifts, or with two separate 
>>>>>>>patterns, one for constant shifts and one for variable shifts.  
>>>>>>>(And the problem in theory with two patterns is that it reduces the 
>>>>>>>RA's freedom, although in practice I guess we'd always want a 
>>>>>>>constant shift where possible for cost reasons, and so the RA would 
>>>>>>>never need to replace pseudos with constants itself.)
>>>>>>>
>>>>>>>I think all useful instances of this optimisation will be exposed 
>>>>>>>by the gimple optimisers, so maybe expand could to do it based on 
>>>>>>>TARGET_SHIFT_TRUNCATION_MASK?  That describes the optab rather than 
>>>>>>>the rtx code and it does take the mode into account.
>>>>>>
>>>>>> Sure, that could work as well and also take into account range info. 
>>>>>> But we'd then need named expanders and the result would still have 
>>>>>> the explicit and or need to be an unspec or a different RTL operation.
>>>>>
>>>>> Without SHIFT_COUNT_TRUNCATED, out-of-range rtl shifts have 
>>>>> target-dependent rather than undefined behaviour, so it's OK for a 
>>>>> target to use shift codes with out-of-range values.
>>>>
>>>> Hmm, but that means simplify-rtx can't do anything with them because 
>>>> we need to preserve target dependent behavior.
>>>
>>> Yeah, it needs to punt.  In practice that shouldn't matter much.
>>>
>>>> I think the RTL IL should be always well-defined and its semantics 
>>>> shouldn't have any target dependences (ideally, and if, then they 
>>>> should be well specified via extra target hooks/macros).
>>>
>>> That would be nice :-) I think the problem has traditionally been that
>>>> shifts can be used in quite a few define_insn patterns besides those 
>>>> for shift instructions.  So if your target defines shifts to have 
>>>> 256-bit precision (say) then you need to make sure that every 
>>>> define_insn with a shift rtx will honour that.
>>>
>>> It's more natural for target guarantees to apply to instructions than 
>>> to
>>>> rtx codes.
>>>
>>>>>  And
>>>>> TARGET_SHIFT_TRUNCATION_MASK is a guarantee from the target about 
>>>>> how the normal shift optabs behave, so I don't think we'd need new 
>>>>> optabs or new unspecs.
>>>>>
>>>>> E.g. it already works this way when expanding double-word shifts, 
>>>>> which IIRC is why TARGET_SHIFT_TRUNCATION_MASK was added.  There 
>>>>> it's possible to use a shorter sequence if you know that the shift 
>>>>> optab truncates the count, so we can do that even if 
>>>>> SHIFT_COUNT_TRUNCATED isn't defined.
>>>>
>>>> I'm somewhat confused by docs saying TARGET_SHIFT_TRUNCATION_MASK 
>>>> applies to the instructions generated by the named shift patterns but 
>>>> _not_ general shift RTXen.  But the generated pattern contains shift 
>>>> RTXen and how can we figure whether they were generated by the named 
>>>> expanders or by other means?  Don't define_expand also serve as 
>>>> define_insn for things like combine?
>>>
>>> Yeah, you can't (and aren't supposed to try to) reverse-engineer the
>>>> expander from the generated instructions.
>>>> TARGET_SHIFT_TRUNCATION_MASK should only be used if you're expanding 
>>>> a shift optab.
>>>
>>>> That said, from looking at the docs and your description above it 
>>>> seems that semantics are not fully reflected in the RTL instruction stream?
>>>
>>> Yeah, the semantics go from being well-defined in the optab interface
>>> to being target-dependent in the rtl stream.
>>>
>>> Thanks,
>>> Richard
>>>
>>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Richard
>>>>>
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>>Thanks,
>>>>>>>Richard