From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-421596-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 84548 invoked by alias); 17 Feb 2016 10:05:54 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 84525 invoked by uid 89); 17 Feb 2016 10:05:53 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.9 required=5.0 tests=BAYES_00,KAM_LAZY_DOMAIN_SECURITY,RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=costing, H*u:31.2.0, H*UA:31.2.0, fourth
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 17 Feb 2016 10:05:47 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])	by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 4EEB83A1;	Wed, 17 Feb 2016 02:04:55 -0800 (PST)
Received: from [10.2.206.200] (e100706-lin.cambridge.arm.com [10.2.206.200])	by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5BAD93F21A;	Wed, 17 Feb 2016 02:05:44 -0800 (PST)
Message-ID: <56C445F6.6040004@foss.arm.com>
Date: Wed, 17 Feb 2016 10:05:00 -0000
From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: Christophe Lyon <christophe.lyon@linaro.org>
CC: Ramana Radhakrishnan <ramana.gcc@googlemail.com>,  Jim Wilson <jim.wilson@linaro.org>, "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
References: <CABXYE2VXjy7=5Y=c1TCxLE8KuwLtwBYBhTB24xrWDvWAeiBwbQ@mail.gmail.com>	<CAJA7tRbQ751-o+T+sx9Z0Uaq1YoEocR507y4KtYQmYurTht8AA@mail.gmail.com>	<56C1B74D.4070009@foss.arm.com> <CAKdteOaepm0zCPvPsT1E9TtjFxopmDyKHmWKwZcdayoY6qQ6KA@mail.gmail.com>
In-Reply-To: <CAKdteOaepm0zCPvPsT1E9TtjFxopmDyKHmWKwZcdayoY6qQ6KA@mail.gmail.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
X-SW-Source: 2016-02/txt/msg01147.txt.bz2


On 17/02/16 10:03, Christophe Lyon wrote:
> On 15 February 2016 at 12:32, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson <jim.wilson@linaro.org> wrote:
>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>> miscompile with gcc-5.1.
>>>>
>>>> The problem here is caused by a chain of events.  The first is that
>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>> slightly differently than normal parameters.  The second is that the
>>>> optimizer creates phi nodes that copy local variables to fake
>>>> parameters and/or vice versa.  The third is that the out-of-ssa pass
>>>> assumes that it can emit simple move instructions for these phi nodes.
>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>> forces QImode and HImode to unsigned, but a
>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not.  So signed char and
>>>> short parameters have different in-register representations than local
>>>> variables, and require a conversion when copying between them, a
>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>
>>>> Ultimately, I think this is a problem in the arm backend.  It should
>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>> short local variables.  I also think that we should merge the
>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>> prevent this from happening again.
>>>>
>>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>>> instruction was added.  It is a lose for armv6 and later.
>>>> 2) Unsigned short was only faster for targets that don't support
>>>> unaligned accesses.  Support for these targets was removed a while
>>>> ago, and this PROMOTE_MODE hunk should have been removed at the same
>>>> time.  It was accidentally left behind.
>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook; when it was
>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>> UNSIGNEDP changes.  Thus it is only an accident that
>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.  Changing
>>>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE
>>>> changes to resolve the difference are safe.
>>>> 4) There is a general principle that you should only change signedness
>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>> in extra conversion instructions that make code slower.  The mips64
>>>> hardware for instance requires that 32-bit values be sign-extended
>>>> regardless of type, and instructions may trap if this is not true.
>>>> However, it has a set of 32-bit instructions that operate on these
>>>> values, and hence no conversions are required.  There is no similar
>>>> case on ARM. Thus the conversions are unnecessary and unwise.  This
>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>> compare, and the zero-extend is required by PROMOTE_MODE.
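To make the representation mismatch concrete, here is a minimal C sketch
(not GCC code; all helper names are made up for illustration) that models
the 32-bit register image of a signed char under the two promotion rules
described above. It assumes nothing beyond standard C integer conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Register image when the promotion keeps the type's signedness, as the
   quoted text says TARGET_PROMOTE_FUNCTION_MODE does for parameters.
   This is a sign-extension, like sxtb. */
static inline uint32_t promote_keep_sign(signed char c)
{
    return (uint32_t)(int32_t)c;
}

/* Register image when the promotion forces unsigned, as the old
   PROMOTE_MODE is described as doing for locals.
   This is a zero-extension, like uxtb. */
static inline uint32_t promote_force_unsigned(signed char c)
{
    return (uint32_t)(uint8_t)c;
}

/* The conversion needed when copying a zero-extended local into a
   sign-extended parameter: re-extend from bit 7. */
static inline uint32_t zero_to_sign_extended(uint32_t reg)
{
    return ((reg & 0xFFu) ^ 0x80u) - 0x80u;
}
```

For non-negative values the two images agree, so a plain register move
happens to work; for negative values they differ (0xFFFFFFFF versus
0x000000FF for -1), which is why a simple move for the phi node is not
enough.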
>>> Given Kyrill's testing with the patch and the reasonably detailed
>>> check of the effects of code generation changes - The arm.h hunk is ok
>>> - I do think we should make this explicit in the documentation that
>>> PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and
>>> better still maybe put in a checking assert for the same in the
>>> mid-end but that could be the subject of a follow-up patch.
>>>
>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>> the testsuite fallout separately.
>> Hi all,
>>
>> I'd like to backport the arm.h hunk from this (r233130) to the GCC 5
>> branch. As the CSE patch from my series had some fallout on x86_64
>> due to a deficiency in the AVX patterns that is too invasive to fix
>> at this stage (and presumably backport), I'd like to just backport
>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>> with not applying the CSE patch. The attached patch does that.
>>
>> The code quality fallout on code outside the testsuite is not
>> that great. The SPEC benchmarks are not affected by not applying
>> the CSE change, and only a single sequence in a popular embedded benchmark
>> shows some degradation for -mtune=cortex-a9 in the same way as the
>> wmul-1.c and wmul-2.c tests.
>>
>> I think that's a fair tradeoff for fixing the wrong code bug on that branch.
>>
>> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>>
>> Thanks,
>> Kyrill
>>
>> 2016-02-15  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>
>>
>>      PR target/65932
>>      * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>      xfail the scan-assembler test.
>>      * gcc.target/arm/wmul-2.c: Likewise.
>>      * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>>
>>
> Hi Kyrill,
>
> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
> configuration to:
> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
> (target arm-none-linux-gnueabihf)
>
> The generated code is:
>          sxth    r0, r0
>          sxth    r1, r1
>          mul     r0, r1, r0
> instead of
>          smulbb  r0, r1, r0
> on trunk.
>
> I guess we don't worry?

Hi Christophe,
Hmmm, I suspect we might want to backport https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
to fix the backend costing logic of smulbb.
Could you please try that patch to see if it helps?
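
For reference, the two sequences quoted above compute the same result, so
this is a code quality difference rather than a correctness one. A rough C
model follows; it is only a sketch, the function names are hypothetical,
and widening_mul is my guess at the general shape of such a test, not the
actual contents of wmul-3.c.

```c
#include <stdint.h>

/* Source-level shape: a signed 16x16->32 widening multiply. */
static inline int32_t widening_mul(int16_t a, int16_t b)
{
    return (int32_t)a * (int32_t)b;
}

/* Models sxth: sign-extend the low 16 bits of a register value. */
static inline int32_t sxth_model(uint32_t reg)
{
    return (int32_t)((reg & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Models "sxth r0; sxth r1; mul": two explicit extensions followed by a
   full 32-bit multiply.  smulbb reads only the bottom halfwords of its
   operands, so it yields this value in a single instruction. */
static inline int32_t mul_after_sxth(uint32_t r0, uint32_t r1)
{
    return sxth_model(r0) * sxth_model(r1);
}
```

Whatever sits in the upper halves of the registers does not affect the
result, which is what lets smulbb skip the two extensions.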

Thanks,
Kyrill

>>
>>> regards
>>> Ramana
>>>
>>>
>>>
>>>
>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>> CPU2000 run.  The original poster verified that this gives a linux
>>>> kernel that boots correctly.
>>>>
>>>> The PROMOTE_MODE change causes 3 testsuite testcases to fail.  These
>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>> code that doesn't include the smulbb and smlabb instructions.  I had
>>>> to modify the testcases to get them to emit the desired instructions.
>>>> With the testcase changes there are no additional testsuite failures,
>>>> though I'm concerned that these testcases with the changes may be
>>>> fragile, and future changes may break them again.
>>>
>>>
>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>> then maybe we should try to add new patterns to get the instructions
>>>> emitted again for the unmodified testcases.
>>>>
>>>> Jim
>>