From: Kyrill Tkachov
To: Christophe Lyon
CC: Ramana Radhakrishnan, Jim Wilson, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE
Date: Wed, 17 Feb 2016 10:05:00 -0000
Message-ID: <56C445F6.6040004@foss.arm.com>

On 17/02/16 10:03, Christophe Lyon wrote:
> On 15 February 2016 at 12:32, Kyrill Tkachov wrote:
>> On 04/02/16 08:58, Ramana Radhakrishnan wrote:
>>> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson wrote:
>>>> This is my suggested fix for PR 65932, which is a linux kernel
>>>> miscompile with gcc-5.1.
>>>>
>>>> The problem here is caused by a chain of events. The first is that
>>>> the relatively new eipa_sra pass creates fake parameters that behave
>>>> slightly differently than normal parameters. The second is that the
>>>> optimizer creates phi nodes that copy local variables to fake
>>>> parameters and/or vice versa. The third is that the out-of-ssa pass
>>>> assumes that it can emit simple move instructions for these phi nodes.
>>>> And the fourth is that the ARM port has a PROMOTE_MODE macro that
>>>> forces QImode and HImode to unsigned, but a
>>>> TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and
>>>> short parameters have different in-register representations than local
>>>> variables, and require a conversion when copying between them, a
>>>> conversion that the out-of-ssa pass can't easily emit.
>>>>
>>>> Ultimately, I think this is a problem in the arm backend. It should
>>>> not have a PROMOTE_MODE macro that is changing the sign of char and
>>>> short local variables. I also think that we should merge the
>>>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to
>>>> prevent this from happening again.
>>>>
>>>> I see four general problems with the current ARM PROMOTE_MODE definition.
>>>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb
>>>> instruction was added. It is a loss for armv6 and later.
>>>> 2) Unsigned short was only faster for targets that don't support
>>>> unaligned accesses. Support for these targets was removed a while
>>>> ago, and this PROMOTE_MODE hunk should have been removed at the same
>>>> time. It was accidentally left behind.
>>>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook; when it was
>>>> converted to a function, the PROMOTE_MODE code was copied without the
>>>> UNSIGNEDP changes. Thus it is only an accident that
>>>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree.
>>>> Changing TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only
>>>> PROMOTE_MODE changes to resolve the difference are safe.
>>>> 4) There is a general principle that you should only change signedness
>>>> in PROMOTE_MODE if the hardware forces it, as otherwise this results
>>>> in extra conversion instructions that make code slower. The mips64
>>>> hardware for instance requires that 32-bit values be sign-extended
>>>> regardless of type, and instructions may trap if this is not true.
>>>> However, it has a set of 32-bit instructions that operate on these
>>>> values, and hence no conversions are required. There is no similar
>>>> case on ARM. Thus the conversions are unnecessary and unwise. This
>>>> can be seen in the testcases where gcc emits both a zero-extend and a
>>>> sign-extend inside a loop, as the sign-extend is required for a
>>>> compare, and the zero-extend is required by PROMOTE_MODE.
>>> Given Kyrill's testing with the patch and the reasonably detailed
>>> check of the effects of code generation changes, the arm.h hunk is ok.
>>> I do think we should make it explicit in the documentation that
>>> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree, and
>>> better still maybe put in a checking assert for the same in the
>>> mid-end, but that could be the subject of a follow-up patch.
>>>
>>> Ok to apply just the arm.h hunk as I think Kyrill has taken care of
>>> the testsuite fallout separately.
>> Hi all,
>>
>> I'd like to backport the arm.h hunk from this (r233130) to the GCC 5
>> branch. As the CSE patch from my series had some fallout on x86_64
>> due to a deficiency in the AVX patterns that is too invasive to fix
>> at this stage (and presumably backport), I'd like to just backport
>> this arm.h fix and adjust the tests to XFAIL the fallout that comes
>> with not applying the CSE patch. The attached patch does that.
>>
>> The code quality fallout on code outside the testsuite is not
>> that great.
>> The SPEC benchmarks are not affected by not applying
>> the CSE change, and only a single sequence in a popular embedded benchmark
>> shows some degradation for -mtune=cortex-a9 in the same way as the
>> wmul-1.c and wmul-2.c tests.
>>
>> I think that's a fair tradeoff for fixing the wrong-code bug on that branch.
>>
>> Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch?
>>
>> Thanks,
>> Kyrill
>>
>> 2016-02-15  Kyrylo Tkachov
>>
>>     PR target/65932
>>     * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options.
>>     XFAIL the scan-assembler test.
>>     * gcc.target/arm/wmul-2.c: Likewise.
>>     * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb.
>>
>>
> Hi Kyrill,
>
> I've noticed that wmul-3 still fails on the gcc-5 branch when forcing GCC
> configuration to:
> --with-cpu cortex-a5 --with-fpu vfpv3-d16-fp16
> (target arm-none-linux-gnueabihf)
>
> The generated code is:
>         sxth    r0, r0
>         sxth    r1, r1
>         mul     r0, r1, r0
> instead of
>         smulbb  r0, r1, r0
> on trunk.
>
> I guess we don't worry?

Hi Christophe,

Hmmm, I suspect we might want to backport
https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01714.html
to fix the backend costing logic of smulbb.
Could you please try that patch to see if it helps?

Thanks,
Kyrill

>>
>>> regards
>>> Ramana
>>>
>>>
>>>
>>>
>>>> My change was tested with an arm bootstrap, make check, and SPEC
>>>> CPU2000 run. The original poster verified that this gives a linux
>>>> kernel that boots correctly.
>>>>
>>>> The PROMOTE_MODE change causes 3 testsuite testcases to fail. These
>>>> are tests to verify that smulbb and/or smlabb are generated.
>>>> Eliminating the unnecessary sign conversions causes us to get better
>>>> code that doesn't include the smulbb and smlabb instructions. I had
>>>> to modify the testcases to get them to emit the desired instructions.
>>>> With the testcase changes there are no additional testsuite failures,
>>>> though I'm concerned that these testcases with the changes may be
>>>> fragile, and future changes may break them again.
>>>
>>>
>>>> If there are ARM parts where smulbb/smlabb are faster than mul/mla,
>>>> then maybe we should try to add new patterns to get the instructions
>>>> emitted again for the unmodified testcases.
>>>>
>>>> Jim
>>