From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 36438 invoked by alias); 16 Feb 2016 10:44:35 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 36426 invoked by uid 89); 16 Feb 2016 10:44:33 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.1 required=5.0 tests=AWL,BAYES_50,FREEMAIL_FROM,RCVD_IN_DNSWL_LOW,SPF_PASS autolearn=ham version=3.3.2 spammy=fourth, Eliminating, poster, eipa_sra X-HELO: mail-wm0-f50.google.com Received: from mail-wm0-f50.google.com (HELO mail-wm0-f50.google.com) (74.125.82.50) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES128-GCM-SHA256 encrypted) ESMTPS; Tue, 16 Feb 2016 10:44:32 +0000 Received: by mail-wm0-f50.google.com with SMTP id g62so185138598wme.0 for ; Tue, 16 Feb 2016 02:44:31 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc:content-type; bh=Jw1ud0yF6zSzcjDw1T2UsY8JcxeXyJgfYXglV4fhfC8=; b=JclXs4KvN2wBhPuAVInFTiMuOZMD4s2oDqngJS89GwWMOJjXLUa5AH1vwndHracCbK gXXys4I+PD1oKY3pUCGg5LSl2WtvSSW/ZrAMUaJyZgLXgDYZT0vnNWNIS4R8jNwGvAbN WQdV04oQe7fIYDlpDAj6CP5hzBpQ7gienRUqRGo9OxJ7fm8Ng38wL4pYaezUV3zc1i6O M9mY+QIVrhpyUea3E+5WFKqmJSC7bMl6Ydq6pmnXNzxOZPZCOK7XvIszxXL2i8mQlxWM 6Qy5g4aGIqYALJVt8SI3sz06iW21i0MBrQT91X0PPerEf/B58mgs8OelMnET7DeDCjDW wdig== X-Gm-Message-State: AG10YOQODUT7wNmFptuKmp+CuXrFxuS26eGlP7Htr84+51gEv12C7f7iwCrgkhKPRneBCrrupuBlMUhq+zLhyA== MIME-Version: 1.0 X-Received: by 10.28.46.82 with SMTP id u79mr19273487wmu.67.1455619469117; Tue, 16 Feb 2016 02:44:29 -0800 (PST) Received: by 10.27.212.200 with HTTP; Tue, 16 Feb 2016 02:44:29 -0800 (PST) In-Reply-To: <56C1B74D.4070009@foss.arm.com> References: <56C1B74D.4070009@foss.arm.com> Date: Tue, 16 Feb 2016 10:44:00 -0000 Message-ID: Subject: Re: [PATCH, ARM] stop changing signedness in PROMOTE_MODE From: Ramana Radhakrishnan To: Kyrill Tkachov Cc: Jim Wilson , "gcc-patches@gcc.gnu.org" Content-Type: text/plain; charset=UTF-8 X-IsSubscribed: yes X-SW-Source: 2016-02/txt/msg01061.txt.bz2 On Mon, Feb 15, 2016 at 11:32 AM, Kyrill Tkachov wrote: > > On 04/02/16 08:58, Ramana Radhakrishnan wrote: >> >> On Tue, Jun 30, 2015 at 2:15 AM, Jim Wilson wrote: >>> >>> This is my suggested fix for PR 65932, which is a linux kernel >>> miscompile with gcc-5.1. >>> >>> The problem here is caused by a chain of events. The first is that >>> the relatively new eipa_sra pass creates fake parameters that behave >>> slightly differently than normal parameters. The second is that the >>> optimizer creates phi nodes that copy local variables to fake >>> parameters and/or vice versa. The third is that the ouf-of-ssa pass >>> assumes that it can emit simple move instructions for these phi nodes. >>> And the fourth is that the ARM port has a PROMOTE_MODE macro that >>> forces QImode and HImode to unsigned, but a >>> TARGET_PROMOTE_FUNCTION_MODE hook that does not. So signed char and >>> short parameters have different in register representations than local >>> variables, and require a conversion when copying between them, a >>> conversion that the out-of-ssa pass can't easily emit. >>> >>> Ultimately, I think this is a problem in the arm backend. It should >>> not have a PROMOTE_MODE macro that is changing the sign of char and >>> short local variables. I also think that we should merge the >>> PROMOTE_MODE macro with the TARGET_PROMOTE_FUNCTION_MODE hook to >>> prevent this from happening again. >>> >>> I see four general problems with the current ARM PROMOTE_MODE definition. >>> 1) Unsigned char is only faster for armv5 and earlier, before the sxtb >>> instruction was added. It is a lose for armv6 and later. >>> 2) Unsigned short was only faster for targets that don't support >>> unaligned accesses. Support for these targets was removed a while >>> ago, and this PROMODE_MODE hunk should have been removed at the same >>> time. It was accidentally left behind. >>> 3) TARGET_PROMOTE_FUNCTION_MODE used to be a boolean hook, when it was >>> converted to a function, the PROMOTE_MODE code was copied without the >>> UNSIGNEDP changes. Thus it is only an accident that >>> TARGET_PROMOTE_FUNCTION_MODE and PROMOTE_MODE disagree. Changing >>> TARGET_PROMOTE_FUNCTION_MODE is an ABI change, so only PROMOTE_MODE >>> changes to resolve the difference are safe. >>> 4) There is a general principle that you should only change signedness >>> in PROMOTE_MODE if the hardware forces it, as otherwise this results >>> in extra conversion instructions that make code slower. The mips64 >>> hardware for instance requires that 32-bit values be sign-extended >>> regardless of type, and instructions may trap if this is not true. >>> However, it has a set of 32-bit instructions that operate on these >>> values, and hence no conversions are required. There is no similar >>> case on ARM. Thus the conversions are unnecessary and unwise. This >>> can be seen in the testcases where gcc emits both a zero-extend and a >>> sign-extend inside a loop, as the sign-extend is required for a >>> compare, and the zero-extend is required by PROMOTE_MODE. >> >> Given Kyrill's testing with the patch and the reasonably detailed >> check of the effects of code generation changes - The arm.h hunk is ok >> - I do think we should make this explicit in the documentation that >> TARGET_PROMOTE_MODE and TARGET_PROMOTE_FUNCTION_MODE should agree and >> better still maybe put in a checking assert for the same in the >> mid-end but that could be the subject of a follow-up patch. >> >> Ok to apply just the arm.h hunk as I think Kyrill has taken care of >> the testsuite fallout separately. > > Hi all, > > I'd like to backport the arm.h from this ( r233130) to the GCC 5 > branch. As the CSE patch from my series had some fallout on x86_64 > due to a deficiency in the AVX patterns that is too invasive to fix > at this stage (and presumably backport), I'd like to just backport > this arm.h fix and adjust the tests to XFAIL the fallout that comes > with not applying the CSE patch. The attached patch does that. I would hope that someone looks to fix the AVX port for GCC 7 - as the CSE patch is something that is useful on ARM for performance in real terms and could have benefits elsewhere. OK to apply if no regressions - thanks for pushing this through. Thanks, Ramana > > The code quality fallout on code outside the testsuite is not > that gread. The SPEC benchmarks are not affected by not applying > the CSE change, and only a single sequence in a popular embedded benchmark > shows some degradation for -mtune=cortex-a9 in the same way as the > wmul-1.c and wmul-2.c tests. > > I think that's a fair tradeoff for fixing the wrong code bug on that branch. > > Ok to backport r233130 and the attached testsuite patch to the GCC 5 branch? > > Thanks, > Kyrill > > 2016-02-15 Kyrylo Tkachov > > PR target/65932 > * gcc.target/arm/wmul-1.c: Add -mtune=cortex-a9 to dg-options. > xfail the scan-assembler test. > * gcc.target/arm/wmul-2.c: Likewise. > * gcc.target/arm/wmul-3.c: Simplify test to generate a single smulbb. > > > > >> >> regards >> Ramana >> >> >> >> >>> My change was tested with an arm bootstrap, make check, and SPEC >>> CPU2000 run. The original poster verified that this gives a linux >>> kernel that boots correctly. >>> >>> The PRMOTE_MODE change causes 3 testsuite testcases to fail. These >>> are tests to verify that smulbb and/or smlabb are generated. >>> Eliminating the unnecessary sign conversions causes us to get better >>> code that doesn't include the smulbb and smlabb instructions. I had >>> to modify the testcases to get them to emit the desired instructions. >>> With the testcase changes there are no additional testsuite failures, >>> though I'm concerned that these testcases with the changes may be >>> fragile, and future changes may break them again. >> >> >> >>> If there are ARM parts where smulbb/smlabb are faster than mul/mla, >>> then maybe we should try to add new patterns to get the instructions >>> emitted again for the unmodified testcases. >>> >>> Jim > >