From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 61211 invoked by alias); 14 Dec 2015 12:25:21 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 61194 invoked by uid 89); 14 Dec 2015 12:25:20 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=0.8 required=5.0 tests=AWL,BAYES_50,KAM_LAZY_DOMAIN_SECURITY,T_RP_MATCHES_RCVD autolearn=no version=3.3.2 X-HELO: foss.arm.com Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 14 Dec 2015 12:25:19 +0000 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 93BCD49; Mon, 14 Dec 2015 04:24:49 -0800 (PST) Received: from [10.2.206.200] (e100706-lin.cambridge.arm.com [10.2.206.200]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B54A13F21A; Mon, 14 Dec 2015 04:25:12 -0800 (PST) Message-ID: <566EB527.6070305@foss.arm.com> Date: Mon, 14 Dec 2015 12:25:00 -0000 From: Kyrill Tkachov User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0 MIME-Version: 1.0 To: GCC Patches CC: Segher Boessenkool Subject: [PATCH][combine] PR rtl-optimization/68651 Try changing rtx from (r + r) to (r << 1) to aid recognition Content-Type: multipart/mixed; boundary="------------080305030701010002080706" X-SW-Source: 2015-12/txt/msg01386.txt.bz2 This is a multi-part message in MIME format. --------------080305030701010002080706 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-length: 2999 Hi all, For this PR I want to teach combine to deal with unrecognisable patterns that contain a sub-expression like (x + x) by transforming it into (x << 1) and trying to match the result. This is because some instruction sets like arm and aarch64 can combine shifts with other arithmetic operations or have shifts in their RTL representation of more complex operations (like the aarch64 UBFIZ instruction which can be expressed as a zero_extend+ashift pattern). Due to a change in rtx costs for -mcpu=cortex-a53 in GCC 5 we no longer expand an expression like x * 2 as x << 1 but rather as x + x, which hurts combination opportunities dues to this deficiency. This patch addresses the issue in the recog_for_combine function in combine.c in a similar way to the change_zero_ext trick. That is, if it recog_for_combine fails to match a pattern it replaces all instances of x + x in the rtx with x << 1 and tries again. This way I've been able to get combine to more aggressively generate the arithmetic+shift forms of instructions for -mcpu=cortex-a53 on aarch64 as well as instructions like ubfiz and sbfiz that contain shift-by-immediate sub-expressions. This patch shouldn't affect rtxes that already match, so it should have no fallout on other cases. Bootstrapped and tested on arm, aarch64, x86_64. For the testcase: int foo (int x) { return (x * 2) & 65535; } int bar (int x, int y) { return (x * 2) | y; } with -O2 -mcpu=cortex-a53 for aarch64 we now generate: foo: ubfiz w0, w0, 1, 15 ret bar: orr w0, w1, w0, lsl 1 ret whereas without this patch we generate: foo: add w0, w0, w0 uxth w0, w0 ret bar: add w0, w0, w0 orr w0, w0, w1 ret PR 68651 is a code quality regression for GCC 5 and GCC 6 that was introduced due to updated rtx costs for -mcpu=cortex-a53 that affected expansion. The costs changes were correct (to the extent that rtx costs have any meaning) and I think this is a deficiency in combine that should be fixed. I wouldn't propose to backport this to GCC 5. P.S. For the ubfiz testcase above to combine successfully it needs an aarch64 rtx costs issue to be resolved that I proposed a fix for in https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00526.html. Otherwise the backend wrongly rejects it on the grounds of wrong costs. Is this ok for trunk once the costs issue at https://gcc.gnu.org/ml/gcc-patches/2015-12/msg00526.html gets resolved? Thanks, Kyrill 2015-12-14 Kyrylo Tkachov PR rtl-optimization/68651 * combine.c (change_shift_by_one): New function. (change_rtx_with_func): Likewise. (recog_for_combine): Use the above to transform reg + reg sub-expressions into reg << 1 within non-recognisable patterns and try to match the result. 2015-12-14 Kyrylo Tkachov PR rtl-optimization/68651 * gcc.target/aarch64/pr68651_1.c: New test. --------------080305030701010002080706 Content-Type: text/x-patch; name="combine-shift.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="combine-shift.patch" Content-length: 4022 diff --git a/gcc/combine.c b/gcc/combine.c index c008d2a786ebeaa7560acbd60c7c2e8cdacdc9aa..9465e5927145e768f1a5fc43ce7c3621033d8aef 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -11063,6 +11063,60 @@ change_zero_ext (rtx *src) return changed; } +/* Replace in SRC all sub-rtxes of the form x + x + with x << 1 to help recognition on targets with combined + shift operations. Return true iff such any such change was made. */ + +static bool +change_shift_by_one (rtx *src) +{ + bool changed = false; + + subrtx_ptr_iterator::array_type array; + FOR_EACH_SUBRTX_PTR (iter, array, src, NONCONST) + { + rtx x = **iter; + machine_mode mode = GET_MODE (x); + + if (GET_CODE (x) == PLUS && GET_MODE_CLASS (mode) == MODE_INT + && !side_effects_p (XEXP (x, 0)) + && rtx_equal_p (XEXP (x, 0), XEXP (x, 1))) + x = simplify_gen_binary (ASHIFT, mode, XEXP (x, 0), const1_rtx); + else + continue; + + SUBST (**iter, x); + changed = true; + } + + return changed; +} + +/* Modify the sources of all sets in PAT using the function FUNC that takes + a pointer to the rtx to modify and returns true iff it made any + modifications. Used by recog_for_combine to twiddle non-matching patterns + into something recognisable. */ + +static bool +change_rtx_with_func (rtx pat, bool (*func) (rtx *)) +{ + bool changed = false; + + if (GET_CODE (pat) == SET) + changed = func (&SET_SRC (pat)); + else if (GET_CODE (pat) == PARALLEL) + { + int i; + for (i = 0; i < XVECLEN (pat, 0); i++) + { + rtx set = XVECEXP (pat, 0, i); + if (GET_CODE (set) == SET) + changed |= func (&SET_SRC (set)); + } + } + return changed; +} + /* Like recog, but we receive the address of a pointer to a new pattern. We try to match the rtx that the pointer points to. If that fails, we may try to modify or replace the pattern, @@ -11073,6 +11127,9 @@ change_zero_ext (rtx *src) to the equivalent AND and perhaps LSHIFTRT patterns, and try with that (and undo if that fails). + If that still doesn't match we change expressions of the form + (PLUS reg1 reg1) into (ASHIFT reg1 (const_int 1)). Undo if that fails. + PNOTES is a pointer to a location where any REG_UNUSED notes added for the CLOBBERs are placed. @@ -11090,19 +11147,7 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes) void *marker = get_undo_marker (); bool changed = false; - if (GET_CODE (pat) == SET) - changed = change_zero_ext (&SET_SRC (pat)); - else if (GET_CODE (pat) == PARALLEL) - { - int i; - for (i = 0; i < XVECLEN (pat, 0); i++) - { - rtx set = XVECEXP (pat, 0, i); - if (GET_CODE (set) == SET) - changed |= change_zero_ext (&SET_SRC (set)); - } - } - + changed = change_rtx_with_func (pat, change_zero_ext); if (changed) { insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes); @@ -11111,6 +11156,20 @@ recog_for_combine (rtx *pnewpat, rtx_insn *insn, rtx *pnotes) undo_to_marker (marker); } + if (insn_code_number < 0) + { + marker = get_undo_marker (); + changed = change_rtx_with_func (pat, change_shift_by_one); + + if (changed) + { + insn_code_number = recog_for_combine_1 (pnewpat, insn, pnotes); + + if (insn_code_number < 0) + undo_to_marker (marker); + } + + } return insn_code_number; } diff --git a/gcc/testsuite/gcc.target/aarch64/pr68651_1.c b/gcc/testsuite/gcc.target/aarch64/pr68651_1.c new file mode 100644 index 0000000000000000000000000000000000000000..ef9456f538776e7db01ecf5473425aed9efd9de2 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/pr68651_1.c @@ -0,0 +1,16 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -mcpu=cortex-a53" } */ + +int +foo (int x) +{ + return (x * 2) & 65535; +} +/* { dg-final { scan-assembler "ubfiz\tw\[0-9\]*, w\[0-9\]*.*\n" } } */ + +int +bar (int x, int y) +{ + return (x * 2) | y; +} +/* { dg-final { scan-assembler "orr\tw\[0-9\]*, w\[0-9\]*, w\[0-9\]*, lsl 1.*\n" } } */ --------------080305030701010002080706--