From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 117617 invoked by alias); 17 Dec 2015 15:36:46 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 117605 invoked by uid 89); 17 Dec 2015 15:36:45 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-0.3 required=5.0 tests=AWL,BAYES_00,KAM_LAZY_DOMAIN_SECURITY,KAM_LOTSOFHASH,T_RP_MATCHES_RCVD autolearn=no version=3.3.2 spammy=H*u:31.2.0, representing, H*UA:31.2.0, UD:md.texi
X-HELO: foss.arm.com
Received: from foss.arm.com (HELO foss.arm.com) (217.140.101.70)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Thu, 17 Dec 2015 15:36:44 +0000
Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.72.51.249])
 by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id DBDB8483; Thu, 17 Dec 2015 07:36:17 -0800 (PST)
Received: from [10.2.206.200] (e100706-lin.cambridge.arm.com [10.2.206.200])
 by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CBF6B3F459; Thu, 17 Dec 2015 07:36:41 -0800 (PST)
Message-ID: <5672D688.7010403@foss.arm.com>
Date: Thu, 17 Dec 2015 15:36:00 -0000
From: Kyrill Tkachov
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
MIME-Version: 1.0
To: GCC Patches
CC: Marcus Shawcroft, Richard Earnshaw, James Greenhalgh
Subject: [PATCH][AArch64][1/2] PR rtl-optimization/68796 Add compare-of-zero_extract pattern
Content-Type: multipart/mixed; boundary="------------000900060709070501040100"
X-SW-Source: 2015-12/txt/msg01778.txt.bz2

This is a multi-part message in MIME format.
--------------000900060709070501040100
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 2476

Hi all,

In this PR I'm trying to increase the use of the aarch64 instruction TST
that performs a bitwise AND with a bitmask and compares the result with
zero.
GCC has many ways of representing these operations in RTL.  Depending on
the mask, the target and the context it might be an AND-immediate, a
ZERO_EXTRACT or a ZERO_EXTEND of a subreg.
aarch64.md already contains a pattern for the compare with and-immediate
case, which is the most general form of this, but it doesn't match in
many common cases.

The documentation on canonicalization in md.texi says:
"Equality comparisons of a group of bits (usually a single bit) with zero
 will be written using @code{zero_extract} rather than the equivalent
 @code{and} or @code{sign_extract} operations."

This means that we should define a compare with a zero-extract pattern in
aarch64, which is what this patch does.
It's fairly simple: it constructs the TST mask from the operands of the
zero_extract and updates the SELECT_CC_MODE implementation to assign the
correct CC_NZ mode to such comparisons.
Note that this is valid only for equality comparisons against zero.

So for the testcase:
int
f1 (int x)
{
  if (x & 1)
    return 1;
  return x;
}

we now generate:
f1:
        tst     x0, 1
        csinc   w0, w0, wzr, eq
        ret

instead of the previous:
f1:
        and     w1, w0, 1
        cmp     w1, wzr
        csinc   w0, w0, wzr, eq
        ret

and for the testcase:
int
f2 (long x)
{
  return ((short) x >= 0) ? x : 0;
}

we now generate:
f2:
        tst     x0, 32768
        csel    x0, x0, xzr, eq
        ret

instead of:
f2:
        sxth    w1, w0
        cmp     w1, wzr
        csel    x0, x0, xzr, ge
        ret

i.e. we test the sign bit rather than perform the full comparison with
zero.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2015-12-17  Kyrylo Tkachov

    PR rtl-optimization/68796
    * config/aarch64/aarch64.md (*and<mode>3nr_compare0_zextract):
    New pattern.
    * config/aarch64/aarch64.c (aarch64_select_cc_mode): Handle
    ZERO_EXTRACT comparison with zero.
    (aarch64_mask_from_zextract_ops): New function.
    * config/aarch64/aarch64-protos.h (aarch64_mask_from_zextract_ops):
    New prototype.

2015-12-17  Kyrylo Tkachov

    PR rtl-optimization/68796
    * gcc.target/aarch64/tst_3.c: New test.
    * gcc.target/aarch64/tst_4.c: Likewise.

--------------000900060709070501040100
Content-Type: text/x-patch;
 name="aarch64-cmp-zextract.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="aarch64-cmp-zextract.patch"
Content-length: 4138

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 87d6eb1358845527d7068550925949802a7e48e2..febca98d38d5f09c97b0f79adc55bb29eca217b9 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -330,6 +330,7 @@ int aarch64_uxt_size (int, HOST_WIDE_INT);
 int aarch64_vec_fpconst_pow_of_2 (rtx);
 rtx aarch64_final_eh_return_addr (void);
 rtx aarch64_legitimize_reload_address (rtx *, machine_mode, int, int, int);
+rtx aarch64_mask_from_zextract_ops (rtx, rtx);
 const char *aarch64_output_move_struct (rtx *operands);
 rtx aarch64_return_addr (int, rtx);
 rtx aarch64_simd_gen_const_vector_dup (machine_mode, int);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index cb8955d5d6c909e8179bb1ab8203eb165f55e4b6..58a9fc68f391162ed9847d7fb79d70d3ee9919f5 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4147,7 +4147,9 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
       && y == const0_rtx
       && (code == EQ || code == NE || code == LT || code == GE)
       && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND
-          || GET_CODE (x) == NEG))
+          || GET_CODE (x) == NEG
+          || (GET_CODE (x) == ZERO_EXTRACT && CONST_INT_P (XEXP (x, 1))
+              && CONST_INT_P (XEXP (x, 2)))))
     return CC_NZmode;
 
   /* A compare with a shifted operand.  Because of canonicalization,
@@ -10757,6 +10759,21 @@ aarch64_simd_imm_zero_p (rtx x, machine_mode mode)
   return x == CONST0_RTX (mode);
 }
 
+
+/* Return the bitmask CONST_INT to select the bits required by a zero extract
+   operation of width WIDTH at bit position POS.  */
+
+rtx
+aarch64_mask_from_zextract_ops (rtx width, rtx pos)
+{
+  gcc_assert (CONST_INT_P (width));
+  gcc_assert (CONST_INT_P (pos));
+
+  unsigned HOST_WIDE_INT mask
+    = ((unsigned HOST_WIDE_INT)1 << UINTVAL (width)) - 1;
+  return GEN_INT (mask << UINTVAL (pos));
+}
+
 bool
 aarch64_simd_imm_scalar_p (rtx x, machine_mode mode ATTRIBUTE_UNUSED)
 {
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4604fd2588be87944a72224dccb3dfb32e42a1ad..fd2b3ef64f1736545948eb49e5ac6dfbd206e3e9 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3698,6 +3698,28 @@ (define_insn "*and<mode>3nr_compare0"
   [(set_attr "type" "logics_reg,logics_imm")]
 )
 
+(define_insn "*and<mode>3nr_compare0_zextract"
+  [(set (reg:CC_NZ CC_REGNUM)
+        (compare:CC_NZ
+         (zero_extract:GPI (match_operand:GPI 0 "register_operand" "r")
+                  (match_operand:GPI 1 "const_int_operand" "n")
+                  (match_operand:GPI 2 "const_int_operand" "n"))
+         (const_int 0)))]
+  "INTVAL (operands[1]) > 0
+   && ((INTVAL (operands[1]) + INTVAL (operands[2]))
+        <= GET_MODE_BITSIZE (<MODE>mode))
+   && aarch64_bitmask_imm (
+        UINTVAL (aarch64_mask_from_zextract_ops (operands[1],
+                                                 operands[2])),
+        <MODE>mode)"
+  {
+    operands[1]
+      = aarch64_mask_from_zextract_ops (operands[1], operands[2]);
+    return "tst\\t%<w>0, %1";
+  }
+  [(set_attr "type" "logics_shift_imm")]
+)
+
 (define_insn "*and_<SHIFT:optab><mode>3nr_compare0"
   [(set (reg:CC_NZ CC_REGNUM)
         (compare:CC_NZ
diff --git a/gcc/testsuite/gcc.target/aarch64/tst_3.c b/gcc/testsuite/gcc.target/aarch64/tst_3.c
new file mode 100644
index 0000000000000000000000000000000000000000..2204b33f3bc2ea974b3b0a7d1a5bdca7c6b37b82
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tst_3.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+f1 (int x)
+{
+  if (x & 1)
+    return 1;
+  return x;
+}
+
+/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*1" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/tst_4.c b/gcc/testsuite/gcc.target/aarch64/tst_4.c
new file mode 100644
index 0000000000000000000000000000000000000000..2b869c05c87ec120e1632a1420349a5eb98ff895
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/tst_4.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+int
+f1 (long x)
+{
+  return ((short) x >= 0) ? x : 0;
+}
+
+/* { dg-final { scan-assembler "tst\t(x|w)\[0-9\]*.*32768\n" } } */

--------------000900060709070501040100--
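
For readers following the mask arithmetic in aarch64_mask_from_zextract_ops, here is a minimal standalone sketch in plain C.  It is not part of the patch: mask_from_zextract is a hypothetical stand-in that takes plain integers rather than CONST_INT rtxes, and the separate handling of a 64-bit field is an assumption of this sketch, added because shifting a 64-bit value by 64 is undefined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for aarch64_mask_from_zextract_ops: build the
   bitmask selecting WIDTH bits starting at bit position POS.  Plain
   integers are used here instead of CONST_INT rtxes.  */
static uint64_t
mask_from_zextract (unsigned width, unsigned pos)
{
  /* The field must be non-empty and fit in 64 bits, similar in spirit
     to the width/position checks in the insn condition.  */
  assert (width >= 1 && width <= 64 && pos < 64 && width + pos <= 64);

  /* Assumption of this sketch: treat a full 64-bit field separately,
     since (1 << 64) is undefined behaviour in C.  */
  uint64_t field = (width == 64) ? UINT64_MAX
                                 : (((uint64_t) 1 << width) - 1);

  /* Move the run of WIDTH ones up to bit position POS.  */
  return field << pos;
}
```

With width 1 and position 0 the result is 1, the mask in the f1 example; with width 1 and position 15 it is 32768, the sign-bit mask in the f2 example.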