From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 107484 invoked by alias); 20 Jul 2018 09:31:32 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 107465 invoked by uid 89); 20 Jul 2018 09:31:31 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-8.0 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 spammy=H*f:sk:14ec9e2, Makes, H*f:sk:1487cf4, Improves X-HELO: EUR02-HE1-obe.outbound.protection.outlook.com Received: from mail-eopbgr10049.outbound.protection.outlook.com (HELO EUR02-HE1-obe.outbound.protection.outlook.com) (40.107.1.49) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Fri, 20 Jul 2018 09:31:30 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=52kLOZXi6Rl8NYmpUUL5/DcoKzdPDeMPoTqaIRPDse8=; b=dz6muW/A+n/LZr/9XMJGyo6NER8fW4tnIlGFi0Zq2sX62dccMlK8r/H9Oq+LA2uaPSFmf+n/RboFU5zem4vHgnLT+eXeTUVx9V/1cZzrEXgC8qc3M0YLC80c6LTFpyzXeqmII8QBuikpjwzXj75B7ajY3AP31iNiWnAN/hE4GQA= Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Sam.Tebbs@arm.com; Received: from [10.2.206.193] (217.140.96.140) by DB7PR08MB3434.eurprd08.prod.outlook.com (2603:10a6:10:42::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.973.16; Fri, 20 Jul 2018 09:31:26 +0000 Subject: Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks From: Sam Tebbs To: Richard Henderson , Sudakshina Das , "gcc-patches@gcc.gnu.org" Cc: nd , richard.earnshaw@arm.com, marcus.shawcroft@arm.com, james.greenhalgh@arm.com References: <14ec9e29-9ec0-cd70-8a2c-c00723e96427@arm.com> <3e07b208-6de4-1af2-2e6f-72f330239bbd@arm.com> <1487cf4e-b5d4-3ff2-ddd2-90482a607666@twiddle.net> <4fec5bc7-aa21-6470-c9e1-6cfb0dd17efe@arm.com> Message-ID: <2382d230-7c81-15df-e880-aaf7f25efb00@arm.com> Date: Fri, 20 Jul 2018 09:31:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <4fec5bc7-aa21-6470-c9e1-6cfb0dd17efe@arm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Return-Path: sam.tebbs@arm.com Received-SPF: None (protection.outlook.com: arm.com does not designate permitted sender hosts) X-IsSubscribed: yes X-SW-Source: 2018-07/txt/msg01146.txt.bz2 Hi all, Here is an updated patch that does the following: * Adds a new constraint in config/aarch64/constraints.md to check for a constant integer that is left consecutive. This addresses Richard Henderson's suggestion about combining the aarch64_is_left_consecutive call and the const_int match in the pattern. * Merges the two patterns defined into one. * Changes the pattern's type attribute to bfm. * Improved the comment above the aarch64_is_left_consecutive implementation. * Makes the pattern use the GPI iterator to accept smaller integer sizes (an extra test is added to check for this). * Improves the tests in combine_bfxil.c to ensure they aren't optimised away and that they check for the pattern's correctness. Below is a new changelog and the example given before. Is this OK for trunk? This patch adds an optimisation that exploits the AArch64 BFXIL instruction when or-ing the result of two bitwise and operations with non-overlapping bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)). Example: unsigned long long combine(unsigned long long a, unsigned long long b) {   return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll); } void read(unsigned long long a, unsigned long long b, unsigned long long *c) {   *c = combine(a, b); } When compiled with -O2, read would result in: read:   and   x5, x1, #0xffffffff   and   x4, x0, #0xffffffff00000000   orr   x4, x4, x5   str   x4, [x2]   ret But with this patch results in: read:   mov    x4, x0   bfxil    x4, x1, 0, 32   str    x4, [x2]   ret Bootstrapped and regtested on aarch64-none-linux-gnu and aarch64-none-elf with no regressions. gcc/ 2018-07-11  Sam Tebbs          PR target/85628         * config/aarch64/aarch64.md (*aarch64_bfxil):         Define.         * config/aarch64/constraints.md (Ulc): Define         * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):         Define.         * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New function. gcc/testsuite 2018-07-11  Sam Tebbs          PR target/85628         * gcc.target/aarch64/combine_bfxil.c: New file.         * gcc.target/aarch64/combine_bfxil_2.c: New file. On 07/19/2018 02:02 PM, Sam Tebbs wrote: > Hi Richard, > > Thanks for the feedback. I find that using "is_left_consecutive" is > more descriptive than checking for it being a power of 2 - 1, since it > describes the requirement (having consecutive ones from the MSB) more > explicitly. I would be happy to change it though if that is the > consensus. > > I have addressed your point about just returning the string instead of > using output_asm_insn and have changed it locally. I'll send an > updated patch soon. > > > On 07/17/2018 02:33 AM, Richard Henderson wrote: >> On 07/16/2018 10:10 AM, Sam Tebbs wrote: >>> +++ b/gcc/config/aarch64/aarch64.c >>> @@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode (unsigned >>> regno, unsigned, >>>       return SImode; >>>   } >>>   +/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's bits are >>> consecutive >>> +   ones from the MSB.  */ >>> +bool >>> +aarch64_is_left_consecutive (HOST_WIDE_INT i) >>> +{ >>> +  return (i | (i - 1)) == HOST_WIDE_INT_M1; >>> +} >>> + >> ... >>> +(define_insn "*aarch64_bfxil" >>> +  [(set (match_operand:DI 0 "register_operand" "=r") >>> +    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r") >>> +            (match_operand 3 "const_int_operand")) >>> +        (and:DI (match_operand:DI 2 "register_operand" "0") >>> +            (match_operand 4 "const_int_operand"))))] >>> +  "INTVAL (operands[3]) == ~INTVAL (operands[4]) >>> +    && aarch64_is_left_consecutive (INTVAL (operands[4]))" >> Better is to use a define_predicate to merge both that second test >> and the >> const_int_operand. >> >> (I'm not sure about the "left_consecutive" language either. >> Isn't it more descriptive to say that op3 is a power of 2 minus 1?) >> >> (define_predicate "pow2m1_operand" >>    (and (match_code "const_int") >>         (match_test "exact_pow2 (INTVAL(op) + 1) > 0"))) >> >> and use >> >>    (match_operand:DI 3 "pow2m1_operand") >> >> and then just the >> >>    INTVAL (operands[3]) == ~INTVAL (operands[4]) >> >> test. >> >> Also, don't omit the modes for the constants. >> Also, there's no reason this applies only to DI mode; >> use the GPI iterator and % in the output template. >> >>> +    HOST_WIDE_INT op3 = INTVAL (operands[3]); >>> +    operands[3] = GEN_INT (ceil_log2 (op3)); >>> +    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands); >>> +    return ""; >> You can just return the string that you passed to output_asm_insn. >> >>> +  } >>> +  [(set_attr "type" "bfx")] >> The other aliases of the BFM insn use type "bfm"; >> "bfx" appears to be aliases of UBFM and SBFM. >> Not that it appears to matter to the scheduling >> descriptions, but it is inconsistent. >> >> >> r~ >