From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 103905 invoked by alias); 23 Jul 2018 13:15:03 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 103843 invoked by uid 89); 23 Jul 2018 13:15:01 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-8.2 required=5.0 tests=AWL,BAYES_00,GIT_PATCH_3,RCVD_IN_DNSWL_NONE,SPF_HELO_PASS,SPF_PASS autolearn=ham version=3.3.2 spammy=Regard X-HELO: EUR01-HE1-obe.outbound.protection.outlook.com Received: from mail-eopbgr130078.outbound.protection.outlook.com (HELO EUR01-HE1-obe.outbound.protection.outlook.com) (40.107.13.78) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 23 Jul 2018 13:14:58 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=KN0kdnhhFHC0vbdNyMPyOLOiuG9YbDWZC+vzQD4+qQc=; b=DVFc29pZozbyGFwBU18Ie9LSWKyDwPJOSyqZUiKb3D4ji96X3JXfC+l3AxxFPWCkrdTQQhotmLEwLPs1372V54XErlJJXQKVTIl0VnbYLRHpOQ2ohD17ZyS1ZxIexZLELToDtNbA1iUch+VsWhD/qV9LS1PANT9sG1l/kx4I7yE= Authentication-Results: spf=none (sender IP is ) smtp.mailfrom=Sam.Tebbs@arm.com; Received: from [10.2.206.193] (217.140.96.140) by VI1PR08MB3437.eurprd08.prod.outlook.com (2603:10a6:803:7c::15) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.973.21; Mon, 23 Jul 2018 13:14:53 +0000 Subject: Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks To: Renlin Li , Richard Henderson , Sudakshina Das , "gcc-patches@gcc.gnu.org" Cc: nd , richard.earnshaw@arm.com, marcus.shawcroft@arm.com, james.greenhalgh@arm.com References: <7795cc9e-7294-96ba-7e80-3465e22c77f9@arm.com> <874eaa28-8a58-3585-b5ca-d2d3ec7f5626@foss.arm.com> From: Sam Tebbs Message-ID: <76a6968b-9da7-68c2-05b6-b4bcf0baddf9@arm.com> Date: Mon, 23 Jul 2018 13:15:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: <874eaa28-8a58-3585-b5ca-d2d3ec7f5626@foss.arm.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Return-Path: sam.tebbs@arm.com Received-SPF: None (protection.outlook.com: arm.com does not designate permitted sender hosts) X-IsSubscribed: yes X-SW-Source: 2018-07/txt/msg01257.txt.bz2 On 07/23/2018 12:38 PM, Renlin Li wrote: >> +(define_insn "*aarch64_bfxil" >> +  [(set (match_operand:GPI 0 "register_operand" "=r,r") >> +    (ior:GPI (and:GPI (match_operand:GPI 1 "register_operand" "r,0") >> +            (match_operand:GPI 3 "const_int_operand" "n, Ulc")) >> +        (and:GPI (match_operand:GPI 2 "register_operand" "0,r") >> +            (match_operand:GPI 4 "const_int_operand" "Ulc, n"))))] >> +  "INTVAL (operands[3]) == ~INTVAL (operands[4])" >> +  { >> +    switch (which_alternative) >> +    { >> +      case 0: >> +    operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[3]))); >> +    return "bfxil\\t%0, %1, 0, %3"; >> +      case 1: >> +    operands[3] = GEN_INT (ceil_log2 (INTVAL (operands[4]))); >> +    return "bfxil\\t%0, %2, 0, %3"; >> +      default: >> +    gcc_unreachable (); >> +    } >> +  } >> +  [(set_attr "type" "bfm")] >> +) > > Hi Sam, > > Is it possible that operand[3] or operand[4] is 1? > > ceil_log2() could return 0 if the argument is 1. > The width field of the resulting instruction will be 0. Is it still > correct? > > Regard, > Renlin > > Hi Renlin, I think you have found an edge-case that I didn't think of and that the code would fail under. I have added a check for this situation and will send the updated patch soon. Thanks, Sam > > On 07/20/2018 10:33 AM, Sam Tebbs wrote: >> Please disregard the original patch and see this updated version. >> >> >> On 07/20/2018 10:31 AM, Sam Tebbs wrote: >>> Hi all, >>> >>> Here is an updated patch that does the following: >>> >>> * Adds a new constraint in config/aarch64/constraints.md to check >>> for a constant integer that is left consecutive. This addresses >>> Richard Henderson's suggestion about combining the >>> aarch64_is_left_consecutive call and the const_int match in the >>> pattern. >>> >>> * Merges the two patterns defined into one. >>> >>> * Changes the pattern's type attribute to bfm. >>> >>> * Improved the comment above the aarch64_is_left_consecutive >>> implementation. >>> >>> * Makes the pattern use the GPI iterator to accept smaller integer >>> sizes (an extra test is added to check for this). >>> >>> * Improves the tests in combine_bfxil.c to ensure they aren't >>> optimised away and that they check for the pattern's correctness. >>> >>> Below is a new changelog and the example given before. >>> >>> Is this OK for trunk? >>> >>> This patch adds an optimisation that exploits the AArch64 BFXIL >>> instruction >>> when or-ing the result of two bitwise and operations with >>> non-overlapping >>> bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)). >>> >>> Example: >>> >>> unsigned long long combine(unsigned long long a, unsigned long long >>> b) { >>>   return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll); >>> } >>> >>> void read(unsigned long long a, unsigned long long b, unsigned long >>> long *c) { >>>   *c = combine(a, b); >>> } >>> >>> When compiled with -O2, read would result in: >>> >>> read: >>>   and   x5, x1, #0xffffffff >>>   and   x4, x0, #0xffffffff00000000 >>>   orr   x4, x4, x5 >>>   str   x4, [x2] >>>   ret >>> >>> But with this patch results in: >>> >>> read: >>>   mov    x4, x0 >>>   bfxil    x4, x1, 0, 32 >>>   str    x4, [x2] >>>   ret >>> >>> >>> >>> Bootstrapped and regtested on aarch64-none-linux-gnu and >>> aarch64-none-elf with no regressions. >>> >>> >>> gcc/ >>> 2018-07-11  Sam Tebbs  >>> >>>         PR target/85628 >>>         * config/aarch64/aarch64.md (*aarch64_bfxil): >>>         Define. >>>         * config/aarch64/constraints.md (Ulc): Define >>>         * config/aarch64/aarch64-protos.h >>> (aarch64_is_left_consecutive): >>>         Define. >>>         * config/aarch64/aarch64.c (aarch64_is_left_consecutive): >>> New function. >>> >>> gcc/testsuite >>> 2018-07-11  Sam Tebbs  >>> >>>         PR target/85628 >>>         * gcc.target/aarch64/combine_bfxil.c: New file. >>>         * gcc.target/aarch64/combine_bfxil_2.c: New file. >>> >>> >>> On 07/19/2018 02:02 PM, Sam Tebbs wrote: >>>> Hi Richard, >>>> >>>> Thanks for the feedback. I find that using "is_left_consecutive" is >>>> more descriptive than checking for it being a power of 2 - 1, since >>>> it describes the requirement (having consecutive ones from the MSB) >>>> more explicitly. I would be happy to change it though if that is >>>> the consensus. >>>> >>>> I have addressed your point about just returning the string instead >>>> of using output_asm_insn and have changed it locally. I'll send an >>>> updated patch soon. >>>> >>>> >>>> On 07/17/2018 02:33 AM, Richard Henderson wrote: >>>>> On 07/16/2018 10:10 AM, Sam Tebbs wrote: >>>>>> +++ b/gcc/config/aarch64/aarch64.c >>>>>> @@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode >>>>>> (unsigned regno, unsigned, >>>>>>       return SImode; >>>>>>   } >>>>>>   +/* Implement IS_LEFT_CONSECUTIVE.  Check if an integer's bits >>>>>> are consecutive >>>>>> +   ones from the MSB.  */ >>>>>> +bool >>>>>> +aarch64_is_left_consecutive (HOST_WIDE_INT i) >>>>>> +{ >>>>>> +  return (i | (i - 1)) == HOST_WIDE_INT_M1; >>>>>> +} >>>>>> + >>>>> ... >>>>>> +(define_insn "*aarch64_bfxil" >>>>>> +  [(set (match_operand:DI 0 "register_operand" "=r") >>>>>> +    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r") >>>>>> +            (match_operand 3 "const_int_operand")) >>>>>> +        (and:DI (match_operand:DI 2 "register_operand" "0") >>>>>> +            (match_operand 4 "const_int_operand"))))] >>>>>> +  "INTVAL (operands[3]) == ~INTVAL (operands[4]) >>>>>> +    && aarch64_is_left_consecutive (INTVAL (operands[4]))" >>>>> Better is to use a define_predicate to merge both that second test >>>>> and the >>>>> const_int_operand. >>>>> >>>>> (I'm not sure about the "left_consecutive" language either. >>>>> Isn't it more descriptive to say that op3 is a power of 2 minus 1?) >>>>> >>>>> (define_predicate "pow2m1_operand" >>>>>    (and (match_code "const_int") >>>>>         (match_test "exact_pow2 (INTVAL(op) + 1) > 0"))) >>>>> >>>>> and use >>>>> >>>>>    (match_operand:DI 3 "pow2m1_operand") >>>>> >>>>> and then just the >>>>> >>>>>    INTVAL (operands[3]) == ~INTVAL (operands[4]) >>>>> >>>>> test. >>>>> >>>>> Also, don't omit the modes for the constants. >>>>> Also, there's no reason this applies only to DI mode; >>>>> use the GPI iterator and % in the output template. >>>>> >>>>>> +    HOST_WIDE_INT op3 = INTVAL (operands[3]); >>>>>> +    operands[3] = GEN_INT (ceil_log2 (op3)); >>>>>> +    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands); >>>>>> +    return ""; >>>>> You can just return the string that you passed to output_asm_insn. >>>>> >>>>>> +  } >>>>>> +  [(set_attr "type" "bfx")] >>>>> The other aliases of the BFM insn use type "bfm"; >>>>> "bfx" appears to be aliases of UBFM and SBFM. >>>>> Not that it appears to matter to the scheduling >>>>> descriptions, but it is inconsistent. >>>>> >>>>> >>>>> r~ >>>> >>> >>