From: Sam Tebbs <sam.tebbs@arm.com>
To: Sudakshina Das <sudi.das@arm.com>,
"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Cc: nd <nd@arm.com>,
richard.earnshaw@arm.com, marcus.shawcroft@arm.com,
james.greenhalgh@arm.com,
Richard Earnshaw <Richard.Earnshaw@arm.com>,
James Greenhalgh <James.Greenhalgh@arm.com>,
Marcus Shawcroft <Marcus.Shawcroft@arm.com>
Subject: Re: [GCC][PATCH][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks
Date: Mon, 16 Jul 2018 17:11:00 -0000 [thread overview]
Message-ID: <3e07b208-6de4-1af2-2e6f-72f330239bbd@arm.com> (raw)
In-Reply-To: <14ec9e29-9ec0-cd70-8a2c-c00723e96427@arm.com>
[-- Attachment #1: Type: text/plain, Size: 5106 bytes --]
Hi Sudi,
Thanks for noticing that. I have attached an improved patch file that
fixes this issue.
Below are the updated description and changelog:
This patch adds an optimisation that exploits the AArch64 BFXIL instruction
when or-ing the result of two bitwise and operations with non-overlapping
bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)).
Example:
unsigned long long combine(unsigned long long a, unsigned long long b) {
 return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
}
void read(unsigned long long a, unsigned long long b, unsigned long long *c) {
 *c = combine(a, b);
}
When compiled with -O2, read would result in:
read:
  and   x5, x1, #0xffffffff
  and   x4, x0, #0xffffffff00000000
  orr   x4, x4, x5
  str   x4, [x2]
  ret
But with this patch results in:
read:
  mov   x4, x0
  bfxil x4, x1, 0, 32
  str   x4, [x2]
  ret
Bootstrapped and regtested on aarch64-none-linux-gnu and
aarch64-none-elf with no regressions.
gcc/
2018-07-11 Sam Tebbs <sam.tebbs@arm.com>
       * config/aarch64/aarch64.md (*aarch64_bfxil, *aarch64_bfxil_alt):
       Define.
       * config/aarch64/aarch64-protos.h (aarch64_is_left_consecutive):
       Define.
       * config/aarch64/aarch64.c (aarch64_is_left_consecutive): New
       function.
gcc/testsuite/
2018-07-11 Sam Tebbs <sam.tebbs@arm.com>
       * gcc.target/aarch64/combine_bfxil.c: New file.
       * gcc.target/aarch64/combine_bfxil_2.c: New file.
On 07/16/2018 11:54 AM, Sudakshina Das wrote:
> Hi Sam
>
> On 13/07/18 17:09, Sam Tebbs wrote:
>> Hi all,
>>
>> This patch adds an optimisation that exploits the AArch64 BFXIL
>> instruction
>> when or-ing the result of two bitwise and operations with
>> non-overlapping
>> bitmasks (e.g. (a & 0xFFFF0000) | (b & 0x0000FFFF)).
>>
>> Example:
>>
>> unsigned long long combine(unsigned long long a, unsigned long long b) {
>>   return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
>> }
>>
>> void read2(unsigned long long a, unsigned long long b, unsigned long long *c,
>>   unsigned long long *d) {
>>   *c = combine(a, b); *d = combine(b, a);
>> }
>>
>> When compiled with -O2, read2 would result in:
>>
>> read2:
>>   and   x5, x1, #0xffffffff
>>   and   x4, x0, #0xffffffff00000000
>>   orr   x4, x4, x5
>>   and   x1, x1, #0xffffffff00000000
>>   and   x0, x0, #0xffffffff
>>   str   x4, [x2]
>>   orr   x0, x0, x1
>>   str   x0, [x3]
>>   ret
>>
>> But with this patch results in:
>>
>> read2:
>>   mov   x4, x1
>>   bfxil x4, x0, 0, 32
>>   str   x4, [x2]
>>   bfxil x0, x1, 0, 32
>>   str   x0, [x3]
>>   ret
>>
>> Bootstrapped and regtested on aarch64-none-linux-gnu and
>> aarch64-none-elf with no regressions.
>>
> I am not a maintainer but I have a question about this patch. I may be
> missing something or reading it wrong, so feel free to point it out:
>
> +(define_insn "*aarch64_bfxil"
> +  [(set (match_operand:DI 0 "register_operand" "=r")
> +    (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
> +          (match_operand 3 "const_int_operand"))
> +        (and:DI (match_operand:DI 2 "register_operand" "0")
> +          (match_operand 4 "const_int_operand"))))]
> +  "INTVAL (operands[3]) == ~INTVAL (operands[4])
> +    && aarch64_is_left_consecutive (INTVAL (operands[3]))"
> +  {
> +    HOST_WIDE_INT op4 = INTVAL (operands[4]);
> +    operands[3] = GEN_INT (64 - ceil_log2 (op4));
> +    output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
>
> In the BFXIL you are reading %3 LSB bits from operand 1 and putting them
> in the LSBs of %0.
> This means that the pattern should be masking the 32-%3 MSB of %0 and
> the %3 LSB of %1. So shouldn't operand 4 be LEFT_CONSECUTIVE?
>
> Can you please compare a simpler version of the above example you gave to
> make sure the generated assembly is equivalent before and after the
> patch:
>
> void read2(unsigned long long a, unsigned long long b, unsigned long
> long *c) {
>   *c = combine(a, b);
> }
>
>
> From the above text
>
> read2:
>   and   x5, x1, #0xffffffff
>   and   x4, x0, #0xffffffff00000000
>   orr   x4, x4, x5
>
> read2:
>   mov   x4, x1
>   bfxil x4, x0, 0, 32
>
> This does not seem equivalent to me.
>
> Thanks
> Sudi
>
> +    return "";
> +  }
> +Â [(set_attr "type" "bfx")]
> +)
>> gcc/
>> 2018-07-11 Sam Tebbs <sam.tebbs@arm.com>
>>
>>         * config/aarch64/aarch64.md (*aarch64_bfxil,
>> *aarch64_bfxil_alt):
>>         Define.
>>         * config/aarch64/aarch64-protos.h
>> (aarch64_is_left_consecutive):
>>         Define.
>>         * config/aarch64/aarch64.c (aarch64_is_left_consecutive):
>> New function.
>>
>> gcc/testsuite
>> 2018-07-11 Sam Tebbs <sam.tebbs@arm.com>
>>
>>         * gcc.target/aarch64/combine_bfxil.c: New file.
>>         * gcc.target/aarch64/combine_bfxil_2.c: New file.
>>
>>
>
[-- Attachment #2: fix.patch --]
[-- Type: text/x-patch, Size: 4982 bytes --]
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 514ddc4..b025cd6 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -558,4 +558,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
poly_uint64 aarch64_regmode_natural_size (machine_mode);
+bool aarch64_is_left_consecutive (HOST_WIDE_INT);
+
#endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index d75d45f..884958b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1439,6 +1439,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned,
return SImode;
}
+/* Implement IS_LEFT_CONSECUTIVE. Check if an integer's bits are consecutive
+ ones from the MSB. */
+bool
+aarch64_is_left_consecutive (HOST_WIDE_INT i)
+{
+ return (i | (i - 1)) == HOST_WIDE_INT_M1;
+}
+
/* Implement TARGET_CONSTANT_ALIGNMENT. Make strings word-aligned so
that strcpy from constants will be faster. */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a014a01..78ec4cf 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4844,6 +4844,42 @@
[(set_attr "type" "rev")]
)
+(define_insn "*aarch64_bfxil"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+ (ior:DI (and:DI (match_operand:DI 1 "register_operand" "r")
+ (match_operand 3 "const_int_operand"))
+ (and:DI (match_operand:DI 2 "register_operand" "0")
+ (match_operand 4 "const_int_operand"))))]
+ "INTVAL (operands[3]) == ~INTVAL (operands[4])
+ && aarch64_is_left_consecutive (INTVAL (operands[4]))"
+ {
+ HOST_WIDE_INT op3 = INTVAL (operands[3]);
+ operands[3] = GEN_INT (ceil_log2 (op3));
+ output_asm_insn ("bfxil\\t%0, %1, 0, %3", operands);
+ return "";
+ }
+ [(set_attr "type" "bfx")]
+)
+
+; An alternative bfxil pattern for when the second bitmask is the smaller one,
+; in which case the first register is written instead of the second.
+(define_insn "*aarch64_bfxil_alt"
+ [(set (match_operand:DI 0 "register_operand" "=r")
+ (ior:DI (and:DI (match_operand:DI 1 "register_operand" "0")
+ (match_operand 3 "const_int_operand"))
+ (and:DI (match_operand:DI 2 "register_operand" "r")
+ (match_operand 4 "const_int_operand"))))]
+ "INTVAL (operands[3]) == ~INTVAL (operands[4])
+ && aarch64_is_left_consecutive (INTVAL (operands[3]))"
+ {
+ HOST_WIDE_INT op4 = INTVAL (operands[4]);
+ operands[3] = GEN_INT (ceil_log2 (op4));
+ output_asm_insn ("bfxil\\t%0, %2, 0, %3", operands);
+ return "";
+ }
+ [(set_attr "type" "bfx")]
+)
+
;; There are no canonicalisation rules for the position of the lshiftrt, ashift
;; operations within an IOR/AND RTX, therefore we have two patterns matching
;; each valid permutation.
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
new file mode 100644
index 0000000..7189b80
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+extern void abort(void);
+
+unsigned long long
+combine_balanced (unsigned long long a, unsigned long long b)
+{
+ return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
+}
+
+
+unsigned long long
+combine_unbalanced (unsigned long long a, unsigned long long b)
+{
+ return (a & 0xffffffffff000000ll) | (b & 0x0000000000ffffffll);
+}
+
+void
+foo2 (unsigned long long a, unsigned long long b, unsigned long long *c,
+ unsigned long long *d)
+{
+ *c = combine_balanced(a, b);
+ *d = combine_balanced(b, a);
+}
+
+void
+foo3 (unsigned long long a, unsigned long long b, unsigned long long *c,
+ unsigned long long *d)
+{
+ *c = combine_unbalanced(a, b);
+ *d = combine_unbalanced(b, a);
+}
+
+int
+main(void)
+{
+ unsigned long long a = 0x0123456789ABCDEF, b = 0xFEDCBA9876543210, c, d;
+ foo3(a, b, &c, &d);
+ if(c != 0x0123456789543210) abort();
+ if(d != 0xfedcba9876abcdef) abort();
+ foo2(a, b, &c, &d);
+ if(c != 0x0123456776543210) abort();
+ if(d != 0xfedcba9889abcdef) abort();
+ return 0;
+}
+
+/* { dg-final { scan-assembler-times "bfxil\\t" 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
new file mode 100644
index 0000000..8237d94
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil_2.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned long long
+combine_non_consecutive (unsigned long long a, unsigned long long b)
+{
+ return (a & 0xfffffff200f00000ll) | (b & 0x00001000ffffffffll);
+}
+
+void
+foo4 (unsigned long long a, unsigned long long b, unsigned long long *c,
+ unsigned long long *d) {
+ /* { dg-final { scan-assembler-not "bfxil\\t" } } */
+ *c = combine_non_consecutive(a, b);
+ *d = combine_non_consecutive(b, a);
+}
Thread overview: 13+ messages
2018-07-13 16:09 Sam Tebbs
2018-07-16 10:55 ` Sudakshina Das
2018-07-16 17:11 ` Sam Tebbs [this message]
2018-07-17 1:34 ` Richard Henderson
2018-07-17 13:33 ` Richard Earnshaw (lists)
2018-07-17 15:46 ` Richard Henderson
2018-07-19 13:03 ` Sam Tebbs
2018-07-20 9:31 ` Sam Tebbs
2018-07-20 9:33 ` Sam Tebbs
2018-07-23 11:38 ` Renlin Li
2018-07-23 13:15 ` Sam Tebbs
2018-07-24 16:24 ` Sam Tebbs
2018-07-30 11:31 ` Sam Tebbs