* [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
@ 2014-05-08 17:36 Ian Bolton
  2014-05-16  9:17 ` Ian Bolton
  2014-05-16 12:35 ` Richard Earnshaw
  0 siblings, 2 replies; 10+ messages in thread
From: Ian Bolton @ 2014-05-08 17:36 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

Hi,

It currently takes 4 instructions to generate certain immediates on
AArch64 (unless we put them in the constant pool).

For example ...

  long long
  ffffbeefcafebabe ()
  {
    return 0xFFFFBEEFCAFEBABEll;
  }

leads to ...

  mov	x0, 0x47806
  mov	x0, 0xcafe, lsl 16
  mov	x0, 0xbeef, lsl 32
  orr	x0, x0, -281474976710656

The above case is tackled in this patch by employing MOVN
to generate the top 32 bits in a single instruction ...

  mov	x0, -71536975282177
  movk	x0, 0xcafe, lsl 16
  movk	x0, 0xbabe, lsl 0

Note that where at least two half-words are 0xffff, the existing
code that does the immediate in two instructions is still used.

Tested on standard gcc regressions and the attached test case.

OK for commit?

Cheers,
Ian


2014-05-08  Ian Bolton  <ian.bolton@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Use MOVN when top-most half-word (and only that half-word)
	is 0xffff.
gcc/testsuite/
	* gcc.target/aarch64/movn_1.c: New test.

[-- Attachment #2: aarch64-movn-exploitation-patch-v5.txt --]
[-- Type: text/plain, Size: 1576 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 43a83566..a8e504e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	    }
 	}
 
+  /* Look for case where upper 16 bits are set, so we can use MOVN.  */
+  if ((val & 0xffff000000000000ll) == 0xffff000000000000ll)
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (~ (~val & (0xffffll << 32)))));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (16),
+				 GEN_INT ((val >> 16) & 0xffff)));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (0),
+				 GEN_INT (val & 0xffff)));
+      return;
+    }
+
 simple_sequence:
   first = true;
   mask = 0xffff;
diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c b/gcc/testsuite/gcc.target/aarch64/movn_1.c
new file mode 100644
index 0000000..cc11ade
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+long long
+foo ()
+{
+  /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */
+  return 0xffffbeefcafebabell;
+}
+
+long long
+merge4 (int a, int b, int c, int d)
+{
+  return ((long long) a << 48 | (long long) b << 32
+	  | (long long) c << 16 | (long long) d);
+}
+
+int main ()
+{
+  if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe))
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread
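For readers who want to sanity-check the proposed three-instruction sequence, the
stand-alone C sketch below models the semantics of MOV (wide immediate) and MOVK
and verifies that the sequence quoted above really reconstructs 0xFFFFBEEFCAFEBABE.
It is only an illustration: the helper names are invented and nothing here is GCC
code; only the constants and the instruction semantics come from the mail above.

  #include <stdint.h>
  #include <stdio.h>
  #include <assert.h>

  /* Model of the first "mov xD, #imm": the whole register is written
     with the (already negated, MOVN-style) constant.  */
  static uint64_t mov_imm (int64_t imm) { return (uint64_t) imm; }

  /* Model of "movk xD, #imm16, lsl #shift": replace one 16-bit
     half-word, keep the rest of the register unchanged.  */
  static uint64_t movk (uint64_t reg, uint64_t imm16, unsigned shift)
  {
    uint64_t mask = (uint64_t) 0xffff << shift;
    return (reg & ~mask) | ((imm16 & 0xffff) << shift);
  }

  int main (void)
  {
    /* mov  x0, -71536975282177   (== 0xffffbeefffffffff)
       movk x0, 0xcafe, lsl 16
       movk x0, 0xbabe, lsl 0   */
    uint64_t x0 = mov_imm (-71536975282177ll);
    x0 = movk (x0, 0xcafe, 16);
    x0 = movk (x0, 0xbabe, 0);
    assert (x0 == 0xffffbeefcafebabeull);
    printf ("0x%016llx\n", (unsigned long long) x0);
    return 0;
  }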
* RE: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-05-08 17:36 [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible Ian Bolton @ 2014-05-16 9:17 ` Ian Bolton 2014-05-16 12:35 ` Richard Earnshaw 1 sibling, 0 replies; 10+ messages in thread From: Ian Bolton @ 2014-05-16 9:17 UTC (permalink / raw) To: Ian Bolton, gcc-patches Ping. This should be relatively simple to review. Many thanks. > -----Original Message----- > From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches- > owner@gcc.gnu.org] On Behalf Of Ian Bolton > Sent: 08 May 2014 18:36 > To: gcc-patches > Subject: [PATCH, AArch64] Use MOVN to generate 64-bit negative > immediates where sensible > > Hi, > > It currently takes 4 instructions to generate certain immediates on > AArch64 (unless we put them in the constant pool). > > For example ... > > long long > ffffbeefcafebabe () > { > return 0xFFFFBEEFCAFEBABEll; > } > > leads to ... > > mov x0, 0x47806 > mov x0, 0xcafe, lsl 16 > mov x0, 0xbeef, lsl 32 > orr x0, x0, -281474976710656 > > The above case is tackled in this patch by employing MOVN > to generate the top 32-bits in a single instruction ... > > mov x0, -71536975282177 > movk x0, 0xcafe, lsl 16 > movk x0, 0xbabe, lsl 0 > > Note that where at least two half-words are 0xffff, existing > code that does the immediate in two instructions is still used.) > > Tested on standard gcc regressions and the attached test case. > > OK for commit? > > Cheers, > Ian > > > 2014-05-08 Ian Bolton <ian.bolton@arm.com> > > gcc/ > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when top-most half-word (and only that half-word) > is 0xffff. > gcc/testsuite/ > * gcc.target/aarch64/movn_1.c: New test. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-05-08 17:36 [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible Ian Bolton 2014-05-16 9:17 ` Ian Bolton @ 2014-05-16 12:35 ` Richard Earnshaw 2014-08-07 11:32 ` Kyrill Tkachov 1 sibling, 1 reply; 10+ messages in thread From: Richard Earnshaw @ 2014-05-16 12:35 UTC (permalink / raw) To: Ian Bolton; +Cc: gcc-patches On 08/05/14 18:36, Ian Bolton wrote: > Hi, > > It currently takes 4 instructions to generate certain immediates on > AArch64 (unless we put them in the constant pool). > > For example ... > > long long > ffffbeefcafebabe () > { > return 0xFFFFBEEFCAFEBABEll; > } > > leads to ... > > mov x0, 0x47806 > mov x0, 0xcafe, lsl 16 > mov x0, 0xbeef, lsl 32 > orr x0, x0, -281474976710656 > > The above case is tackled in this patch by employing MOVN > to generate the top 32-bits in a single instruction ... > > mov x0, -71536975282177 > movk x0, 0xcafe, lsl 16 > movk x0, 0xbabe, lsl 0 > > Note that where at least two half-words are 0xffff, existing > code that does the immediate in two instructions is still used.) > > Tested on standard gcc regressions and the attached test case. > > OK for commit? What about: long long a() { return 0x1234ffff56789abcll; } long long b() { return 0x12345678ffff9abcll; } long long c() { return 0x123456789abcffffll; } ? Surely these can also benefit from this sort of optimization, but it looks as though you only handle the top 16 bits being set. R. > > Cheers, > Ian > > > 2014-05-08 Ian Bolton <ian.bolton@arm.com> > > gcc/ > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when top-most half-word (and only that half-word) > is 0xffff. > gcc/testsuite/ > * gcc.target/aarch64/movn_1.c: New test. > > > aarch64-movn-exploitation-patch-v5.txt > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 43a83566..a8e504e 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) > } > } > > + /* Look for case where upper 16 bits are set, so we can use MOVN. */ > + if ((val & 0xffff000000000000ll) == 0xffff000000000000ll) > + { > + emit_insn (gen_rtx_SET (VOIDmode, dest, > + GEN_INT (~ (~val & (0xffffll << 32))))); > + emit_insn (gen_insv_immdi (dest, GEN_INT (16), > + GEN_INT ((val >> 16) & 0xffff))); > + emit_insn (gen_insv_immdi (dest, GEN_INT (0), > + GEN_INT (val & 0xffff))); > + return; > + } > + > simple_sequence: > first = true; > mask = 0xffff; > diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c b/gcc/testsuite/gcc.target/aarch64/movn_1.c > new file mode 100644 > index 0000000..cc11ade > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c > @@ -0,0 +1,27 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fno-inline --save-temps" } */ > + > +extern void abort (void); > + > +long long > +foo () > +{ > + /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */ > + return 0xffffbeefcafebabell; > +} > + > +long long > +merge4 (int a, int b, int c, int d) > +{ > + return ((long long) a << 48 | (long long) b << 32 > + | (long long) c << 16 | (long long) d); > +} > + > +int main () > +{ > + if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe)) > + abort (); > + return 0; > +} > + > +/* { dg-final { cleanup-saved-temps } } */ > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-05-16 12:35 ` Richard Earnshaw @ 2014-08-07 11:32 ` Kyrill Tkachov 2014-08-07 12:46 ` Richard Earnshaw 0 siblings, 1 reply; 10+ messages in thread From: Kyrill Tkachov @ 2014-08-07 11:32 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 2704 bytes --] On 16/05/14 13:35, Richard Earnshaw wrote: > On 08/05/14 18:36, Ian Bolton wrote: >> Hi, >> >> It currently takes 4 instructions to generate certain immediates on >> AArch64 (unless we put them in the constant pool). >> >> For example ... >> >> long long >> ffffbeefcafebabe () >> { >> return 0xFFFFBEEFCAFEBABEll; >> } >> >> leads to ... >> >> mov x0, 0x47806 >> mov x0, 0xcafe, lsl 16 >> mov x0, 0xbeef, lsl 32 >> orr x0, x0, -281474976710656 >> >> The above case is tackled in this patch by employing MOVN >> to generate the top 32-bits in a single instruction ... >> >> mov x0, -71536975282177 >> movk x0, 0xcafe, lsl 16 >> movk x0, 0xbabe, lsl 0 >> >> Note that where at least two half-words are 0xffff, existing >> code that does the immediate in two instructions is still used.) >> >> Tested on standard gcc regressions and the attached test case. >> >> OK for commit? > What about: > > long long a() > { > return 0x1234ffff56789abcll; > } > > long long b() > { > return 0x12345678ffff9abcll; > } > > long long c() > { > return 0x123456789abcffffll; > } > > ? > > Surely these can also benefit from this sort of optimization, but it > looks as though you only handle the top 16 bits being set. Hi Richard, How about this rework of the patch? For code: long long foo () { return 0xFFFFBEEFCAFEBABEll; } long long a() { return 0x1234ffff56789abcll; } long long b() { return 0x12345678ffff9abcll; } long long c() { return 0x123456789abcffffll; } we now generate: foo: mov x0, -17730 movk x0, 0xcafe, lsl 16 movk x0, 0xbeef, lsl 32 ret .size foo, .-foo .align 2 .global a .type a, %function a: mov x0, -25924 movk x0, 0x5678, lsl 16 movk x0, 0x1234, lsl 48 ret .size a, .-a .align 2 .global b .type b, %function b: mov x0, -25924 movk x0, 0x5678, lsl 32 movk x0, 0x1234, lsl 48 ret .size b, .-b .align 2 .global c .type c, %function c: mov x0, -1698889729 movk x0, 0x5678, lsl 32 movk x0, 0x1234, lsl 48 ret 3 instructions are used in each case. Thanks, Kyrill 2014-08-07 Ian Bolton <ian.bolton@arm.com> Kyrylo Tkachov <kyrylo.tkachov@arm.com> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Use MOVN when one of the half-words is 0xffff. 
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-movn-pattern-patch-v3.patch --]
[-- Type: text/x-patch; name=aarch64-movn-pattern-patch-v3.patch, Size: 2290 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0a7f441..2db91c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   unsigned HOST_WIDE_INT val;
   bool subtargets;
   rtx subtarget;
-  int one_match, zero_match;
+  int one_match, zero_match, first_not_ffff_match;
 
   gcc_assert (mode == SImode || mode == DImode);
 
@@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   one_match = 0;
   zero_match = 0;
   mask = 0xffff;
+  first_not_ffff_match = -1;
 
   for (i = 0; i < 64; i += 16, mask <<= 16)
     {
-      if ((val & mask) == 0)
-	zero_match++;
-      else if ((val & mask) == mask)
+      if ((val & mask) == mask)
 	one_match++;
+      else
+	{
+	  if (first_not_ffff_match < 0)
+	    first_not_ffff_match = i;
+	  if ((val & mask) == 0)
+	    zero_match++;
+	}
     }
 
   if (one_match == 2)
     {
-      mask = 0xffff;
-      for (i = 0; i < 64; i += 16, mask <<= 16)
+      /* Set one of the quarters and then insert back into result.  */
+      mask = 0xffffll << first_not_ffff_match;
+      emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask)));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match),
+				 GEN_INT ((val >> first_not_ffff_match)
+					  & 0xffff)));
+      return;
+    }
+
+  if (one_match == 1)
+    {
+      /* Set either first three quarters or all but the third.  */
+      mask = 0xffffll << (16 - first_not_ffff_match);
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (val | mask | 0xffffffff00000000ull)));
+
+      /* Now insert other two quarters.  */
+      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
+	   i < 64; i += 16, mask <<= 16)
 	{
 	  if ((val & mask) != mask)
-	    {
-	      emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask)));
-	      emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-					 GEN_INT ((val >> i) & 0xffff)));
-	      return;
-	    }
+	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+				       GEN_INT ((val >> i) & 0xffff)));
 	}
-      gcc_unreachable ();
+      return;
     }
 
   if (zero_match == 2)

^ permalink raw reply	[flat|nested] 10+ messages in thread
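The test shipped with the original patch (movn_1.c above) only exercises 0xffff in
the top half-word. Since this version of the patch covers 0xffff in any half-word
position, a companion test could be sketched along the lines below, modelled on
movn_1.c. This is an untested illustration only; in particular, no scan-assembler
checks are included here, because the exact mov/movk sequences shown in Kyrill's
mail would need to be confirmed against a real build before being encoded in the
testsuite.

  /* { dg-do run } */
  /* { dg-options "-O2 -fno-inline" } */

  extern void abort (void);

  long long
  foo_mid (void)
  {
    /* 0xffff in bits 47:32; per the mail above this should now be a
       three-instruction mov/movk/movk sequence.  */
    return 0x1234ffff56789abcll;
  }

  long long
  foo_low (void)
  {
    /* 0xffff in bits 15:0.  */
    return 0x123456789abcffffll;
  }

  /* Build the same values without large literal constants, as movn_1.c
     does with merge4, so the comparisons below are meaningful.  */
  long long
  merge4 (int a, int b, int c, int d)
  {
    return ((long long) a << 48 | (long long) b << 32
	    | (long long) c << 16 | (long long) d);
  }

  int
  main (void)
  {
    if (foo_mid () != merge4 (0x1234, 0xffff, 0x5678, 0x9abc))
      abort ();
    if (foo_low () != merge4 (0x1234, 0x5678, 0x9abc, 0xffff))
      abort ();
    return 0;
  }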
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-08-07 11:32 ` Kyrill Tkachov @ 2014-08-07 12:46 ` Richard Earnshaw 2014-08-07 12:57 ` Kyrill Tkachov 0 siblings, 1 reply; 10+ messages in thread From: Richard Earnshaw @ 2014-08-07 12:46 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: gcc-patches On 07/08/14 12:32, Kyrill Tkachov wrote: > > On 16/05/14 13:35, Richard Earnshaw wrote: >> On 08/05/14 18:36, Ian Bolton wrote: >>> Hi, >>> >>> It currently takes 4 instructions to generate certain immediates on >>> AArch64 (unless we put them in the constant pool). >>> >>> For example ... >>> >>> long long >>> ffffbeefcafebabe () >>> { >>> return 0xFFFFBEEFCAFEBABEll; >>> } >>> >>> leads to ... >>> >>> mov x0, 0x47806 >>> mov x0, 0xcafe, lsl 16 >>> mov x0, 0xbeef, lsl 32 >>> orr x0, x0, -281474976710656 >>> >>> The above case is tackled in this patch by employing MOVN >>> to generate the top 32-bits in a single instruction ... >>> >>> mov x0, -71536975282177 >>> movk x0, 0xcafe, lsl 16 >>> movk x0, 0xbabe, lsl 0 >>> >>> Note that where at least two half-words are 0xffff, existing >>> code that does the immediate in two instructions is still used.) >>> >>> Tested on standard gcc regressions and the attached test case. >>> >>> OK for commit? >> What about: >> >> long long a() >> { >> return 0x1234ffff56789abcll; >> } >> >> long long b() >> { >> return 0x12345678ffff9abcll; >> } >> >> long long c() >> { >> return 0x123456789abcffffll; >> } >> >> ? >> >> Surely these can also benefit from this sort of optimization, but it >> looks as though you only handle the top 16 bits being set. > > Hi Richard, > > How about this rework of the patch? > > For code: > > long long foo () > { > return 0xFFFFBEEFCAFEBABEll; > } > > long long a() > { > return 0x1234ffff56789abcll; > } > > long long b() > { > return 0x12345678ffff9abcll; > } > > long long c() > { > return 0x123456789abcffffll; > } > > we now generate: > foo: > mov x0, -17730 > movk x0, 0xcafe, lsl 16 > movk x0, 0xbeef, lsl 32 > ret > .size foo, .-foo > .align 2 > .global a > .type a, %function > a: > mov x0, -25924 > movk x0, 0x5678, lsl 16 > movk x0, 0x1234, lsl 48 > ret > .size a, .-a > .align 2 > .global b > .type b, %function > b: > mov x0, -25924 > movk x0, 0x5678, lsl 32 > movk x0, 0x1234, lsl 48 > ret > .size b, .-b > .align 2 > .global c > .type c, %function > c: > mov x0, -1698889729 > movk x0, 0x5678, lsl 32 > movk x0, 0x1234, lsl 48 > ret > > > 3 instructions are used in each case. > > Thanks, > Kyrill > > 2014-08-07 Ian Bolton <ian.bolton@arm.com> > Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when one of the half-words is 0xffff. 
> > > aarch64-movn-pattern-patch-v3.patch > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 0a7f441..2db91c7 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) > unsigned HOST_WIDE_INT val; > bool subtargets; > rtx subtarget; > - int one_match, zero_match; > + int one_match, zero_match, first_not_ffff_match; > > gcc_assert (mode == SImode || mode == DImode); > > @@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) > one_match = 0; > zero_match = 0; > mask = 0xffff; > + first_not_ffff_match = -1; > > for (i = 0; i < 64; i += 16, mask <<= 16) > { > - if ((val & mask) == 0) > - zero_match++; > - else if ((val & mask) == mask) > + if ((val & mask) == mask) > one_match++; > + else > + { > + if (first_not_ffff_match < 0) > + first_not_ffff_match = i; > + if ((val & mask) == 0) > + zero_match++; > + } > } > > if (one_match == 2) > { > - mask = 0xffff; > - for (i = 0; i < 64; i += 16, mask <<= 16) > + /* Set one of the quarters and then insert back into result. */ > + mask = 0xffffll << first_not_ffff_match; > + emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); > + emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match), > + GEN_INT ((val >> first_not_ffff_match) > + & 0xffff))); > + return; > + } > + > + if (one_match == 1) I think this should be (one_match > zero_match). Otherwise constants such as 0x00001234ffff0000ll might end up taking three rather than two insns. R. > + { > + /* Set either first three quarters or all but the third. */ > + mask = 0xffffll << (16 - first_not_ffff_match); > + emit_insn (gen_rtx_SET (VOIDmode, dest, > + GEN_INT (val | mask | 0xffffffff00000000ull))); > + > + /* Now insert other two quarters. */ > + for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1); > + i < 64; i += 16, mask <<= 16) > { > if ((val & mask) != mask) > - { > - emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); > - emit_insn (gen_insv_immdi (dest, GEN_INT (i), > - GEN_INT ((val >> i) & 0xffff))); > - return; > - } > + emit_insn (gen_insv_immdi (dest, GEN_INT (i), > + GEN_INT ((val >> i) & 0xffff))); > } > - gcc_unreachable (); > + return; > } > > if (zero_match == 2) > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-08-07 12:46 ` Richard Earnshaw @ 2014-08-07 12:57 ` Kyrill Tkachov 2014-08-07 13:31 ` Richard Earnshaw 2014-08-07 19:33 ` Richard Henderson 0 siblings, 2 replies; 10+ messages in thread From: Kyrill Tkachov @ 2014-08-07 12:57 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 5533 bytes --] On 07/08/14 13:46, Richard Earnshaw wrote: > On 07/08/14 12:32, Kyrill Tkachov wrote: >> On 16/05/14 13:35, Richard Earnshaw wrote: >>> On 08/05/14 18:36, Ian Bolton wrote: >>>> Hi, >>>> >>>> It currently takes 4 instructions to generate certain immediates on >>>> AArch64 (unless we put them in the constant pool). >>>> >>>> For example ... >>>> >>>> long long >>>> ffffbeefcafebabe () >>>> { >>>> return 0xFFFFBEEFCAFEBABEll; >>>> } >>>> >>>> leads to ... >>>> >>>> mov x0, 0x47806 >>>> mov x0, 0xcafe, lsl 16 >>>> mov x0, 0xbeef, lsl 32 >>>> orr x0, x0, -281474976710656 >>>> >>>> The above case is tackled in this patch by employing MOVN >>>> to generate the top 32-bits in a single instruction ... >>>> >>>> mov x0, -71536975282177 >>>> movk x0, 0xcafe, lsl 16 >>>> movk x0, 0xbabe, lsl 0 >>>> >>>> Note that where at least two half-words are 0xffff, existing >>>> code that does the immediate in two instructions is still used.) >>>> >>>> Tested on standard gcc regressions and the attached test case. >>>> >>>> OK for commit? >>> What about: >>> >>> long long a() >>> { >>> return 0x1234ffff56789abcll; >>> } >>> >>> long long b() >>> { >>> return 0x12345678ffff9abcll; >>> } >>> >>> long long c() >>> { >>> return 0x123456789abcffffll; >>> } >>> >>> ? >>> >>> Surely these can also benefit from this sort of optimization, but it >>> looks as though you only handle the top 16 bits being set. >> Hi Richard, >> >> How about this rework of the patch? >> >> For code: >> >> long long foo () >> { >> return 0xFFFFBEEFCAFEBABEll; >> } >> >> long long a() >> { >> return 0x1234ffff56789abcll; >> } >> >> long long b() >> { >> return 0x12345678ffff9abcll; >> } >> >> long long c() >> { >> return 0x123456789abcffffll; >> } >> >> we now generate: >> foo: >> mov x0, -17730 >> movk x0, 0xcafe, lsl 16 >> movk x0, 0xbeef, lsl 32 >> ret >> .size foo, .-foo >> .align 2 >> .global a >> .type a, %function >> a: >> mov x0, -25924 >> movk x0, 0x5678, lsl 16 >> movk x0, 0x1234, lsl 48 >> ret >> .size a, .-a >> .align 2 >> .global b >> .type b, %function >> b: >> mov x0, -25924 >> movk x0, 0x5678, lsl 32 >> movk x0, 0x1234, lsl 48 >> ret >> .size b, .-b >> .align 2 >> .global c >> .type c, %function >> c: >> mov x0, -1698889729 >> movk x0, 0x5678, lsl 32 >> movk x0, 0x1234, lsl 48 >> ret >> >> >> 3 instructions are used in each case. >> >> Thanks, >> Kyrill >> >> 2014-08-07 Ian Bolton <ian.bolton@arm.com> >> Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): >> Use MOVN when one of the half-words is 0xffff. 
>> >> >> aarch64-movn-pattern-patch-v3.patch >> >> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index 0a7f441..2db91c7 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) >> unsigned HOST_WIDE_INT val; >> bool subtargets; >> rtx subtarget; >> - int one_match, zero_match; >> + int one_match, zero_match, first_not_ffff_match; >> >> gcc_assert (mode == SImode || mode == DImode); >> >> @@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) >> one_match = 0; >> zero_match = 0; >> mask = 0xffff; >> + first_not_ffff_match = -1; >> >> for (i = 0; i < 64; i += 16, mask <<= 16) >> { >> - if ((val & mask) == 0) >> - zero_match++; >> - else if ((val & mask) == mask) >> + if ((val & mask) == mask) >> one_match++; >> + else >> + { >> + if (first_not_ffff_match < 0) >> + first_not_ffff_match = i; >> + if ((val & mask) == 0) >> + zero_match++; >> + } >> } >> >> if (one_match == 2) >> { >> - mask = 0xffff; >> - for (i = 0; i < 64; i += 16, mask <<= 16) >> + /* Set one of the quarters and then insert back into result. */ >> + mask = 0xffffll << first_not_ffff_match; >> + emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); >> + emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match), >> + GEN_INT ((val >> first_not_ffff_match) >> + & 0xffff))); >> + return; >> + } >> + >> + if (one_match == 1) > I think this should be (one_match > zero_match). > > Otherwise constants such as > > > 0x00001234ffff0000ll > > might end up taking three rather than two insns. You're right, we generate: mov x0, -65536 movk x0, 0x1234, lsl 32 and x0, x0, 281474976710655 with your suggestion we can improve this to: mov x0, 4294901760 movk x0, 0x1234, lsl 32 Ok with that change then? Kyrill 2014-08-07 Ian Bolton<ian.bolton@arm.com> Kyrylo Tkachov<kyrylo.tkachov@arm.com> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Use MOVN when one of the half-words is 0xffff. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: aarch64-movn-pattern-patch-v3.patch --] [-- Type: text/x-patch; name=aarch64-movn-pattern-patch-v3.patch, Size: 2298 bytes --] diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 0a7f441..2db91c7 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) unsigned HOST_WIDE_INT val; bool subtargets; rtx subtarget; - int one_match, zero_match; + int one_match, zero_match, first_not_ffff_match; gcc_assert (mode == SImode || mode == DImode); @@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) one_match = 0; zero_match = 0; mask = 0xffff; + first_not_ffff_match = -1; for (i = 0; i < 64; i += 16, mask <<= 16) { - if ((val & mask) == 0) - zero_match++; - else if ((val & mask) == mask) + if ((val & mask) == mask) one_match++; + else + { + if (first_not_ffff_match < 0) + first_not_ffff_match = i; + if ((val & mask) == 0) + zero_match++; + } } if (one_match == 2) { - mask = 0xffff; - for (i = 0; i < 64; i += 16, mask <<= 16) + /* Set one of the quarters and then insert back into result. 
*/ + mask = 0xffffll << first_not_ffff_match; + emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); + emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match), + GEN_INT ((val >> first_not_ffff_match) + & 0xffff))); + return; + } + + if (one_match > zero_match) + { + /* Set either first three quarters or all but the third. */ + mask = 0xffffll << (16 - first_not_ffff_match); + emit_insn (gen_rtx_SET (VOIDmode, dest, + GEN_INT (val | mask | 0xffffffff00000000ull))); + + /* Now insert other two quarters. */ + for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1); + i < 64; i += 16, mask <<= 16) { if ((val & mask) != mask) - { - emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); - emit_insn (gen_insv_immdi (dest, GEN_INT (i), - GEN_INT ((val >> i) & 0xffff))); - return; - } + emit_insn (gen_insv_immdi (dest, GEN_INT (i), + GEN_INT ((val >> i) & 0xffff))); } - gcc_unreachable (); + return; } if (zero_match == 2) ^ permalink raw reply [flat|nested] 10+ messages in thread
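As an aside, the selection logic this version of the patch settles on can be
modelled in a few lines of host C, which makes it easy to see which path a given
constant takes. The sketch below is written purely for illustration (it is not
GCC code); it mirrors the half-word counting quoted above, including the
one_match > zero_match refinement, and classifies the constants discussed in this
thread.

  #include <stdint.h>
  #include <stdio.h>

  /* Classify a 64-bit constant the way the patch does: count 0xffff
     half-words, count 0x0000 half-words, and remember the position of
     the first half-word that is not 0xffff.  */
  static void
  classify (uint64_t val, int *one_match, int *zero_match, int *first_not_ffff)
  {
    *one_match = 0;
    *zero_match = 0;
    *first_not_ffff = -1;
    for (int i = 0; i < 64; i += 16)
      {
        uint64_t h = (val >> i) & 0xffff;
        if (h == 0xffff)
          (*one_match)++;
        else
          {
            if (*first_not_ffff < 0)
              *first_not_ffff = i;
            if (h == 0)
              (*zero_match)++;
          }
      }
  }

  int
  main (void)
  {
    /* Constants taken from the thread.  */
    uint64_t tests[] = { 0xffffbeefcafebabeull, 0x1234ffff56789abcull,
                         0x123456789abcffffull, 0x00001234ffff0000ull };
    for (unsigned t = 0; t < sizeof tests / sizeof tests[0]; t++)
      {
        int ones, zeros, first;
        classify (tests[t], &ones, &zeros, &first);
        const char *plan =
          ones == 2 ? "MOV + single MOVK (existing two-insn path)"
          : ones > zeros ? "MOVN-style MOV + up to two MOVKs (new path)"
          : "fall through to the zero-based / simple sequences";
        printf ("0x%016llx: one_match=%d zero_match=%d first_not_ffff=%d -> %s\n",
                (unsigned long long) tests[t], ones, zeros, first, plan);
      }
    return 0;
  }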
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-08-07 12:57 ` Kyrill Tkachov @ 2014-08-07 13:31 ` Richard Earnshaw 2014-08-07 19:33 ` Richard Henderson 1 sibling, 0 replies; 10+ messages in thread From: Richard Earnshaw @ 2014-08-07 13:31 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: gcc-patches On 07/08/14 13:57, Kyrill Tkachov wrote: > > On 07/08/14 13:46, Richard Earnshaw wrote: >> On 07/08/14 12:32, Kyrill Tkachov wrote: >>> On 16/05/14 13:35, Richard Earnshaw wrote: >>>> On 08/05/14 18:36, Ian Bolton wrote: >>>>> Hi, >>>>> >>>>> It currently takes 4 instructions to generate certain immediates on >>>>> AArch64 (unless we put them in the constant pool). >>>>> >>>>> For example ... >>>>> >>>>> long long >>>>> ffffbeefcafebabe () >>>>> { >>>>> return 0xFFFFBEEFCAFEBABEll; >>>>> } >>>>> >>>>> leads to ... >>>>> >>>>> mov x0, 0x47806 >>>>> mov x0, 0xcafe, lsl 16 >>>>> mov x0, 0xbeef, lsl 32 >>>>> orr x0, x0, -281474976710656 >>>>> >>>>> The above case is tackled in this patch by employing MOVN >>>>> to generate the top 32-bits in a single instruction ... >>>>> >>>>> mov x0, -71536975282177 >>>>> movk x0, 0xcafe, lsl 16 >>>>> movk x0, 0xbabe, lsl 0 >>>>> >>>>> Note that where at least two half-words are 0xffff, existing >>>>> code that does the immediate in two instructions is still used.) >>>>> >>>>> Tested on standard gcc regressions and the attached test case. >>>>> >>>>> OK for commit? >>>> What about: >>>> >>>> long long a() >>>> { >>>> return 0x1234ffff56789abcll; >>>> } >>>> >>>> long long b() >>>> { >>>> return 0x12345678ffff9abcll; >>>> } >>>> >>>> long long c() >>>> { >>>> return 0x123456789abcffffll; >>>> } >>>> >>>> ? >>>> >>>> Surely these can also benefit from this sort of optimization, but it >>>> looks as though you only handle the top 16 bits being set. >>> Hi Richard, >>> >>> How about this rework of the patch? >>> >>> For code: >>> >>> long long foo () >>> { >>> return 0xFFFFBEEFCAFEBABEll; >>> } >>> >>> long long a() >>> { >>> return 0x1234ffff56789abcll; >>> } >>> >>> long long b() >>> { >>> return 0x12345678ffff9abcll; >>> } >>> >>> long long c() >>> { >>> return 0x123456789abcffffll; >>> } >>> >>> we now generate: >>> foo: >>> mov x0, -17730 >>> movk x0, 0xcafe, lsl 16 >>> movk x0, 0xbeef, lsl 32 >>> ret >>> .size foo, .-foo >>> .align 2 >>> .global a >>> .type a, %function >>> a: >>> mov x0, -25924 >>> movk x0, 0x5678, lsl 16 >>> movk x0, 0x1234, lsl 48 >>> ret >>> .size a, .-a >>> .align 2 >>> .global b >>> .type b, %function >>> b: >>> mov x0, -25924 >>> movk x0, 0x5678, lsl 32 >>> movk x0, 0x1234, lsl 48 >>> ret >>> .size b, .-b >>> .align 2 >>> .global c >>> .type c, %function >>> c: >>> mov x0, -1698889729 >>> movk x0, 0x5678, lsl 32 >>> movk x0, 0x1234, lsl 48 >>> ret >>> >>> >>> 3 instructions are used in each case. >>> >>> Thanks, >>> Kyrill >>> >>> 2014-08-07 Ian Bolton <ian.bolton@arm.com> >>> Kyrylo Tkachov <kyrylo.tkachov@arm.com> >>> >>> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): >>> Use MOVN when one of the half-words is 0xffff. 
>>> >>> >>> aarch64-movn-pattern-patch-v3.patch >>> >>> >>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>> index 0a7f441..2db91c7 100644 >>> --- a/gcc/config/aarch64/aarch64.c >>> +++ b/gcc/config/aarch64/aarch64.c >>> @@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) >>> unsigned HOST_WIDE_INT val; >>> bool subtargets; >>> rtx subtarget; >>> - int one_match, zero_match; >>> + int one_match, zero_match, first_not_ffff_match; >>> >>> gcc_assert (mode == SImode || mode == DImode); >>> >>> @@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) >>> one_match = 0; >>> zero_match = 0; >>> mask = 0xffff; >>> + first_not_ffff_match = -1; >>> >>> for (i = 0; i < 64; i += 16, mask <<= 16) >>> { >>> - if ((val & mask) == 0) >>> - zero_match++; >>> - else if ((val & mask) == mask) >>> + if ((val & mask) == mask) >>> one_match++; >>> + else >>> + { >>> + if (first_not_ffff_match < 0) >>> + first_not_ffff_match = i; >>> + if ((val & mask) == 0) >>> + zero_match++; >>> + } >>> } >>> >>> if (one_match == 2) >>> { >>> - mask = 0xffff; >>> - for (i = 0; i < 64; i += 16, mask <<= 16) >>> + /* Set one of the quarters and then insert back into result. */ >>> + mask = 0xffffll << first_not_ffff_match; >>> + emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); >>> + emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match), >>> + GEN_INT ((val >> first_not_ffff_match) >>> + & 0xffff))); >>> + return; >>> + } >>> + >>> + if (one_match == 1) >> I think this should be (one_match > zero_match). >> >> Otherwise constants such as >> >> >> 0x00001234ffff0000ll >> >> might end up taking three rather than two insns. > > You're right, we generate: > mov x0, -65536 > movk x0, 0x1234, lsl 32 > and x0, x0, 281474976710655 > > with your suggestion we can improve this to: > mov x0, 4294901760 > movk x0, 0x1234, lsl 32 > > Ok with that change then? > > Kyrill > > 2014-08-07 Ian Bolton<ian.bolton@arm.com> > Kyrylo Tkachov<kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when one of the half-words is 0xffff. > > OK. R. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
  2014-08-07 12:57           ` Kyrill Tkachov
  2014-08-07 13:31             ` Richard Earnshaw
@ 2014-08-07 19:33             ` Richard Henderson
  2014-08-13 15:30               ` Kyrill Tkachov
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2014-08-07 19:33 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Earnshaw; +Cc: gcc-patches

On 08/07/2014 02:57 AM, Kyrill Tkachov wrote:
> +  if (one_match > zero_match)
> +    {
> +      /* Set either first three quarters or all but the third.  */
> +      mask = 0xffffll << (16 - first_not_ffff_match);
> +      emit_insn (gen_rtx_SET (VOIDmode, dest,
> +			      GEN_INT (val | mask | 0xffffffff00000000ull)));
> +
> +      /* Now insert other two quarters.  */
> +      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
> +	   i < 64; i += 16, mask <<= 16)
> 	{
> 	  if ((val & mask) != mask)
> +	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
> +				       GEN_INT ((val >> i) & 0xffff)));
> 	}
> +      return;
>     }
> 
>   if (zero_match == 2)

You should not place this three instruction sequence before the two instruction
sequences that follow.  I.e. place this just before simple_sequence.

I do wonder if we should be memo-izing these computations so that we only have
to do the complex search for a sequence only once for each constant...


r~

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
  2014-08-07 19:33               ` Richard Henderson
@ 2014-08-13 15:30                 ` Kyrill Tkachov
  2014-08-14 17:31                   ` Richard Henderson
  0 siblings, 1 reply; 10+ messages in thread
From: Kyrill Tkachov @ 2014-08-13 15:30 UTC (permalink / raw)
  To: Richard Henderson, Richard Earnshaw; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1825 bytes --]

On 07/08/14 20:32, Richard Henderson wrote:
> On 08/07/2014 02:57 AM, Kyrill Tkachov wrote:
>> +  if (one_match > zero_match)
>> +    {
>> +      /* Set either first three quarters or all but the third.  */
>> +      mask = 0xffffll << (16 - first_not_ffff_match);
>> +      emit_insn (gen_rtx_SET (VOIDmode, dest,
>> +			      GEN_INT (val | mask | 0xffffffff00000000ull)));
>> +
>> +      /* Now insert other two quarters.  */
>> +      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
>> +	   i < 64; i += 16, mask <<= 16)
>> 	{
>> 	  if ((val & mask) != mask)
>> +	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
>> +				       GEN_INT ((val >> i) & 0xffff)));
>> 	}
>> +      return;
>>     }
>>
>>   if (zero_match == 2)
> You should not place this three instruction sequence before the two instruction
> sequences that follow.  I.e. place this just before simple_sequence.

Hi Richard,

Is the attached patch ok? It just moves the section as you suggested.
I did a build of the Linux kernel with and without this patch to make
sure no code-gen was accidentally affected.

>
> I do wonder if we should be memo-izing these computations so that we only have
> to do the complex search for a sequence only once for each constant...

We'd need to store a mapping from constant to RTXes and everytime we have
a "cache hit" we'd have to tweak them to make sure the registers involved
are correct. I had a quick play with this but ended up with LRA ICEs :(
Might look at it later on, but it's not high on my priorities right now.

Thanks,
Kyrill

2014-08-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Move one_match > zero_match case to just before simple_sequence.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-movn-mov.patch --]
[-- Type: text/x-patch; name=aarch64-movn-mov.patch, Size: 1689 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 20debb9..a4e7158 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1136,24 +1136,6 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
       return;
     }
 
-  if (one_match > zero_match)
-    {
-      /* Set either first three quarters or all but the third.  */
-      mask = 0xffffll << (16 - first_not_ffff_match);
-      emit_insn (gen_rtx_SET (VOIDmode, dest,
-			      GEN_INT (val | mask | 0xffffffff00000000ull)));
-
-      /* Now insert other two quarters.  */
-      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
-	   i < 64; i += 16, mask <<= 16)
-	{
-	  if ((val & mask) != mask)
-	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-				       GEN_INT ((val >> i) & 0xffff)));
-	}
-      return;
-    }
-
   if (zero_match == 2)
     goto simple_sequence;
 
@@ -1270,6 +1252,24 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	}
     }
 
+  if (one_match > zero_match)
+    {
+      /* Set either first three quarters or all but the third.  */
+      mask = 0xffffll << (16 - first_not_ffff_match);
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (val | mask | 0xffffffff00000000ull)));
+
+      /* Now insert other two quarters.  */
+      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
+	   i < 64; i += 16, mask <<= 16)
+	{
+	  if ((val & mask) != mask)
+	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+				       GEN_INT ((val >> i) & 0xffff)));
+	}
+      return;
+    }
+
 simple_sequence:
   first = true;
   mask = 0xffff;

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
  2014-08-13 15:30                 ` Kyrill Tkachov
@ 2014-08-14 17:31                   ` Richard Henderson
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2014-08-14 17:31 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Earnshaw; +Cc: gcc-patches

On 08/13/2014 05:29 AM, Kyrill Tkachov wrote:
> Is the attached patch ok? It just moves the section as you suggested. I did a
> build of the Linux kernel with and without this patch to make sure no code-gen
> was accidentally affected.

Looks good.

> We'd need to store a mapping from constant to RTXes and everytime we have a
> "cache hit" we'd have to tweak them to make sure the registers involved are
> correct. I had a quick play with this but ended up with LRA ICEs :(

You mis-understand how I meant to memoize.  Have a look at how we cache
expansions for multiply in synth_mult: we don't record registers or rtx's,
but we do record the operation and arguments.

So you could consider building a trie, indexed by a hash.

  struct imm_algorithm
  {
    HOST_WIDE_INT op1;
    imm_algorithm *prev;

    enum operation {
      // op1 is accepted by aarch64_mov_operand.
      // prev should be null.
      mov,

      // op1 is to be inserted at the given position
      // to the value constructed by prev.
      movk_48, movk_32, movk_16, movk_0,

      // op1 is an arithmetic immediate to be applied
      // to the value constructed by prev
      add, sub,

      // op1 is a logical immediate to be applied to
      // the value constructed by prev
      and, ior,
    } code;
  };


r~

^ permalink raw reply	[flat|nested] 10+ messages in thread
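The memoization idea was left as a thought experiment in this thread. Purely as an
illustration of what Richard describes — caching the operations and their immediate
arguments rather than RTL, so a cached recipe can be replayed against any
destination register — a toy host-side sketch might look like the following. Every
name and data structure below is invented for the example; it is not GCC code and
it sidesteps how a recipe would actually be replayed into RTL.

  #include <stdint.h>
  #include <stdio.h>

  /* A recorded step: which operation, and its immediate argument.
     Registers are deliberately not recorded; the recipe is replayed
     against whatever destination register the caller supplies.  */
  enum imm_op { OP_MOV, OP_MOVK_0, OP_MOVK_16, OP_MOVK_32, OP_MOVK_48 };

  struct imm_recipe
  {
    int n;                       /* number of steps, 0 = slot empty  */
    enum imm_op op[4];
    uint64_t arg[4];
  };

  /* Toy direct-mapped cache from constant to recipe; no eviction policy.  */
  #define N_SLOTS 256
  static uint64_t keys[N_SLOTS];
  static struct imm_recipe recipes[N_SLOTS];

  /* Stand-in for the expensive search for a good sequence; here it just
     records the naive MOV + three MOVKs.  */
  static struct imm_recipe
  compute_recipe (uint64_t val)
  {
    struct imm_recipe r = { 4,
      { OP_MOV, OP_MOVK_16, OP_MOVK_32, OP_MOVK_48 },
      { val & 0xffff, (val >> 16) & 0xffff, (val >> 32) & 0xffff, val >> 48 } };
    return r;
  }

  /* Look the constant up, computing and caching its recipe on a miss.  */
  static const struct imm_recipe *
  get_recipe (uint64_t val)
  {
    unsigned slot = (unsigned) ((val * 0x9e3779b97f4a7c15ull) >> 56) % N_SLOTS;
    if (recipes[slot].n == 0 || keys[slot] != val)
      {
        keys[slot] = val;
        recipes[slot] = compute_recipe (val);
      }
    return &recipes[slot];
  }

  int
  main (void)
  {
    const struct imm_recipe *r = get_recipe (0xffffbeefcafebabeull);
    for (int i = 0; i < r->n; i++)
      printf ("step %d: op=%d arg=0x%llx\n", i, (int) r->op[i],
              (unsigned long long) r->arg[i]);
    return 0;
  }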