* [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
@ 2014-05-08 17:36 Ian Bolton
  2014-05-16  9:17 ` Ian Bolton
  2014-05-16 12:35 ` Richard Earnshaw
  0 siblings, 2 replies; 10+ messages in thread
From: Ian Bolton @ 2014-05-08 17:36 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1030 bytes --]

Hi,

It currently takes 4 instructions to generate certain immediates on
AArch64 (unless we put them in the constant pool).

For example ...

  long long
  ffffbeefcafebabe ()
  {
    return 0xFFFFBEEFCAFEBABEll;
  }

leads to ...

  mov	x0, 0x47806
  mov	x0, 0xcafe, lsl 16
  mov	x0, 0xbeef, lsl 32
  orr	x0, x0, -281474976710656

The above case is tackled in this patch by employing MOVN
to generate the top 32 bits in a single instruction ...

  mov	x0, -71536975282177
  movk	x0, 0xcafe, lsl 16
  movk	x0, 0xbabe, lsl 0

Note that where at least two half-words are 0xffff, the existing
code that does the immediate in two instructions is still used.

Tested on standard gcc regressions and the attached test case.

OK for commit?

Cheers,
Ian


2014-05-08  Ian Bolton  <ian.bolton@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Use MOVN when top-most half-word (and only that half-word)
	is 0xffff.
gcc/testsuite/
	* gcc.target/aarch64/movn_1.c: New test.

[-- Attachment #2: aarch64-movn-exploitation-patch-v5.txt --]
[-- Type: text/plain, Size: 1576 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 43a83566..a8e504e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	    }
 	}
 
+  /* Look for case where upper 16 bits are set, so we can use MOVN.  */
+  if ((val & 0xffff000000000000ll) == 0xffff000000000000ll)
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (~ (~val & (0xffffll << 32)))));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (16),
+				 GEN_INT ((val >> 16) & 0xffff)));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (0),
+				 GEN_INT (val & 0xffff)));
+      return;
+    }
+
 simple_sequence:
   first = true;
   mask = 0xffff;
diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c b/gcc/testsuite/gcc.target/aarch64/movn_1.c
new file mode 100644
index 0000000..cc11ade
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+long long
+foo ()
+{
+  /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */
+  return 0xffffbeefcafebabell;
+}
+
+long long
+merge4 (int a, int b, int c, int d)
+{
+  return ((long long) a << 48 | (long long) b << 32
+	  | (long long) c << 16 | (long long) d);
+}
+
+int main ()
+{
+  if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe))
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */

^ permalink raw reply	[flat|nested] 10+ messages in thread
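For readers who want to sanity-check the proposed three-instruction sequence, the
stand-alone C sketch below models the semantics of MOV (wide immediate) and MOVK
and verifies that the sequence quoted above really reconstructs 0xFFFFBEEFCAFEBABE.
It is only an illustration: the helper names are invented and nothing here is GCC
code; only the constants and the instruction semantics come from the mail above.

  #include <stdint.h>
  #include <stdio.h>
  #include <assert.h>

  /* Model of the first "mov xD, #imm": the whole register is written
     with the (already negated, MOVN-style) constant.  */
  static uint64_t mov_imm (int64_t imm) { return (uint64_t) imm; }

  /* Model of "movk xD, #imm16, lsl #shift": replace one 16-bit
     half-word, keep the rest of the register unchanged.  */
  static uint64_t movk (uint64_t reg, uint64_t imm16, unsigned shift)
  {
    uint64_t mask = (uint64_t) 0xffff << shift;
    return (reg & ~mask) | ((imm16 & 0xffff) << shift);
  }

  int main (void)
  {
    /* mov  x0, -71536975282177   (== 0xffffbeefffffffff)
       movk x0, 0xcafe, lsl 16
       movk x0, 0xbabe, lsl 0   */
    uint64_t x0 = mov_imm (-71536975282177ll);
    x0 = movk (x0, 0xcafe, 16);
    x0 = movk (x0, 0xbabe, 0);
    assert (x0 == 0xffffbeefcafebabeull);
    printf ("0x%016llx\n", (unsigned long long) x0);
    return 0;
  }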
* RE: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-05-08 17:36 [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible Ian Bolton @ 2014-05-16 9:17 ` Ian Bolton 2014-05-16 12:35 ` Richard Earnshaw 1 sibling, 0 replies; 10+ messages in thread From: Ian Bolton @ 2014-05-16 9:17 UTC (permalink / raw) To: Ian Bolton, gcc-patches Ping. This should be relatively simple to review. Many thanks. > -----Original Message----- > From: gcc-patches-owner@gcc.gnu.org [mailto:gcc-patches- > owner@gcc.gnu.org] On Behalf Of Ian Bolton > Sent: 08 May 2014 18:36 > To: gcc-patches > Subject: [PATCH, AArch64] Use MOVN to generate 64-bit negative > immediates where sensible > > Hi, > > It currently takes 4 instructions to generate certain immediates on > AArch64 (unless we put them in the constant pool). > > For example ... > > long long > ffffbeefcafebabe () > { > return 0xFFFFBEEFCAFEBABEll; > } > > leads to ... > > mov x0, 0x47806 > mov x0, 0xcafe, lsl 16 > mov x0, 0xbeef, lsl 32 > orr x0, x0, -281474976710656 > > The above case is tackled in this patch by employing MOVN > to generate the top 32-bits in a single instruction ... > > mov x0, -71536975282177 > movk x0, 0xcafe, lsl 16 > movk x0, 0xbabe, lsl 0 > > Note that where at least two half-words are 0xffff, existing > code that does the immediate in two instructions is still used.) > > Tested on standard gcc regressions and the attached test case. > > OK for commit? > > Cheers, > Ian > > > 2014-05-08 Ian Bolton <ian.bolton@arm.com> > > gcc/ > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when top-most half-word (and only that half-word) > is 0xffff. > gcc/testsuite/ > * gcc.target/aarch64/movn_1.c: New test. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-05-08 17:36 [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible Ian Bolton 2014-05-16 9:17 ` Ian Bolton @ 2014-05-16 12:35 ` Richard Earnshaw 2014-08-07 11:32 ` Kyrill Tkachov 1 sibling, 1 reply; 10+ messages in thread From: Richard Earnshaw @ 2014-05-16 12:35 UTC (permalink / raw) To: Ian Bolton; +Cc: gcc-patches On 08/05/14 18:36, Ian Bolton wrote: > Hi, > > It currently takes 4 instructions to generate certain immediates on > AArch64 (unless we put them in the constant pool). > > For example ... > > long long > ffffbeefcafebabe () > { > return 0xFFFFBEEFCAFEBABEll; > } > > leads to ... > > mov x0, 0x47806 > mov x0, 0xcafe, lsl 16 > mov x0, 0xbeef, lsl 32 > orr x0, x0, -281474976710656 > > The above case is tackled in this patch by employing MOVN > to generate the top 32-bits in a single instruction ... > > mov x0, -71536975282177 > movk x0, 0xcafe, lsl 16 > movk x0, 0xbabe, lsl 0 > > Note that where at least two half-words are 0xffff, existing > code that does the immediate in two instructions is still used.) > > Tested on standard gcc regressions and the attached test case. > > OK for commit? What about: long long a() { return 0x1234ffff56789abcll; } long long b() { return 0x12345678ffff9abcll; } long long c() { return 0x123456789abcffffll; } ? Surely these can also benefit from this sort of optimization, but it looks as though you only handle the top 16 bits being set. R. > > Cheers, > Ian > > > 2014-05-08 Ian Bolton <ian.bolton@arm.com> > > gcc/ > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when top-most half-word (and only that half-word) > is 0xffff. > gcc/testsuite/ > * gcc.target/aarch64/movn_1.c: New test. > > > aarch64-movn-exploitation-patch-v5.txt > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 43a83566..a8e504e 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) > } > } > > + /* Look for case where upper 16 bits are set, so we can use MOVN. */ > + if ((val & 0xffff000000000000ll) == 0xffff000000000000ll) > + { > + emit_insn (gen_rtx_SET (VOIDmode, dest, > + GEN_INT (~ (~val & (0xffffll << 32))))); > + emit_insn (gen_insv_immdi (dest, GEN_INT (16), > + GEN_INT ((val >> 16) & 0xffff))); > + emit_insn (gen_insv_immdi (dest, GEN_INT (0), > + GEN_INT (val & 0xffff))); > + return; > + } > + > simple_sequence: > first = true; > mask = 0xffff; > diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c b/gcc/testsuite/gcc.target/aarch64/movn_1.c > new file mode 100644 > index 0000000..cc11ade > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c > @@ -0,0 +1,27 @@ > +/* { dg-do run } */ > +/* { dg-options "-O2 -fno-inline --save-temps" } */ > + > +extern void abort (void); > + > +long long > +foo () > +{ > + /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */ > + return 0xffffbeefcafebabell; > +} > + > +long long > +merge4 (int a, int b, int c, int d) > +{ > + return ((long long) a << 48 | (long long) b << 32 > + | (long long) c << 16 | (long long) d); > +} > + > +int main () > +{ > + if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe)) > + abort (); > + return 0; > +} > + > +/* { dg-final { cleanup-saved-temps } } */ > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-05-16 12:35 ` Richard Earnshaw @ 2014-08-07 11:32 ` Kyrill Tkachov 2014-08-07 12:46 ` Richard Earnshaw 0 siblings, 1 reply; 10+ messages in thread From: Kyrill Tkachov @ 2014-08-07 11:32 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 2704 bytes --] On 16/05/14 13:35, Richard Earnshaw wrote: > On 08/05/14 18:36, Ian Bolton wrote: >> Hi, >> >> It currently takes 4 instructions to generate certain immediates on >> AArch64 (unless we put them in the constant pool). >> >> For example ... >> >> long long >> ffffbeefcafebabe () >> { >> return 0xFFFFBEEFCAFEBABEll; >> } >> >> leads to ... >> >> mov x0, 0x47806 >> mov x0, 0xcafe, lsl 16 >> mov x0, 0xbeef, lsl 32 >> orr x0, x0, -281474976710656 >> >> The above case is tackled in this patch by employing MOVN >> to generate the top 32-bits in a single instruction ... >> >> mov x0, -71536975282177 >> movk x0, 0xcafe, lsl 16 >> movk x0, 0xbabe, lsl 0 >> >> Note that where at least two half-words are 0xffff, existing >> code that does the immediate in two instructions is still used.) >> >> Tested on standard gcc regressions and the attached test case. >> >> OK for commit? > What about: > > long long a() > { > return 0x1234ffff56789abcll; > } > > long long b() > { > return 0x12345678ffff9abcll; > } > > long long c() > { > return 0x123456789abcffffll; > } > > ? > > Surely these can also benefit from this sort of optimization, but it > looks as though you only handle the top 16 bits being set. Hi Richard, How about this rework of the patch? For code: long long foo () { return 0xFFFFBEEFCAFEBABEll; } long long a() { return 0x1234ffff56789abcll; } long long b() { return 0x12345678ffff9abcll; } long long c() { return 0x123456789abcffffll; } we now generate: foo: mov x0, -17730 movk x0, 0xcafe, lsl 16 movk x0, 0xbeef, lsl 32 ret .size foo, .-foo .align 2 .global a .type a, %function a: mov x0, -25924 movk x0, 0x5678, lsl 16 movk x0, 0x1234, lsl 48 ret .size a, .-a .align 2 .global b .type b, %function b: mov x0, -25924 movk x0, 0x5678, lsl 32 movk x0, 0x1234, lsl 48 ret .size b, .-b .align 2 .global c .type c, %function c: mov x0, -1698889729 movk x0, 0x5678, lsl 32 movk x0, 0x1234, lsl 48 ret 3 instructions are used in each case. Thanks, Kyrill 2014-08-07 Ian Bolton <ian.bolton@arm.com> Kyrylo Tkachov <kyrylo.tkachov@arm.com> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Use MOVN when one of the half-words is 0xffff. 
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-movn-pattern-patch-v3.patch --]
[-- Type: text/x-patch; name=aarch64-movn-pattern-patch-v3.patch, Size: 2290 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 0a7f441..2db91c7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   unsigned HOST_WIDE_INT val;
   bool subtargets;
   rtx subtarget;
-  int one_match, zero_match;
+  int one_match, zero_match, first_not_ffff_match;
 
   gcc_assert (mode == SImode || mode == DImode);
 
@@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
   one_match = 0;
   zero_match = 0;
   mask = 0xffff;
+  first_not_ffff_match = -1;
 
   for (i = 0; i < 64; i += 16, mask <<= 16)
     {
-      if ((val & mask) == 0)
-	zero_match++;
-      else if ((val & mask) == mask)
+      if ((val & mask) == mask)
 	one_match++;
+      else
+	{
+	  if (first_not_ffff_match < 0)
+	    first_not_ffff_match = i;
+	  if ((val & mask) == 0)
+	    zero_match++;
+	}
     }
 
   if (one_match == 2)
     {
-      mask = 0xffff;
-      for (i = 0; i < 64; i += 16, mask <<= 16)
+      /* Set one of the quarters and then insert back into result.  */
+      mask = 0xffffll << first_not_ffff_match;
+      emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask)));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match),
+				 GEN_INT ((val >> first_not_ffff_match)
+					  & 0xffff)));
+      return;
+    }
+
+  if (one_match == 1)
+    {
+      /* Set either first three quarters or all but the third.  */
+      mask = 0xffffll << (16 - first_not_ffff_match);
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (val | mask | 0xffffffff00000000ull)));
+
+      /* Now insert other two quarters.  */
+      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
+	   i < 64; i += 16, mask <<= 16)
 	{
 	  if ((val & mask) != mask)
-	    {
-	      emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask)));
-	      emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-					 GEN_INT ((val >> i) & 0xffff)));
-	      return;
-	    }
+	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+				       GEN_INT ((val >> i) & 0xffff)));
 	}
-      gcc_unreachable ();
+      return;
     }
 
   if (zero_match == 2)

^ permalink raw reply	[flat|nested] 10+ messages in thread
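The test shipped with the original patch (movn_1.c above) only exercises 0xffff in
the top half-word. Since this version of the patch covers 0xffff in any half-word
position, a companion test could be sketched along the lines below, modelled on
movn_1.c. This is an untested illustration only; in particular, no scan-assembler
checks are included here, because the exact mov/movk sequences shown in Kyrill's
mail would need to be confirmed against a real build before being encoded in the
testsuite.

  /* { dg-do run } */
  /* { dg-options "-O2 -fno-inline" } */

  extern void abort (void);

  long long
  foo_mid (void)
  {
    /* 0xffff in bits 47:32; per the mail above this should now be a
       three-instruction mov/movk/movk sequence.  */
    return 0x1234ffff56789abcll;
  }

  long long
  foo_low (void)
  {
    /* 0xffff in bits 15:0.  */
    return 0x123456789abcffffll;
  }

  /* Build the same values without large literal constants, as movn_1.c
     does with merge4, so the comparisons below are meaningful.  */
  long long
  merge4 (int a, int b, int c, int d)
  {
    return ((long long) a << 48 | (long long) b << 32
	    | (long long) c << 16 | (long long) d);
  }

  int
  main (void)
  {
    if (foo_mid () != merge4 (0x1234, 0xffff, 0x5678, 0x9abc))
      abort ();
    if (foo_low () != merge4 (0x1234, 0x5678, 0x9abc, 0xffff))
      abort ();
    return 0;
  }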
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-08-07 11:32 ` Kyrill Tkachov @ 2014-08-07 12:46 ` Richard Earnshaw 2014-08-07 12:57 ` Kyrill Tkachov 0 siblings, 1 reply; 10+ messages in thread From: Richard Earnshaw @ 2014-08-07 12:46 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: gcc-patches On 07/08/14 12:32, Kyrill Tkachov wrote: > > On 16/05/14 13:35, Richard Earnshaw wrote: >> On 08/05/14 18:36, Ian Bolton wrote: >>> Hi, >>> >>> It currently takes 4 instructions to generate certain immediates on >>> AArch64 (unless we put them in the constant pool). >>> >>> For example ... >>> >>> long long >>> ffffbeefcafebabe () >>> { >>> return 0xFFFFBEEFCAFEBABEll; >>> } >>> >>> leads to ... >>> >>> mov x0, 0x47806 >>> mov x0, 0xcafe, lsl 16 >>> mov x0, 0xbeef, lsl 32 >>> orr x0, x0, -281474976710656 >>> >>> The above case is tackled in this patch by employing MOVN >>> to generate the top 32-bits in a single instruction ... >>> >>> mov x0, -71536975282177 >>> movk x0, 0xcafe, lsl 16 >>> movk x0, 0xbabe, lsl 0 >>> >>> Note that where at least two half-words are 0xffff, existing >>> code that does the immediate in two instructions is still used.) >>> >>> Tested on standard gcc regressions and the attached test case. >>> >>> OK for commit? >> What about: >> >> long long a() >> { >> return 0x1234ffff56789abcll; >> } >> >> long long b() >> { >> return 0x12345678ffff9abcll; >> } >> >> long long c() >> { >> return 0x123456789abcffffll; >> } >> >> ? >> >> Surely these can also benefit from this sort of optimization, but it >> looks as though you only handle the top 16 bits being set. > > Hi Richard, > > How about this rework of the patch? > > For code: > > long long foo () > { > return 0xFFFFBEEFCAFEBABEll; > } > > long long a() > { > return 0x1234ffff56789abcll; > } > > long long b() > { > return 0x12345678ffff9abcll; > } > > long long c() > { > return 0x123456789abcffffll; > } > > we now generate: > foo: > mov x0, -17730 > movk x0, 0xcafe, lsl 16 > movk x0, 0xbeef, lsl 32 > ret > .size foo, .-foo > .align 2 > .global a > .type a, %function > a: > mov x0, -25924 > movk x0, 0x5678, lsl 16 > movk x0, 0x1234, lsl 48 > ret > .size a, .-a > .align 2 > .global b > .type b, %function > b: > mov x0, -25924 > movk x0, 0x5678, lsl 32 > movk x0, 0x1234, lsl 48 > ret > .size b, .-b > .align 2 > .global c > .type c, %function > c: > mov x0, -1698889729 > movk x0, 0x5678, lsl 32 > movk x0, 0x1234, lsl 48 > ret > > > 3 instructions are used in each case. > > Thanks, > Kyrill > > 2014-08-07 Ian Bolton <ian.bolton@arm.com> > Kyrylo Tkachov <kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when one of the half-words is 0xffff. 
> > > aarch64-movn-pattern-patch-v3.patch > > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 0a7f441..2db91c7 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) > unsigned HOST_WIDE_INT val; > bool subtargets; > rtx subtarget; > - int one_match, zero_match; > + int one_match, zero_match, first_not_ffff_match; > > gcc_assert (mode == SImode || mode == DImode); > > @@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) > one_match = 0; > zero_match = 0; > mask = 0xffff; > + first_not_ffff_match = -1; > > for (i = 0; i < 64; i += 16, mask <<= 16) > { > - if ((val & mask) == 0) > - zero_match++; > - else if ((val & mask) == mask) > + if ((val & mask) == mask) > one_match++; > + else > + { > + if (first_not_ffff_match < 0) > + first_not_ffff_match = i; > + if ((val & mask) == 0) > + zero_match++; > + } > } > > if (one_match == 2) > { > - mask = 0xffff; > - for (i = 0; i < 64; i += 16, mask <<= 16) > + /* Set one of the quarters and then insert back into result. */ > + mask = 0xffffll << first_not_ffff_match; > + emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); > + emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match), > + GEN_INT ((val >> first_not_ffff_match) > + & 0xffff))); > + return; > + } > + > + if (one_match == 1) I think this should be (one_match > zero_match). Otherwise constants such as 0x00001234ffff0000ll might end up taking three rather than two insns. R. > + { > + /* Set either first three quarters or all but the third. */ > + mask = 0xffffll << (16 - first_not_ffff_match); > + emit_insn (gen_rtx_SET (VOIDmode, dest, > + GEN_INT (val | mask | 0xffffffff00000000ull))); > + > + /* Now insert other two quarters. */ > + for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1); > + i < 64; i += 16, mask <<= 16) > { > if ((val & mask) != mask) > - { > - emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); > - emit_insn (gen_insv_immdi (dest, GEN_INT (i), > - GEN_INT ((val >> i) & 0xffff))); > - return; > - } > + emit_insn (gen_insv_immdi (dest, GEN_INT (i), > + GEN_INT ((val >> i) & 0xffff))); > } > - gcc_unreachable (); > + return; > } > > if (zero_match == 2) > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-08-07 12:46 ` Richard Earnshaw @ 2014-08-07 12:57 ` Kyrill Tkachov 2014-08-07 13:31 ` Richard Earnshaw 2014-08-07 19:33 ` Richard Henderson 0 siblings, 2 replies; 10+ messages in thread From: Kyrill Tkachov @ 2014-08-07 12:57 UTC (permalink / raw) To: Richard Earnshaw; +Cc: gcc-patches [-- Attachment #1: Type: text/plain, Size: 5533 bytes --] On 07/08/14 13:46, Richard Earnshaw wrote: > On 07/08/14 12:32, Kyrill Tkachov wrote: >> On 16/05/14 13:35, Richard Earnshaw wrote: >>> On 08/05/14 18:36, Ian Bolton wrote: >>>> Hi, >>>> >>>> It currently takes 4 instructions to generate certain immediates on >>>> AArch64 (unless we put them in the constant pool). >>>> >>>> For example ... >>>> >>>> long long >>>> ffffbeefcafebabe () >>>> { >>>> return 0xFFFFBEEFCAFEBABEll; >>>> } >>>> >>>> leads to ... >>>> >>>> mov x0, 0x47806 >>>> mov x0, 0xcafe, lsl 16 >>>> mov x0, 0xbeef, lsl 32 >>>> orr x0, x0, -281474976710656 >>>> >>>> The above case is tackled in this patch by employing MOVN >>>> to generate the top 32-bits in a single instruction ... >>>> >>>> mov x0, -71536975282177 >>>> movk x0, 0xcafe, lsl 16 >>>> movk x0, 0xbabe, lsl 0 >>>> >>>> Note that where at least two half-words are 0xffff, existing >>>> code that does the immediate in two instructions is still used.) >>>> >>>> Tested on standard gcc regressions and the attached test case. >>>> >>>> OK for commit? >>> What about: >>> >>> long long a() >>> { >>> return 0x1234ffff56789abcll; >>> } >>> >>> long long b() >>> { >>> return 0x12345678ffff9abcll; >>> } >>> >>> long long c() >>> { >>> return 0x123456789abcffffll; >>> } >>> >>> ? >>> >>> Surely these can also benefit from this sort of optimization, but it >>> looks as though you only handle the top 16 bits being set. >> Hi Richard, >> >> How about this rework of the patch? >> >> For code: >> >> long long foo () >> { >> return 0xFFFFBEEFCAFEBABEll; >> } >> >> long long a() >> { >> return 0x1234ffff56789abcll; >> } >> >> long long b() >> { >> return 0x12345678ffff9abcll; >> } >> >> long long c() >> { >> return 0x123456789abcffffll; >> } >> >> we now generate: >> foo: >> mov x0, -17730 >> movk x0, 0xcafe, lsl 16 >> movk x0, 0xbeef, lsl 32 >> ret >> .size foo, .-foo >> .align 2 >> .global a >> .type a, %function >> a: >> mov x0, -25924 >> movk x0, 0x5678, lsl 16 >> movk x0, 0x1234, lsl 48 >> ret >> .size a, .-a >> .align 2 >> .global b >> .type b, %function >> b: >> mov x0, -25924 >> movk x0, 0x5678, lsl 32 >> movk x0, 0x1234, lsl 48 >> ret >> .size b, .-b >> .align 2 >> .global c >> .type c, %function >> c: >> mov x0, -1698889729 >> movk x0, 0x5678, lsl 32 >> movk x0, 0x1234, lsl 48 >> ret >> >> >> 3 instructions are used in each case. >> >> Thanks, >> Kyrill >> >> 2014-08-07 Ian Bolton <ian.bolton@arm.com> >> Kyrylo Tkachov <kyrylo.tkachov@arm.com> >> >> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): >> Use MOVN when one of the half-words is 0xffff. 
>> >> >> aarch64-movn-pattern-patch-v3.patch >> >> >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index 0a7f441..2db91c7 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) >> unsigned HOST_WIDE_INT val; >> bool subtargets; >> rtx subtarget; >> - int one_match, zero_match; >> + int one_match, zero_match, first_not_ffff_match; >> >> gcc_assert (mode == SImode || mode == DImode); >> >> @@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) >> one_match = 0; >> zero_match = 0; >> mask = 0xffff; >> + first_not_ffff_match = -1; >> >> for (i = 0; i < 64; i += 16, mask <<= 16) >> { >> - if ((val & mask) == 0) >> - zero_match++; >> - else if ((val & mask) == mask) >> + if ((val & mask) == mask) >> one_match++; >> + else >> + { >> + if (first_not_ffff_match < 0) >> + first_not_ffff_match = i; >> + if ((val & mask) == 0) >> + zero_match++; >> + } >> } >> >> if (one_match == 2) >> { >> - mask = 0xffff; >> - for (i = 0; i < 64; i += 16, mask <<= 16) >> + /* Set one of the quarters and then insert back into result. */ >> + mask = 0xffffll << first_not_ffff_match; >> + emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); >> + emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match), >> + GEN_INT ((val >> first_not_ffff_match) >> + & 0xffff))); >> + return; >> + } >> + >> + if (one_match == 1) > I think this should be (one_match > zero_match). > > Otherwise constants such as > > > 0x00001234ffff0000ll > > might end up taking three rather than two insns. You're right, we generate: mov x0, -65536 movk x0, 0x1234, lsl 32 and x0, x0, 281474976710655 with your suggestion we can improve this to: mov x0, 4294901760 movk x0, 0x1234, lsl 32 Ok with that change then? Kyrill 2014-08-07 Ian Bolton<ian.bolton@arm.com> Kyrylo Tkachov<kyrylo.tkachov@arm.com> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Use MOVN when one of the half-words is 0xffff. [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #2: aarch64-movn-pattern-patch-v3.patch --] [-- Type: text/x-patch; name=aarch64-movn-pattern-patch-v3.patch, Size: 2298 bytes --] diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 0a7f441..2db91c7 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) unsigned HOST_WIDE_INT val; bool subtargets; rtx subtarget; - int one_match, zero_match; + int one_match, zero_match, first_not_ffff_match; gcc_assert (mode == SImode || mode == DImode); @@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) one_match = 0; zero_match = 0; mask = 0xffff; + first_not_ffff_match = -1; for (i = 0; i < 64; i += 16, mask <<= 16) { - if ((val & mask) == 0) - zero_match++; - else if ((val & mask) == mask) + if ((val & mask) == mask) one_match++; + else + { + if (first_not_ffff_match < 0) + first_not_ffff_match = i; + if ((val & mask) == 0) + zero_match++; + } } if (one_match == 2) { - mask = 0xffff; - for (i = 0; i < 64; i += 16, mask <<= 16) + /* Set one of the quarters and then insert back into result. 
*/ + mask = 0xffffll << first_not_ffff_match; + emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); + emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match), + GEN_INT ((val >> first_not_ffff_match) + & 0xffff))); + return; + } + + if (one_match > zero_match) + { + /* Set either first three quarters or all but the third. */ + mask = 0xffffll << (16 - first_not_ffff_match); + emit_insn (gen_rtx_SET (VOIDmode, dest, + GEN_INT (val | mask | 0xffffffff00000000ull))); + + /* Now insert other two quarters. */ + for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1); + i < 64; i += 16, mask <<= 16) { if ((val & mask) != mask) - { - emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); - emit_insn (gen_insv_immdi (dest, GEN_INT (i), - GEN_INT ((val >> i) & 0xffff))); - return; - } + emit_insn (gen_insv_immdi (dest, GEN_INT (i), + GEN_INT ((val >> i) & 0xffff))); } - gcc_unreachable (); + return; } if (zero_match == 2) ^ permalink raw reply [flat|nested] 10+ messages in thread
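As an aside, the selection logic this version of the patch settles on can be
modelled in a few lines of host C, which makes it easy to see which path a given
constant takes. The sketch below is written purely for illustration (it is not
GCC code); it mirrors the half-word counting quoted above, including the
one_match > zero_match refinement, and classifies the constants discussed in this
thread.

  #include <stdint.h>
  #include <stdio.h>

  /* Classify a 64-bit constant the way the patch does: count 0xffff
     half-words, count 0x0000 half-words, and remember the position of
     the first half-word that is not 0xffff.  */
  static void
  classify (uint64_t val, int *one_match, int *zero_match, int *first_not_ffff)
  {
    *one_match = 0;
    *zero_match = 0;
    *first_not_ffff = -1;
    for (int i = 0; i < 64; i += 16)
      {
        uint64_t h = (val >> i) & 0xffff;
        if (h == 0xffff)
          (*one_match)++;
        else
          {
            if (*first_not_ffff < 0)
              *first_not_ffff = i;
            if (h == 0)
              (*zero_match)++;
          }
      }
  }

  int
  main (void)
  {
    /* Constants taken from the thread.  */
    uint64_t tests[] = { 0xffffbeefcafebabeull, 0x1234ffff56789abcull,
                         0x123456789abcffffull, 0x00001234ffff0000ull };
    for (unsigned t = 0; t < sizeof tests / sizeof tests[0]; t++)
      {
        int ones, zeros, first;
        classify (tests[t], &ones, &zeros, &first);
        const char *plan =
          ones == 2 ? "MOV + single MOVK (existing two-insn path)"
          : ones > zeros ? "MOVN-style MOV + up to two MOVKs (new path)"
          : "fall through to the zero-based / simple sequences";
        printf ("0x%016llx: one_match=%d zero_match=%d first_not_ffff=%d -> %s\n",
                (unsigned long long) tests[t], ones, zeros, first, plan);
      }
    return 0;
  }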
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible 2014-08-07 12:57 ` Kyrill Tkachov @ 2014-08-07 13:31 ` Richard Earnshaw 2014-08-07 19:33 ` Richard Henderson 1 sibling, 0 replies; 10+ messages in thread From: Richard Earnshaw @ 2014-08-07 13:31 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: gcc-patches On 07/08/14 13:57, Kyrill Tkachov wrote: > > On 07/08/14 13:46, Richard Earnshaw wrote: >> On 07/08/14 12:32, Kyrill Tkachov wrote: >>> On 16/05/14 13:35, Richard Earnshaw wrote: >>>> On 08/05/14 18:36, Ian Bolton wrote: >>>>> Hi, >>>>> >>>>> It currently takes 4 instructions to generate certain immediates on >>>>> AArch64 (unless we put them in the constant pool). >>>>> >>>>> For example ... >>>>> >>>>> long long >>>>> ffffbeefcafebabe () >>>>> { >>>>> return 0xFFFFBEEFCAFEBABEll; >>>>> } >>>>> >>>>> leads to ... >>>>> >>>>> mov x0, 0x47806 >>>>> mov x0, 0xcafe, lsl 16 >>>>> mov x0, 0xbeef, lsl 32 >>>>> orr x0, x0, -281474976710656 >>>>> >>>>> The above case is tackled in this patch by employing MOVN >>>>> to generate the top 32-bits in a single instruction ... >>>>> >>>>> mov x0, -71536975282177 >>>>> movk x0, 0xcafe, lsl 16 >>>>> movk x0, 0xbabe, lsl 0 >>>>> >>>>> Note that where at least two half-words are 0xffff, existing >>>>> code that does the immediate in two instructions is still used.) >>>>> >>>>> Tested on standard gcc regressions and the attached test case. >>>>> >>>>> OK for commit? >>>> What about: >>>> >>>> long long a() >>>> { >>>> return 0x1234ffff56789abcll; >>>> } >>>> >>>> long long b() >>>> { >>>> return 0x12345678ffff9abcll; >>>> } >>>> >>>> long long c() >>>> { >>>> return 0x123456789abcffffll; >>>> } >>>> >>>> ? >>>> >>>> Surely these can also benefit from this sort of optimization, but it >>>> looks as though you only handle the top 16 bits being set. >>> Hi Richard, >>> >>> How about this rework of the patch? >>> >>> For code: >>> >>> long long foo () >>> { >>> return 0xFFFFBEEFCAFEBABEll; >>> } >>> >>> long long a() >>> { >>> return 0x1234ffff56789abcll; >>> } >>> >>> long long b() >>> { >>> return 0x12345678ffff9abcll; >>> } >>> >>> long long c() >>> { >>> return 0x123456789abcffffll; >>> } >>> >>> we now generate: >>> foo: >>> mov x0, -17730 >>> movk x0, 0xcafe, lsl 16 >>> movk x0, 0xbeef, lsl 32 >>> ret >>> .size foo, .-foo >>> .align 2 >>> .global a >>> .type a, %function >>> a: >>> mov x0, -25924 >>> movk x0, 0x5678, lsl 16 >>> movk x0, 0x1234, lsl 48 >>> ret >>> .size a, .-a >>> .align 2 >>> .global b >>> .type b, %function >>> b: >>> mov x0, -25924 >>> movk x0, 0x5678, lsl 32 >>> movk x0, 0x1234, lsl 48 >>> ret >>> .size b, .-b >>> .align 2 >>> .global c >>> .type c, %function >>> c: >>> mov x0, -1698889729 >>> movk x0, 0x5678, lsl 32 >>> movk x0, 0x1234, lsl 48 >>> ret >>> >>> >>> 3 instructions are used in each case. >>> >>> Thanks, >>> Kyrill >>> >>> 2014-08-07 Ian Bolton <ian.bolton@arm.com> >>> Kyrylo Tkachov <kyrylo.tkachov@arm.com> >>> >>> * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): >>> Use MOVN when one of the half-words is 0xffff. 
>>> >>> >>> aarch64-movn-pattern-patch-v3.patch >>> >>> >>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >>> index 0a7f441..2db91c7 100644 >>> --- a/gcc/config/aarch64/aarch64.c >>> +++ b/gcc/config/aarch64/aarch64.c >>> @@ -1005,7 +1005,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) >>> unsigned HOST_WIDE_INT val; >>> bool subtargets; >>> rtx subtarget; >>> - int one_match, zero_match; >>> + int one_match, zero_match, first_not_ffff_match; >>> >>> gcc_assert (mode == SImode || mode == DImode); >>> >>> @@ -1106,29 +1106,48 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) >>> one_match = 0; >>> zero_match = 0; >>> mask = 0xffff; >>> + first_not_ffff_match = -1; >>> >>> for (i = 0; i < 64; i += 16, mask <<= 16) >>> { >>> - if ((val & mask) == 0) >>> - zero_match++; >>> - else if ((val & mask) == mask) >>> + if ((val & mask) == mask) >>> one_match++; >>> + else >>> + { >>> + if (first_not_ffff_match < 0) >>> + first_not_ffff_match = i; >>> + if ((val & mask) == 0) >>> + zero_match++; >>> + } >>> } >>> >>> if (one_match == 2) >>> { >>> - mask = 0xffff; >>> - for (i = 0; i < 64; i += 16, mask <<= 16) >>> + /* Set one of the quarters and then insert back into result. */ >>> + mask = 0xffffll << first_not_ffff_match; >>> + emit_insn (gen_rtx_SET (VOIDmode, dest, GEN_INT (val | mask))); >>> + emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match), >>> + GEN_INT ((val >> first_not_ffff_match) >>> + & 0xffff))); >>> + return; >>> + } >>> + >>> + if (one_match == 1) >> I think this should be (one_match > zero_match). >> >> Otherwise constants such as >> >> >> 0x00001234ffff0000ll >> >> might end up taking three rather than two insns. > > You're right, we generate: > mov x0, -65536 > movk x0, 0x1234, lsl 32 > and x0, x0, 281474976710655 > > with your suggestion we can improve this to: > mov x0, 4294901760 > movk x0, 0x1234, lsl 32 > > Ok with that change then? > > Kyrill > > 2014-08-07 Ian Bolton<ian.bolton@arm.com> > Kyrylo Tkachov<kyrylo.tkachov@arm.com> > > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): > Use MOVN when one of the half-words is 0xffff. > > OK. R. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
  2014-08-07 12:57           ` Kyrill Tkachov
  2014-08-07 13:31             ` Richard Earnshaw
@ 2014-08-07 19:33             ` Richard Henderson
  2014-08-13 15:30               ` Kyrill Tkachov
  1 sibling, 1 reply; 10+ messages in thread
From: Richard Henderson @ 2014-08-07 19:33 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Earnshaw; +Cc: gcc-patches

On 08/07/2014 02:57 AM, Kyrill Tkachov wrote:
> +  if (one_match > zero_match)
> +    {
> +      /* Set either first three quarters or all but the third.  */
> +      mask = 0xffffll << (16 - first_not_ffff_match);
> +      emit_insn (gen_rtx_SET (VOIDmode, dest,
> +			      GEN_INT (val | mask | 0xffffffff00000000ull)));
> +
> +      /* Now insert other two quarters.  */
> +      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
> +	   i < 64; i += 16, mask <<= 16)
> 	{
> 	  if ((val & mask) != mask)
> +	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
> +				       GEN_INT ((val >> i) & 0xffff)));
> 	}
> +      return;
>     }
> 
>   if (zero_match == 2)

You should not place this three instruction sequence before the two instruction
sequences that follow.  I.e. place this just before simple_sequence.

I do wonder if we should be memo-izing these computations so that we only have
to do the complex search for a sequence only once for each constant...


r~

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
  2014-08-07 19:33               ` Richard Henderson
@ 2014-08-13 15:30                 ` Kyrill Tkachov
  2014-08-14 17:31                   ` Richard Henderson
  0 siblings, 1 reply; 10+ messages in thread
From: Kyrill Tkachov @ 2014-08-13 15:30 UTC (permalink / raw)
  To: Richard Henderson, Richard Earnshaw; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1825 bytes --]

On 07/08/14 20:32, Richard Henderson wrote:
> On 08/07/2014 02:57 AM, Kyrill Tkachov wrote:
>> +  if (one_match > zero_match)
>> +    {
>> +      /* Set either first three quarters or all but the third.  */
>> +      mask = 0xffffll << (16 - first_not_ffff_match);
>> +      emit_insn (gen_rtx_SET (VOIDmode, dest,
>> +			      GEN_INT (val | mask | 0xffffffff00000000ull)));
>> +
>> +      /* Now insert other two quarters.  */
>> +      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
>> +	   i < 64; i += 16, mask <<= 16)
>> 	{
>> 	  if ((val & mask) != mask)
>> +	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
>> +				       GEN_INT ((val >> i) & 0xffff)));
>> 	}
>> +      return;
>>     }
>>
>>   if (zero_match == 2)
> You should not place this three instruction sequence before the two instruction
> sequences that follow.  I.e. place this just before simple_sequence.

Hi Richard,

Is the attached patch ok? It just moves the section as you suggested.
I did a build of the Linux kernel with and without this patch to make
sure no code-gen was accidentally affected.

>
> I do wonder if we should be memo-izing these computations so that we only have
> to do the complex search for a sequence only once for each constant...

We'd need to store a mapping from constant to RTXes and everytime we have
a "cache hit" we'd have to tweak them to make sure the registers involved
are correct. I had a quick play with this but ended up with LRA ICEs :(
Might look at it later on, but it's not high on my priorities right now.

Thanks,
Kyrill

2014-08-13  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Move one_match > zero_match case to just before simple_sequence.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: aarch64-movn-mov.patch --]
[-- Type: text/x-patch; name=aarch64-movn-mov.patch, Size: 1689 bytes --]

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 20debb9..a4e7158 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1136,24 +1136,6 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
       return;
     }
 
-  if (one_match > zero_match)
-    {
-      /* Set either first three quarters or all but the third.  */
-      mask = 0xffffll << (16 - first_not_ffff_match);
-      emit_insn (gen_rtx_SET (VOIDmode, dest,
-			      GEN_INT (val | mask | 0xffffffff00000000ull)));
-
-      /* Now insert other two quarters.  */
-      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
-	   i < 64; i += 16, mask <<= 16)
-	{
-	  if ((val & mask) != mask)
-	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-				       GEN_INT ((val >> i) & 0xffff)));
-	}
-      return;
-    }
-
   if (zero_match == 2)
     goto simple_sequence;
 
@@ -1270,6 +1252,24 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	}
     }
 
+  if (one_match > zero_match)
+    {
+      /* Set either first three quarters or all but the third.  */
+      mask = 0xffffll << (16 - first_not_ffff_match);
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (val | mask | 0xffffffff00000000ull)));
+
+      /* Now insert other two quarters.  */
+      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
+	   i < 64; i += 16, mask <<= 16)
+	{
+	  if ((val & mask) != mask)
+	    emit_insn (gen_insv_immdi (dest, GEN_INT (i),
+				       GEN_INT ((val >> i) & 0xffff)));
+	}
+      return;
+    }
+
 simple_sequence:
   first = true;
   mask = 0xffff;

^ permalink raw reply	[flat|nested] 10+ messages in thread
* Re: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
  2014-08-13 15:30                 ` Kyrill Tkachov
@ 2014-08-14 17:31                   ` Richard Henderson
  0 siblings, 0 replies; 10+ messages in thread
From: Richard Henderson @ 2014-08-14 17:31 UTC (permalink / raw)
  To: Kyrill Tkachov, Richard Earnshaw; +Cc: gcc-patches

On 08/13/2014 05:29 AM, Kyrill Tkachov wrote:
> Is the attached patch ok? It just moves the section as you suggested. I did a
> build of the Linux kernel with and without this patch to make sure no code-gen
> was accidentally affected.

Looks good.

> We'd need to store a mapping from constant to RTXes and everytime we have a
> "cache hit" we'd have to tweak them to make sure the registers involved are
> correct. I had a quick play with this but ended up with LRA ICEs :(

You mis-understand how I meant to memoize.  Have a look at how we cache
expansions for multiply in synth_mult: we don't record registers or rtx's,
but we do record the operation and arguments.

So you could consider building a trie, indexed by a hash.

  struct imm_algorithm
  {
    HOST_WIDE_INT op1;
    imm_algorithm *prev;

    enum operation {
      // op1 is accepted by aarch64_mov_operand.
      // prev should be null.
      mov,

      // op1 is to be inserted at the given position
      // to the value constructed by prev.
      movk_48, movk_32, movk_16, movk_0,

      // op1 is an arithmetic immediate to be applied
      // to the value constructed by prev
      add, sub,

      // op1 is a logical immediate to be applied to
      // the value constructed by prev
      and, ior,
    } code;
  };


r~

^ permalink raw reply	[flat|nested] 10+ messages in thread
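The memoization idea was left as a thought experiment in this thread. Purely as an
illustration of what Richard describes — caching the operations and their immediate
arguments rather than RTL, so a cached recipe can be replayed against any
destination register — a toy host-side sketch might look like the following. Every
name and data structure below is invented for the example; it is not GCC code and
it sidesteps how a recipe would actually be replayed into RTL.

  #include <stdint.h>
  #include <stdio.h>

  /* A recorded step: which operation, and its immediate argument.
     Registers are deliberately not recorded; the recipe is replayed
     against whatever destination register the caller supplies.  */
  enum imm_op { OP_MOV, OP_MOVK_0, OP_MOVK_16, OP_MOVK_32, OP_MOVK_48 };

  struct imm_recipe
  {
    int n;                       /* number of steps, 0 = slot empty  */
    enum imm_op op[4];
    uint64_t arg[4];
  };

  /* Toy direct-mapped cache from constant to recipe; no eviction policy.  */
  #define N_SLOTS 256
  static uint64_t keys[N_SLOTS];
  static struct imm_recipe recipes[N_SLOTS];

  /* Stand-in for the expensive search for a good sequence; here it just
     records the naive MOV + three MOVKs.  */
  static struct imm_recipe
  compute_recipe (uint64_t val)
  {
    struct imm_recipe r = { 4,
      { OP_MOV, OP_MOVK_16, OP_MOVK_32, OP_MOVK_48 },
      { val & 0xffff, (val >> 16) & 0xffff, (val >> 32) & 0xffff, val >> 48 } };
    return r;
  }

  /* Look the constant up, computing and caching its recipe on a miss.  */
  static const struct imm_recipe *
  get_recipe (uint64_t val)
  {
    unsigned slot = (unsigned) ((val * 0x9e3779b97f4a7c15ull) >> 56) % N_SLOTS;
    if (recipes[slot].n == 0 || keys[slot] != val)
      {
        keys[slot] = val;
        recipes[slot] = compute_recipe (val);
      }
    return &recipes[slot];
  }

  int
  main (void)
  {
    const struct imm_recipe *r = get_recipe (0xffffbeefcafebabeull);
    for (int i = 0; i < r->n; i++)
      printf ("step %d: op=%d arg=0x%llx\n", i, (int) r->op[i],
              (unsigned long long) r->arg[i]);
    return 0;
  }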