* [PATCH] x86: Optimize EVEX vector load/store instructions @ 2019-03-15 23:58 H.J. Lu 2019-03-17 20:47 ` V2 " H.J. Lu 0 siblings, 1 reply; 7+ messages in thread From: H.J. Lu @ 2019-03-15 23:58 UTC (permalink / raw) To: binutils When there is no write mask, we can encode lower 16 128-bit/256-bit vector register load and store instructions as VEX vector register load and store instructions with -O2. gas/ PR gas/24348 * config/tc-i386.c (optimize_encoding): Encode EVEX 128-bit and 256-bit vector register load/store instructions as VEX vector register load/store instructions for -O2. (md_parse_option): Set optimize to INT_MAX for -Os. * doc/c-i386.texi: Update -O2 documentation. gas/ PR gas/24348 * testsuite/gas/i386/optimize-1.s: Add tests for EVEX vector load/store instructions. * testsuite/gas/i386/optimize-2.s: Likewise. * testsuite/gas/i386/optimize-3.s: Likewise. * testsuite/gas/i386/optimize-5.s: Likewise. * testsuite/gas/i386/x86-64-optimize-2.s: Likewise. * testsuite/gas/i386/x86-64-optimize-3.s: Likewise. * testsuite/gas/i386/x86-64-optimize-4.s: Likewise. * testsuite/gas/i386/x86-64-optimize-5.s: Likewise. * testsuite/gas/i386/x86-64-optimize-6.s: Likewise. * testsuite/gas/i386/optimize-1.d: Updated. * testsuite/gas/i386/optimize-2.d: Likewise. * testsuite/gas/i386/optimize-3.d: Likewise. * testsuite/gas/i386/optimize-4.d: Likewise. * testsuite/gas/i386/optimize-5.d: Likewise. * testsuite/gas/i386/x86-64-optimize-2.d: Likewise. * testsuite/gas/i386/x86-64-optimize-3.d: Likewise. * testsuite/gas/i386/x86-64-optimize-4.d: Likewise. * testsuite/gas/i386/x86-64-optimize-5.d: Likewise. * testsuite/gas/i386/x86-64-optimize-6.d: Likewise. opcodes/ PR gas/24348 * i386-opc.tbl: Add Optimize to vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, vmovdqu32 and vmovdqu64. * i386-tbl.h: Regenerated. --- gas/config/tc-i386.c | 59 +++++++++++- gas/doc/c-i386.texi | 4 +- gas/testsuite/gas/i386/optimize-1.d | 36 +++++++ gas/testsuite/gas/i386/optimize-1.s | 42 +++++++++ gas/testsuite/gas/i386/optimize-2.d | 72 ++++++++++++++ gas/testsuite/gas/i386/optimize-2.s | 84 +++++++++++++++++ gas/testsuite/gas/i386/optimize-3.d | 6 ++ gas/testsuite/gas/i386/optimize-3.s | 7 ++ gas/testsuite/gas/i386/optimize-4.d | 36 +++++++ gas/testsuite/gas/i386/optimize-5.d | 42 +++++++++ gas/testsuite/gas/i386/optimize-5.s | 7 ++ gas/testsuite/gas/i386/x86-64-optimize-2.d | 48 ++++++++++ gas/testsuite/gas/i386/x86-64-optimize-2.s | 56 +++++++++++ gas/testsuite/gas/i386/x86-64-optimize-3.d | 90 ++++++++++++++++++ gas/testsuite/gas/i386/x86-64-optimize-3.s | 105 +++++++++++++++++++++ gas/testsuite/gas/i386/x86-64-optimize-4.d | 6 ++ gas/testsuite/gas/i386/x86-64-optimize-4.s | 7 ++ gas/testsuite/gas/i386/x86-64-optimize-5.d | 54 +++++++++++ gas/testsuite/gas/i386/x86-64-optimize-5.s | 7 ++ gas/testsuite/gas/i386/x86-64-optimize-6.d | 54 +++++++++++ gas/testsuite/gas/i386/x86-64-optimize-6.s | 7 ++ opcodes/i386-opc.tbl | 12 +-- opcodes/i386-tbl.h | 12 +-- 23 files changed, 839 insertions(+), 14 deletions(-) diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c index 1b1b0a95da..1028c8d02f 100644 --- a/gas/config/tc-i386.c +++ b/gas/config/tc-i386.c @@ -4056,6 +4056,63 @@ optimize_encoding (void) i.types[j].bitfield.ymmword = 0; } } + else if (optimize > 1 + && i.vec_encoding != vex_encoding_evex + && !i.mask + && is_evex_encoding (&i.tm) + && (i.tm.base_opcode == 0x666f + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f + || i.tm.base_opcode == 0xf36f + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f + || i.tm.base_opcode == 0xf26f + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) + && i.tm.extension_opcode == None) + { + /* Optimize: -O2: + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, + vmovdqu32 and vmovdqu64: + EVEX VOP %xmmM, %xmmN + -> VEX VOP %xmmM, %xmmN (M and N < 16) + EVEX VOP %ymmM, %ymmN + -> VEX VOP %ymmM, %ymmN (M and N < 16) + EVEX VOP %xmmM, mem + -> VEX VOP %xmmM, mem (M < 16) + EVEX VOP %ymmM, mem + -> VEX VOP %ymmM, mem (M < 16) + EVEX VOP mem, %xmmN + -> VEX VOP mem, %xmmN (N < 16) + EVEX VOP mem, %ymmN + -> VEX VOP mem, %ymmN (N < 16) + */ + int ymmword = 0; + for (j = 0; j < 2; j++) + if (i.types[j].bitfield.regsimd) + { + if (i.op[j].regs->reg_num > 15 + || i.types[j].bitfield.zmmword) + return; + ymmword = i.types[j].bitfield.ymmword; + } + + if (i.tm.base_opcode == 0xf26f) + i.tm.base_opcode = 0xf36f; + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; + i.tm.opcode_modifier.vex = ymmword ? VEX256 : VEX128; + i.tm.opcode_modifier.vexw = VEXW0; + i.tm.opcode_modifier.evex = 0; + i.tm.opcode_modifier.masking = 0; + i.tm.opcode_modifier.disp8memshift = 0; + i.memshift = 0; + for (j = 0; j < 2; j++) + if (operand_type_check (i.types[j], disp) + && i.op[j].disps->X_op == O_constant) + { + i.types[j].bitfield.disp8 + = fits_in_disp8 (i.op[j].disps->X_add_number); + break; + } + } } /* This is the guts of the machine-dependent assembler. LINE points to a @@ -11342,7 +11399,7 @@ md_parse_option (int c, const char *arg) { optimize_for_space = 1; /* Turn on all encoding optimizations. */ - optimize = -1; + optimize = INT_MAX; } else { diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi index 6c63560dbc..3820d2593a 100644 --- a/gas/doc/c-i386.texi +++ b/gas/doc/c-i386.texi @@ -456,7 +456,9 @@ immediate as 32-bit register load instructions with 31-bit or 32-bits immediates and encode 64-bit register clearing instructions with 32-bit register clearing instructions. @samp{-O2} includes @samp{-O1} optimization plus encodes 256-bit and 512-bit vector register clearing -instructions with 128-bit vector register clearing instructions. +instructions with 128-bit vector register clearing instructions as well +as encodes EVEX 128-bit and 256-bit vector register load/store +instructions with VEX vector register load/store instructions. @samp{-Os} includes @samp{-O2} optimization plus encodes 16-bit, 32-bit and 64-bit register tests with immediate as 8-bit register test with immediate. @samp{-O0} turns off this optimization. diff --git a/gas/testsuite/gas/i386/optimize-1.d b/gas/testsuite/gas/i386/optimize-1.d index 4358c19c21..70c802c002 100644 --- a/gas/testsuite/gas/i386/optimize-1.d +++ b/gas/testsuite/gas/i386/optimize-1.d @@ -62,4 +62,40 @@ Disassembly of section .text: +[a-f0-9]+: c5 f4 47 e9 kxorw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) #pass diff --git a/gas/testsuite/gas/i386/optimize-1.s b/gas/testsuite/gas/i386/optimize-1.s index f61a176de8..6dcfbc2799 100644 --- a/gas/testsuite/gas/i386/optimize-1.s +++ b/gas/testsuite/gas/i386/optimize-1.s @@ -72,3 +72,45 @@ _start: kandnd %k1, %k1, %k5 kandnq %k1, %k1, %k5 + + vmovdqa32 %xmm1, %xmm2 + vmovdqa64 %xmm1, %xmm2 + vmovdqu8 %xmm1, %xmm2 + vmovdqu16 %xmm1, %xmm2 + vmovdqu32 %xmm1, %xmm2 + vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 127(%eax), %xmm2 + vmovdqa64 127(%eax), %xmm2 + vmovdqu8 127(%eax), %xmm2 + vmovdqu16 127(%eax), %xmm2 + vmovdqu32 127(%eax), %xmm2 + vmovdqu64 127(%eax), %xmm2 + + vmovdqa32 %xmm1, 128(%eax) + vmovdqa64 %xmm1, 128(%eax) + vmovdqu8 %xmm1, 128(%eax) + vmovdqu16 %xmm1, 128(%eax) + vmovdqu32 %xmm1, 128(%eax) + vmovdqu64 %xmm1, 128(%eax) + + vmovdqa32 %ymm1, %ymm2 + vmovdqa64 %ymm1, %ymm2 + vmovdqu8 %ymm1, %ymm2 + vmovdqu16 %ymm1, %ymm2 + vmovdqu32 %ymm1, %ymm2 + vmovdqu64 %ymm1, %ymm2 + + vmovdqa32 127(%eax), %ymm2 + vmovdqa64 127(%eax), %ymm2 + vmovdqu8 127(%eax), %ymm2 + vmovdqu16 127(%eax), %ymm2 + vmovdqu32 127(%eax), %ymm2 + vmovdqu64 127(%eax), %ymm2 + + vmovdqa32 %ymm1, 128(%eax) + vmovdqa64 %ymm1, 128(%eax) + vmovdqu8 %ymm1, 128(%eax) + vmovdqu16 %ymm1, 128(%eax) + vmovdqu32 %ymm1, 128(%eax) + vmovdqu64 %ymm1, 128(%eax) diff --git a/gas/testsuite/gas/i386/optimize-2.d b/gas/testsuite/gas/i386/optimize-2.d index ec989b0e13..68aaaaaab4 100644 --- a/gas/testsuite/gas/i386/optimize-2.d +++ b/gas/testsuite/gas/i386/optimize-2.d @@ -16,4 +16,76 @@ Disassembly of section .text: +[a-f0-9]+: f6 c3 7f test \$0x7f,%bl +[a-f0-9]+: f7 c7 7f 00 00 00 test \$0x7f,%edi +[a-f0-9]+: 66 f7 c7 7f 00 test \$0x7f,%di + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f d1 vmovdqa32 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 fd 48 6f d1 vmovdqa64 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7f 48 6f d1 vmovdqu8 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 ff 48 6f d1 vmovdqu16 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7e 48 6f d1 vmovdqu32 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 fe 48 6f d1 vmovdqu64 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7d 29 6f d1 vmovdqa32 %ymm1,%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 6f d1 vmovdqa64 %ymm1,%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 6f d1 vmovdqu8 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 6f d1 vmovdqu16 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 6f d1 vmovdqu32 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 6f d1 vmovdqu64 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7d 29 6f 10 vmovdqa32 \(%eax\),%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 6f 10 vmovdqa64 \(%eax\),%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 6f 10 vmovdqu8 \(%eax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 6f 10 vmovdqu16 \(%eax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 6f 10 vmovdqu32 \(%eax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 6f 10 vmovdqu64 \(%eax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7d 29 7f 08 vmovdqa32 %ymm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 7f 08 vmovdqa64 %ymm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 7f 08 vmovdqu8 %xmm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 7f 08 vmovdqu16 %xmm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 7f 08 vmovdqu32 %xmm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 7f 08 vmovdqu64 %xmm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 7d 89 6f d1 vmovdqa32 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 fd 89 6f d1 vmovdqa64 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 7f 89 6f d1 vmovdqu8 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 ff 89 6f d1 vmovdqu16 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 7e 89 6f d1 vmovdqu32 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 fe 89 6f d1 vmovdqu64 %xmm1,%xmm2\{%k1\}\{z\} #pass diff --git a/gas/testsuite/gas/i386/optimize-2.s b/gas/testsuite/gas/i386/optimize-2.s index b427a741b9..d73f41ba61 100644 --- a/gas/testsuite/gas/i386/optimize-2.s +++ b/gas/testsuite/gas/i386/optimize-2.s @@ -11,3 +11,87 @@ _start: test $0x7f, %bl test $0x7f, %edi test $0x7f, %di + + vmovdqa32 %xmm1, %xmm2 + vmovdqa64 %xmm1, %xmm2 + vmovdqu8 %xmm1, %xmm2 + vmovdqu16 %xmm1, %xmm2 + vmovdqu32 %xmm1, %xmm2 + vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 127(%eax), %xmm2 + vmovdqa64 127(%eax), %xmm2 + vmovdqu8 127(%eax), %xmm2 + vmovdqu16 127(%eax), %xmm2 + vmovdqu32 127(%eax), %xmm2 + vmovdqu64 127(%eax), %xmm2 + + vmovdqa32 %xmm1, 128(%eax) + vmovdqa64 %xmm1, 128(%eax) + vmovdqu8 %xmm1, 128(%eax) + vmovdqu16 %xmm1, 128(%eax) + vmovdqu32 %xmm1, 128(%eax) + vmovdqu64 %xmm1, 128(%eax) + + vmovdqa32 %ymm1, %ymm2 + vmovdqa64 %ymm1, %ymm2 + vmovdqu8 %ymm1, %ymm2 + vmovdqu16 %ymm1, %ymm2 + vmovdqu32 %ymm1, %ymm2 + vmovdqu64 %ymm1, %ymm2 + + vmovdqa32 127(%eax), %ymm2 + vmovdqa64 127(%eax), %ymm2 + vmovdqu8 127(%eax), %ymm2 + vmovdqu16 127(%eax), %ymm2 + vmovdqu32 127(%eax), %ymm2 + vmovdqu64 127(%eax), %ymm2 + + vmovdqa32 %ymm1, 128(%eax) + vmovdqa64 %ymm1, 128(%eax) + vmovdqu8 %ymm1, 128(%eax) + vmovdqu16 %ymm1, 128(%eax) + vmovdqu32 %ymm1, 128(%eax) + vmovdqu64 %ymm1, 128(%eax) + + vmovdqa32 %zmm1, %zmm2 + vmovdqa64 %zmm1, %zmm2 + vmovdqu8 %zmm1, %zmm2 + vmovdqu16 %zmm1, %zmm2 + vmovdqu32 %zmm1, %zmm2 + vmovdqu64 %zmm1, %zmm2 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 %ymm1, %ymm2{%k1} + vmovdqa64 %ymm1, %ymm2{%k1} + vmovdqu8 %xmm1, %xmm2{%k1} + vmovdqu16 %xmm1, %xmm2{%k1} + vmovdqu32 %xmm1, %xmm2{%k1} + vmovdqu64 %xmm1, %xmm2{%k1} + + vmovdqa32 (%eax), %ymm2{%k1} + vmovdqa64 (%eax), %ymm2{%k1} + vmovdqu8 (%eax), %xmm2{%k1} + vmovdqu16 (%eax), %xmm2{%k1} + vmovdqu32 (%eax), %xmm2{%k1} + vmovdqu64 (%eax), %xmm2{%k1} + + vmovdqa32 %ymm1, (%eax){%k1} + vmovdqa64 %ymm1, (%eax){%k1} + vmovdqu8 %xmm1, (%eax){%k1} + vmovdqu16 %xmm1, (%eax){%k1} + vmovdqu32 %xmm1, (%eax){%k1} + vmovdqu64 %xmm1, (%eax){%k1} + + vmovdqa32 %xmm1, %xmm2{%k1}{z} + vmovdqa64 %xmm1, %xmm2{%k1}{z} + vmovdqu8 %xmm1, %xmm2{%k1}{z} + vmovdqu16 %xmm1, %xmm2{%k1}{z} + vmovdqu32 %xmm1, %xmm2{%k1}{z} + vmovdqu64 %xmm1, %xmm2{%k1}{z} diff --git a/gas/testsuite/gas/i386/optimize-3.d b/gas/testsuite/gas/i386/optimize-3.d index f251a3626d..cd43243b49 100644 --- a/gas/testsuite/gas/i386/optimize-3.d +++ b/gas/testsuite/gas/i386/optimize-3.d @@ -9,4 +9,10 @@ Disassembly of section .text: 0+ <_start>: +[a-f0-9]+: a9 7f 00 00 00 test \$0x7f,%eax + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/optimize-3.s b/gas/testsuite/gas/i386/optimize-3.s index 536bf0cfb2..a70893c15d 100644 --- a/gas/testsuite/gas/i386/optimize-3.s +++ b/gas/testsuite/gas/i386/optimize-3.s @@ -4,3 +4,10 @@ .text _start: {nooptimize} testl $0x7f, %eax + + {nooptimize} vmovdqa32 %ymm1, %ymm2 + {nooptimize} vmovdqa64 %ymm1, %ymm2 + {nooptimize} vmovdqu8 %xmm1, %xmm2 + {nooptimize} vmovdqu16 %xmm1, %xmm2 + {nooptimize} vmovdqu32 %xmm1, %xmm2 + {nooptimize} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/optimize-4.d b/gas/testsuite/gas/i386/optimize-4.d index 9f99dadf34..2df84654d6 100644 --- a/gas/testsuite/gas/i386/optimize-4.d +++ b/gas/testsuite/gas/i386/optimize-4.d @@ -62,6 +62,42 @@ Disassembly of section .text: +[a-f0-9]+: c5 f4 47 e9 kxorw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 #pass diff --git a/gas/testsuite/gas/i386/optimize-5.d b/gas/testsuite/gas/i386/optimize-5.d index cfd0df04a4..ecc1ab139a 100644 --- a/gas/testsuite/gas/i386/optimize-5.d +++ b/gas/testsuite/gas/i386/optimize-5.d @@ -62,6 +62,48 @@ Disassembly of section .text: +[a-f0-9]+: c5 f4 47 e9 kxorw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/optimize-5.s b/gas/testsuite/gas/i386/optimize-5.s index 66c762bd3b..77d60edb69 100644 --- a/gas/testsuite/gas/i386/optimize-5.s +++ b/gas/testsuite/gas/i386/optimize-5.s @@ -6,3 +6,10 @@ {evex} vandnpd %zmm1, %zmm1, %zmm5 {evex} vandnpd %ymm1, %ymm1, %ymm5 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.d b/gas/testsuite/gas/i386/x86-64-optimize-2.d index f374619d4a..067df076f7 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2.d @@ -106,4 +106,52 @@ Disassembly of section .text: +[a-f0-9]+: 62 e1 f5 08 fb c1 vpsubq %xmm1,%xmm1,%xmm16 +[a-f0-9]+: 62 b1 f5 40 fb c9 vpsubq %zmm17,%zmm17,%zmm1 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.s b/gas/testsuite/gas/i386/x86-64-optimize-2.s index 10ce788ffb..1275610e55 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-2.s @@ -114,3 +114,59 @@ _start: vpsubq %ymm1, %ymm1, %ymm16 vpsubq %zmm17, %zmm17, %zmm1 vpsubq %ymm17, %ymm17, %ymm1 + + vmovdqa32 %xmm1, %xmm2 + vmovdqa64 %xmm1, %xmm2 + vmovdqu8 %xmm1, %xmm2 + vmovdqu16 %xmm1, %xmm2 + vmovdqu32 %xmm1, %xmm2 + vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 %xmm11, %xmm12 + vmovdqa64 %xmm11, %xmm12 + vmovdqu8 %xmm11, %xmm12 + vmovdqu16 %xmm11, %xmm12 + vmovdqu32 %xmm11, %xmm12 + vmovdqu64 %xmm11, %xmm12 + + vmovdqa32 127(%rax), %xmm2 + vmovdqa64 127(%rax), %xmm2 + vmovdqu8 127(%rax), %xmm2 + vmovdqu16 127(%rax), %xmm2 + vmovdqu32 127(%rax), %xmm2 + vmovdqu64 127(%rax), %xmm2 + + vmovdqa32 %xmm1, 128(%rax) + vmovdqa64 %xmm1, 128(%rax) + vmovdqu8 %xmm1, 128(%rax) + vmovdqu16 %xmm1, 128(%rax) + vmovdqu32 %xmm1, 128(%rax) + vmovdqu64 %xmm1, 128(%rax) + + vmovdqa32 %ymm1, %ymm2 + vmovdqa64 %ymm1, %ymm2 + vmovdqu8 %ymm1, %ymm2 + vmovdqu16 %ymm1, %ymm2 + vmovdqu32 %ymm1, %ymm2 + vmovdqu64 %ymm1, %ymm2 + + vmovdqa32 %ymm11, %ymm12 + vmovdqa64 %ymm11, %ymm12 + vmovdqu8 %ymm11, %ymm12 + vmovdqu16 %ymm11, %ymm12 + vmovdqu32 %ymm11, %ymm12 + vmovdqu64 %ymm11, %ymm12 + + vmovdqa32 127(%rax), %ymm2 + vmovdqa64 127(%rax), %ymm2 + vmovdqu8 127(%rax), %ymm2 + vmovdqu16 127(%rax), %ymm2 + vmovdqu32 127(%rax), %ymm2 + vmovdqu64 127(%rax), %ymm2 + + vmovdqa32 %ymm1, 128(%rax) + vmovdqa64 %ymm1, 128(%rax) + vmovdqu8 %ymm1, 128(%rax) + vmovdqu16 %ymm1, 128(%rax) + vmovdqu32 %ymm1, 128(%rax) + vmovdqu64 %ymm1, 128(%rax) diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.d b/gas/testsuite/gas/i386/x86-64-optimize-3.d index b46f728dd8..35a53e0f4b 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-3.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-3.d @@ -24,4 +24,94 @@ Disassembly of section .text: +[a-f0-9]+: 41 f6 c1 7f test \$0x7f,%r9b +[a-f0-9]+: 41 f6 c1 7f test \$0x7f,%r9b +[a-f0-9]+: 41 f6 c1 7f test \$0x7f,%r9b + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 b1 7d 08 6f d5 vmovdqa32 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 fd 08 6f d5 vmovdqa64 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 7f 08 6f d5 vmovdqu8 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 ff 08 6f d5 vmovdqu16 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 7e 08 6f d5 vmovdqu32 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 fe 08 6f d5 vmovdqu64 %xmm21,%xmm2 + +[a-f0-9]+: 62 f1 7d 48 6f d1 vmovdqa32 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 fd 48 6f d1 vmovdqa64 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7f 48 6f d1 vmovdqu8 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 ff 48 6f d1 vmovdqu16 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7e 48 6f d1 vmovdqu32 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 fe 48 6f d1 vmovdqu64 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7d 29 6f d1 vmovdqa32 %ymm1,%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 6f d1 vmovdqa64 %ymm1,%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 6f d1 vmovdqu8 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 6f d1 vmovdqu16 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 6f d1 vmovdqu32 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 6f d1 vmovdqu64 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7d 29 6f 10 vmovdqa32 \(%rax\),%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 6f 10 vmovdqa64 \(%rax\),%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 6f 10 vmovdqu8 \(%rax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 6f 10 vmovdqu16 \(%rax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 6f 10 vmovdqu32 \(%rax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 6f 10 vmovdqu64 \(%rax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7d 29 7f 08 vmovdqa32 %ymm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 7f 08 vmovdqa64 %ymm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 7f 08 vmovdqu8 %xmm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 7f 08 vmovdqu16 %xmm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 7f 08 vmovdqu32 %xmm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 7f 08 vmovdqu64 %xmm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 7d 89 6f d1 vmovdqa32 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 fd 89 6f d1 vmovdqa64 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 7f 89 6f d1 vmovdqu8 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 ff 89 6f d1 vmovdqu16 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 7e 89 6f d1 vmovdqu32 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 fe 89 6f d1 vmovdqu64 %xmm1,%xmm2\{%k1\}\{z\} #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.s b/gas/testsuite/gas/i386/x86-64-optimize-3.s index 61c150a87c..688f9623b2 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-3.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-3.s @@ -19,3 +19,108 @@ _start: test $0x7f, %r9d test $0x7f, %r9w test $0x7f, %r9b + + vmovdqa32 %xmm1, %xmm2 + vmovdqa64 %xmm1, %xmm2 + vmovdqu8 %xmm1, %xmm2 + vmovdqu16 %xmm1, %xmm2 + vmovdqu32 %xmm1, %xmm2 + vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 %xmm11, %xmm12 + vmovdqa64 %xmm11, %xmm12 + vmovdqu8 %xmm11, %xmm12 + vmovdqu16 %xmm11, %xmm12 + vmovdqu32 %xmm11, %xmm12 + vmovdqu64 %xmm11, %xmm12 + + vmovdqa32 127(%rax), %xmm2 + vmovdqa64 127(%rax), %xmm2 + vmovdqu8 127(%rax), %xmm2 + vmovdqu16 127(%rax), %xmm2 + vmovdqu32 127(%rax), %xmm2 + vmovdqu64 127(%rax), %xmm2 + + vmovdqa32 %xmm1, 128(%rax) + vmovdqa64 %xmm1, 128(%rax) + vmovdqu8 %xmm1, 128(%rax) + vmovdqu16 %xmm1, 128(%rax) + vmovdqu32 %xmm1, 128(%rax) + vmovdqu64 %xmm1, 128(%rax) + + vmovdqa32 %ymm1, %ymm2 + vmovdqa64 %ymm1, %ymm2 + vmovdqu8 %ymm1, %ymm2 + vmovdqu16 %ymm1, %ymm2 + vmovdqu32 %ymm1, %ymm2 + vmovdqu64 %ymm1, %ymm2 + + vmovdqa32 %ymm11, %ymm12 + vmovdqa64 %ymm11, %ymm12 + vmovdqu8 %ymm11, %ymm12 + vmovdqu16 %ymm11, %ymm12 + vmovdqu32 %ymm11, %ymm12 + vmovdqu64 %ymm11, %ymm12 + + vmovdqa32 127(%rax), %ymm2 + vmovdqa64 127(%rax), %ymm2 + vmovdqu8 127(%rax), %ymm2 + vmovdqu16 127(%rax), %ymm2 + vmovdqu32 127(%rax), %ymm2 + vmovdqu64 127(%rax), %ymm2 + + vmovdqa32 %ymm1, 128(%rax) + vmovdqa64 %ymm1, 128(%rax) + vmovdqu8 %ymm1, 128(%rax) + vmovdqu16 %ymm1, 128(%rax) + vmovdqu32 %ymm1, 128(%rax) + vmovdqu64 %ymm1, 128(%rax) + + vmovdqa32 %xmm21, %xmm2 + vmovdqa64 %xmm21, %xmm2 + vmovdqu8 %xmm21, %xmm2 + vmovdqu16 %xmm21, %xmm2 + vmovdqu32 %xmm21, %xmm2 + vmovdqu64 %xmm21, %xmm2 + + vmovdqa32 %zmm1, %zmm2 + vmovdqa64 %zmm1, %zmm2 + vmovdqu8 %zmm1, %zmm2 + vmovdqu16 %zmm1, %zmm2 + vmovdqu32 %zmm1, %zmm2 + vmovdqu64 %zmm1, %zmm2 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 %ymm1, %ymm2{%k1} + vmovdqa64 %ymm1, %ymm2{%k1} + vmovdqu8 %xmm1, %xmm2{%k1} + vmovdqu16 %xmm1, %xmm2{%k1} + vmovdqu32 %xmm1, %xmm2{%k1} + vmovdqu64 %xmm1, %xmm2{%k1} + + vmovdqa32 (%rax), %ymm2{%k1} + vmovdqa64 (%rax), %ymm2{%k1} + vmovdqu8 (%rax), %xmm2{%k1} + vmovdqu16 (%rax), %xmm2{%k1} + vmovdqu32 (%rax), %xmm2{%k1} + vmovdqu64 (%rax), %xmm2{%k1} + + vmovdqa32 %ymm1, (%rax){%k1} + vmovdqa64 %ymm1, (%rax){%k1} + vmovdqu8 %xmm1, (%rax){%k1} + vmovdqu16 %xmm1, (%rax){%k1} + vmovdqu32 %xmm1, (%rax){%k1} + vmovdqu64 %xmm1, (%rax){%k1} + + vmovdqa32 %xmm1, %xmm2{%k1}{z} + vmovdqa64 %xmm1, %xmm2{%k1}{z} + vmovdqu8 %xmm1, %xmm2{%k1}{z} + vmovdqu16 %xmm1, %xmm2{%k1}{z} + vmovdqu32 %xmm1, %xmm2{%k1}{z} + vmovdqu64 %xmm1, %xmm2{%k1}{z} diff --git a/gas/testsuite/gas/i386/x86-64-optimize-4.d b/gas/testsuite/gas/i386/x86-64-optimize-4.d index 10e7b02d3a..18fdeb1442 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-4.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-4.d @@ -9,4 +9,10 @@ Disassembly of section .text: 0+ <_start>: +[a-f0-9]+: a9 7f 00 00 00 test \$0x7f,%eax + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-4.s b/gas/testsuite/gas/i386/x86-64-optimize-4.s index 0c4fdcecc5..b6d872db2c 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-4.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-4.s @@ -4,3 +4,10 @@ .text _start: {nooptimize} testl $0x7f, %eax + + {nooptimize} vmovdqa32 %ymm1, %ymm2 + {nooptimize} vmovdqa64 %ymm1, %ymm2 + {nooptimize} vmovdqu8 %xmm1, %xmm2 + {nooptimize} vmovdqu16 %xmm1, %xmm2 + {nooptimize} vmovdqu32 %xmm1, %xmm2 + {nooptimize} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.d b/gas/testsuite/gas/i386/x86-64-optimize-5.d index 085f7f29f2..012237df57 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-5.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-5.d @@ -106,6 +106,60 @@ Disassembly of section .text: +[a-f0-9]+: 62 e1 f5 08 fb c1 vpsubq %xmm1,%xmm1,%xmm16 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.s b/gas/testsuite/gas/i386/x86-64-optimize-5.s index 6b4ff103ab..9756ae815c 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-5.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-5.s @@ -4,3 +4,10 @@ {evex} vandnpd %zmm1, %zmm1, %zmm5 {evex} vandnpd %ymm1, %ymm1, %ymm5 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.d b/gas/testsuite/gas/i386/x86-64-optimize-6.d index 0d52c8fcbb..aca119e4f9 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-6.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-6.d @@ -106,6 +106,60 @@ Disassembly of section .text: +[a-f0-9]+: 62 e1 f5 08 fb c1 vpsubq %xmm1,%xmm1,%xmm16 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.s b/gas/testsuite/gas/i386/x86-64-optimize-6.s index 70ccbc41be..7c403fcc86 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-6.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-6.s @@ -6,3 +6,10 @@ {evex} vandnpd %zmm1, %zmm1, %zmm5 {evex} vandnpd %ymm1, %ymm1, %ymm5 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl index 1194dcd1c0..26a68d8cbe 100644 --- a/opcodes/i386-opc.tbl +++ b/opcodes/i386-opc.tbl @@ -3709,11 +3709,11 @@ vmovd, 2, 0x666E, None, 1, CpuAVX512F, D|Modrm|EVex=2|VexOpcode=0|Disp8MemShift= vmovddup, 2, 0xF212, None, 1, CpuAVX512F, Modrm|Masking=3|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM|RegZMM|Unspecified|BaseIndex, RegYMM|RegZMM } -vmovdqa64, 2, 0x666F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } -vmovdqa32, 2, 0x666F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqa64, 2, 0x666F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqa32, 2, 0x666F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vmovntdq, 2, 0x66E7, None, 1, CpuAVX512F, Modrm|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex } -vmovdqu32, 2, 0xF36F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } -vmovdqu64, 2, 0xF36F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqu32, 2, 0xF36F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqu64, 2, 0xF36F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vmovhlps, 3, 0x12, None, 1, CpuAVX512F, Modrm|EVex=4|VexOpcode=0|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM } vmovlhps, 3, 0x16, None, 1, CpuAVX512F, Modrm|EVex=4|VexOpcode=0|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM } @@ -4190,8 +4190,8 @@ kshiftrq, 3, 0x6631, None, 1, CpuAVX512BW, Modrm|Vex=1|VexOpcode=2|VexW=2|No_bSu vdbpsadbw, 4, 0x6642, None, 1, CpuAVX512BW, Modrm|Masking=3|VexOpcode=2|VexVVVV=1|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } -vmovdqu8, 2, 0xF26F, None, 1, CpuAVX512BW, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } -vmovdqu16, 2, 0xF26F, None, 1, CpuAVX512BW, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqu8, 2, 0xF26F, None, 1, CpuAVX512BW, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqu16, 2, 0xF26F, None, 1, CpuAVX512BW, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vpabsb, 2, 0x661C, None, 1, CpuAVX512BW, Modrm|Masking=3|VexOpcode=1|VexWIG|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vpmaxsb, 3, 0x663C, None, 1, CpuAVX512BW, Modrm|Masking=3|VexOpcode=1|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } diff --git a/opcodes/i386-tbl.h b/opcodes/i386-tbl.h index 81575df3f2..bd33eb5ce5 100644 --- a/opcodes/i386-tbl.h +++ b/opcodes/i386-tbl.h @@ -60123,7 +60123,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -60139,7 +60139,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -60155,7 +60155,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -60171,7 +60171,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -63555,7 +63555,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -63571,7 +63571,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, -- 2.20.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* V2 [PATCH] x86: Optimize EVEX vector load/store instructions 2019-03-15 23:58 [PATCH] x86: Optimize EVEX vector load/store instructions H.J. Lu @ 2019-03-17 20:47 ` H.J. Lu 2019-03-18 13:49 ` Jan Beulich 0 siblings, 1 reply; 7+ messages in thread From: H.J. Lu @ 2019-03-17 20:47 UTC (permalink / raw) To: binutils On Sat, Mar 16, 2019 at 07:54:14AM +0800, H.J. Lu wrote: > When there is no write mask, we can encode lower 16 128-bit/256-bit > vector register load and store instructions as VEX vector register > load and store instructions with -O2. > > gas/ > > PR gas/24348 > * config/tc-i386.c (optimize_encoding): Encode EVEX 128-bit and > 256-bit vector register load/store instructions as VEX vector > register load/store instructions for -O2. > (md_parse_option): Set optimize to INT_MAX for -Os. > * doc/c-i386.texi: Update -O2 documentation. > This is the patch I am checking in. H.J. ---- When there is no write mask, we can encode lower 16 128-bit/256-bit vector register load and store instructions as VEX vector register load and store instructions with -O1. gas/ PR gas/24348 * config/tc-i386.c (optimize_encoding): Encode EVEX 128-bit and 256-bit vector register load/store instructions as VEX vector register load/store instructions for -O1. * doc/c-i386.texi: Update -O1 documentation. * testsuite/gas/i386/i386.exp: Run PR gas/24348 tests. * testsuite/gas/i386/optimize-1.s: Add tests for EVEX vector load/store instructions. * testsuite/gas/i386/optimize-2.s: Likewise. * testsuite/gas/i386/optimize-3.s: Likewise. * testsuite/gas/i386/optimize-5.s: Likewise. * testsuite/gas/i386/x86-64-optimize-2.s: Likewise. * testsuite/gas/i386/x86-64-optimize-3.s: Likewise. * testsuite/gas/i386/x86-64-optimize-4.s: Likewise. * testsuite/gas/i386/x86-64-optimize-5.s: Likewise. * testsuite/gas/i386/x86-64-optimize-6.s: Likewise. * testsuite/gas/i386/optimize-1.d: Updated. * testsuite/gas/i386/optimize-2.d: Likewise. * testsuite/gas/i386/optimize-3.d: Likewise. * testsuite/gas/i386/optimize-4.d: Likewise. * testsuite/gas/i386/optimize-5.d: Likewise. * testsuite/gas/i386/x86-64-optimize-2.d: Likewise. * testsuite/gas/i386/x86-64-optimize-3.d: Likewise. * testsuite/gas/i386/x86-64-optimize-4.d: Likewise. * testsuite/gas/i386/x86-64-optimize-5.d: Likewise. * testsuite/gas/i386/x86-64-optimize-6.d: Likewise. * testsuite/gas/i386/optimize-7.d: New file. * testsuite/gas/i386/optimize-7.s: Likewise. * testsuite/gas/i386/x86-64-optimize-8.d: Likewise. * testsuite/gas/i386/x86-64-optimize-8.s: Likewise. opcodes/ PR gas/24348 * i386-opc.tbl: Add Optimize to vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, vmovdqu32 and vmovdqu64. * i386-tbl.h: Regenerated. --- gas/config/tc-i386.c | 50 ++++++++++ gas/doc/c-i386.texi | 4 +- gas/testsuite/gas/i386/i386.exp | 2 + gas/testsuite/gas/i386/optimize-1.d | 36 +++++++ gas/testsuite/gas/i386/optimize-1.s | 42 ++++++++ gas/testsuite/gas/i386/optimize-1a.d | 36 +++++++ gas/testsuite/gas/i386/optimize-2.d | 72 ++++++++++++++ gas/testsuite/gas/i386/optimize-2.s | 84 ++++++++++++++++ gas/testsuite/gas/i386/optimize-3.d | 6 ++ gas/testsuite/gas/i386/optimize-3.s | 7 ++ gas/testsuite/gas/i386/optimize-4.d | 36 +++++++ gas/testsuite/gas/i386/optimize-5.d | 42 ++++++++ gas/testsuite/gas/i386/optimize-5.s | 7 ++ gas/testsuite/gas/i386/optimize-7.d | 12 +++ gas/testsuite/gas/i386/optimize-7.s | 6 ++ gas/testsuite/gas/i386/x86-64-optimize-2.d | 48 +++++++++ gas/testsuite/gas/i386/x86-64-optimize-2.s | 56 +++++++++++ gas/testsuite/gas/i386/x86-64-optimize-2a.d | 48 +++++++++ gas/testsuite/gas/i386/x86-64-optimize-3.d | 90 +++++++++++++++++ gas/testsuite/gas/i386/x86-64-optimize-3.s | 105 ++++++++++++++++++++ gas/testsuite/gas/i386/x86-64-optimize-4.d | 6 ++ gas/testsuite/gas/i386/x86-64-optimize-4.s | 7 ++ gas/testsuite/gas/i386/x86-64-optimize-5.d | 54 ++++++++++ gas/testsuite/gas/i386/x86-64-optimize-5.s | 7 ++ gas/testsuite/gas/i386/x86-64-optimize-6.d | 54 ++++++++++ gas/testsuite/gas/i386/x86-64-optimize-6.s | 7 ++ gas/testsuite/gas/i386/x86-64-optimize-8.d | 12 +++ gas/testsuite/gas/i386/x86-64-optimize-8.s | 6 ++ opcodes/i386-opc.tbl | 12 +-- opcodes/i386-tbl.h | 12 +-- 30 files changed, 953 insertions(+), 13 deletions(-) create mode 100644 gas/testsuite/gas/i386/optimize-7.d create mode 100644 gas/testsuite/gas/i386/optimize-7.s create mode 100644 gas/testsuite/gas/i386/x86-64-optimize-8.d create mode 100644 gas/testsuite/gas/i386/x86-64-optimize-8.s diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c index 856c18d672..fa060759ae 100644 --- a/gas/config/tc-i386.c +++ b/gas/config/tc-i386.c @@ -4075,6 +4075,56 @@ optimize_encoding (void) i.types[j].bitfield.ymmword = 0; } } + else if ((cpu_arch_flags.bitfield.cpuavx + || cpu_arch_isa_flags.bitfield.cpuavx) + && i.vec_encoding != vex_encoding_evex + && !i.types[0].bitfield.zmmword + && !i.mask + && is_evex_encoding (&i.tm) + && (i.tm.base_opcode == 0x666f + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f + || i.tm.base_opcode == 0xf36f + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f + || i.tm.base_opcode == 0xf26f + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) + && i.tm.extension_opcode == None) + { + /* Optimize: -O1: + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, + vmovdqu32 and vmovdqu64: + EVEX VOP %xmmM, %xmmN + -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16) + EVEX VOP %ymmM, %ymmN + -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16) + EVEX VOP %xmmM, mem + -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16) + EVEX VOP %ymmM, mem + -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16) + EVEX VOP mem, %xmmN + -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16) + EVEX VOP mem, %ymmN + -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16) + */ + if (i.tm.base_opcode == 0xf26f) + i.tm.base_opcode = 0xf36f; + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; + i.tm.opcode_modifier.vex + = i.types[0].bitfield.ymmword ? VEX256 : VEX128; + i.tm.opcode_modifier.vexw = VEXW0; + i.tm.opcode_modifier.evex = 0; + i.tm.opcode_modifier.masking = 0; + i.tm.opcode_modifier.disp8memshift = 0; + i.memshift = 0; + for (j = 0; j < 2; j++) + if (operand_type_check (i.types[j], disp) + && i.op[j].disps->X_op == O_constant) + { + i.types[j].bitfield.disp8 + = fits_in_disp8 (i.op[j].disps->X_add_number); + break; + } + } } /* This is the guts of the machine-dependent assembler. LINE points to a diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi index 7e5f5c257e..4acd5ff616 100644 --- a/gas/doc/c-i386.texi +++ b/gas/doc/c-i386.texi @@ -456,7 +456,9 @@ immediate as 32-bit register load instructions with 31-bit or 32-bits immediates, encode 64-bit register clearing instructions with 32-bit register clearing instructions and encode 256-bit/512-bit VEX/EVEX vector register clearing instructions with 128-bit VEX vector register -clearing instructions. @samp{-O2} includes @samp{-O1} optimization plus +clearing instructions as well as encode 128-bit/256-bit EVEX vector +register load/store instructions with VEX vector register load/store +instructions. @samp{-O2} includes @samp{-O1} optimization plus encodes 256-bit/512-bit EVEX vector register clearing instructions with 128-bit EVEX vector register clearing instructions. @samp{-Os} includes @samp{-O2} optimization plus encodes 16-bit, 32-bit diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp index 798bfb564a..3067b4a1f1 100644 --- a/gas/testsuite/gas/i386/i386.exp +++ b/gas/testsuite/gas/i386/i386.exp @@ -476,6 +476,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_32_check]] run_dump_test "optimize-6a" run_dump_test "optimize-6b" run_dump_test "optimize-6c" + run_dump_test "optimize-7" # These tests require support for 8 and 16 bit relocs, # so we only run them for ELF and COFF targets. @@ -990,6 +991,7 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t run_dump_test "x86-64-optimize-7a" run_dump_test "x86-64-optimize-7b" run_dump_test "x86-64-optimize-7c" + run_dump_test "x86-64-optimize-8" if { ![istarget "*-*-aix*"] && ![istarget "*-*-beos*"] diff --git a/gas/testsuite/gas/i386/optimize-1.d b/gas/testsuite/gas/i386/optimize-1.d index 4358c19c21..70c802c002 100644 --- a/gas/testsuite/gas/i386/optimize-1.d +++ b/gas/testsuite/gas/i386/optimize-1.d @@ -62,4 +62,40 @@ Disassembly of section .text: +[a-f0-9]+: c5 f4 47 e9 kxorw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) #pass diff --git a/gas/testsuite/gas/i386/optimize-1.s b/gas/testsuite/gas/i386/optimize-1.s index f61a176de8..6dcfbc2799 100644 --- a/gas/testsuite/gas/i386/optimize-1.s +++ b/gas/testsuite/gas/i386/optimize-1.s @@ -72,3 +72,45 @@ _start: kandnd %k1, %k1, %k5 kandnq %k1, %k1, %k5 + + vmovdqa32 %xmm1, %xmm2 + vmovdqa64 %xmm1, %xmm2 + vmovdqu8 %xmm1, %xmm2 + vmovdqu16 %xmm1, %xmm2 + vmovdqu32 %xmm1, %xmm2 + vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 127(%eax), %xmm2 + vmovdqa64 127(%eax), %xmm2 + vmovdqu8 127(%eax), %xmm2 + vmovdqu16 127(%eax), %xmm2 + vmovdqu32 127(%eax), %xmm2 + vmovdqu64 127(%eax), %xmm2 + + vmovdqa32 %xmm1, 128(%eax) + vmovdqa64 %xmm1, 128(%eax) + vmovdqu8 %xmm1, 128(%eax) + vmovdqu16 %xmm1, 128(%eax) + vmovdqu32 %xmm1, 128(%eax) + vmovdqu64 %xmm1, 128(%eax) + + vmovdqa32 %ymm1, %ymm2 + vmovdqa64 %ymm1, %ymm2 + vmovdqu8 %ymm1, %ymm2 + vmovdqu16 %ymm1, %ymm2 + vmovdqu32 %ymm1, %ymm2 + vmovdqu64 %ymm1, %ymm2 + + vmovdqa32 127(%eax), %ymm2 + vmovdqa64 127(%eax), %ymm2 + vmovdqu8 127(%eax), %ymm2 + vmovdqu16 127(%eax), %ymm2 + vmovdqu32 127(%eax), %ymm2 + vmovdqu64 127(%eax), %ymm2 + + vmovdqa32 %ymm1, 128(%eax) + vmovdqa64 %ymm1, 128(%eax) + vmovdqu8 %ymm1, 128(%eax) + vmovdqu16 %ymm1, 128(%eax) + vmovdqu32 %ymm1, 128(%eax) + vmovdqu64 %ymm1, 128(%eax) diff --git a/gas/testsuite/gas/i386/optimize-1a.d b/gas/testsuite/gas/i386/optimize-1a.d index e6e6d81fe4..cee2383d84 100644 --- a/gas/testsuite/gas/i386/optimize-1a.d +++ b/gas/testsuite/gas/i386/optimize-1a.d @@ -63,4 +63,40 @@ Disassembly of section .text: +[a-f0-9]+: c5 f4 47 e9 kxorw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) #pass diff --git a/gas/testsuite/gas/i386/optimize-2.d b/gas/testsuite/gas/i386/optimize-2.d index e8a516997a..19467f5c01 100644 --- a/gas/testsuite/gas/i386/optimize-2.d +++ b/gas/testsuite/gas/i386/optimize-2.d @@ -17,4 +17,76 @@ Disassembly of section .text: +[a-f0-9]+: f7 c7 7f 00 00 00 test \$0x7f,%edi +[a-f0-9]+: 66 f7 c7 7f 00 test \$0x7f,%di +[a-f0-9]+: c5 f1 55 e9 vandnpd %xmm1,%xmm1,%xmm5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f d1 vmovdqa32 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 fd 48 6f d1 vmovdqa64 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7f 48 6f d1 vmovdqu8 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 ff 48 6f d1 vmovdqu16 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7e 48 6f d1 vmovdqu32 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 fe 48 6f d1 vmovdqu64 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7d 29 6f d1 vmovdqa32 %ymm1,%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 6f d1 vmovdqa64 %ymm1,%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 6f d1 vmovdqu8 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 6f d1 vmovdqu16 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 6f d1 vmovdqu32 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 6f d1 vmovdqu64 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7d 29 6f 10 vmovdqa32 \(%eax\),%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 6f 10 vmovdqa64 \(%eax\),%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 6f 10 vmovdqu8 \(%eax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 6f 10 vmovdqu16 \(%eax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 6f 10 vmovdqu32 \(%eax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 6f 10 vmovdqu64 \(%eax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7d 29 7f 08 vmovdqa32 %ymm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 7f 08 vmovdqa64 %ymm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 7f 08 vmovdqu8 %xmm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 7f 08 vmovdqu16 %xmm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 7f 08 vmovdqu32 %xmm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 7f 08 vmovdqu64 %xmm1,\(%eax\)\{%k1\} + +[a-f0-9]+: 62 f1 7d 89 6f d1 vmovdqa32 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 fd 89 6f d1 vmovdqa64 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 7f 89 6f d1 vmovdqu8 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 ff 89 6f d1 vmovdqu16 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 7e 89 6f d1 vmovdqu32 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 fe 89 6f d1 vmovdqu64 %xmm1,%xmm2\{%k1\}\{z\} #pass diff --git a/gas/testsuite/gas/i386/optimize-2.s b/gas/testsuite/gas/i386/optimize-2.s index c9b57a8dd1..0a4fb23167 100644 --- a/gas/testsuite/gas/i386/optimize-2.s +++ b/gas/testsuite/gas/i386/optimize-2.s @@ -13,3 +13,87 @@ _start: test $0x7f, %di vandnpd %zmm1, %zmm1, %zmm5 + + vmovdqa32 %xmm1, %xmm2 + vmovdqa64 %xmm1, %xmm2 + vmovdqu8 %xmm1, %xmm2 + vmovdqu16 %xmm1, %xmm2 + vmovdqu32 %xmm1, %xmm2 + vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 127(%eax), %xmm2 + vmovdqa64 127(%eax), %xmm2 + vmovdqu8 127(%eax), %xmm2 + vmovdqu16 127(%eax), %xmm2 + vmovdqu32 127(%eax), %xmm2 + vmovdqu64 127(%eax), %xmm2 + + vmovdqa32 %xmm1, 128(%eax) + vmovdqa64 %xmm1, 128(%eax) + vmovdqu8 %xmm1, 128(%eax) + vmovdqu16 %xmm1, 128(%eax) + vmovdqu32 %xmm1, 128(%eax) + vmovdqu64 %xmm1, 128(%eax) + + vmovdqa32 %ymm1, %ymm2 + vmovdqa64 %ymm1, %ymm2 + vmovdqu8 %ymm1, %ymm2 + vmovdqu16 %ymm1, %ymm2 + vmovdqu32 %ymm1, %ymm2 + vmovdqu64 %ymm1, %ymm2 + + vmovdqa32 127(%eax), %ymm2 + vmovdqa64 127(%eax), %ymm2 + vmovdqu8 127(%eax), %ymm2 + vmovdqu16 127(%eax), %ymm2 + vmovdqu32 127(%eax), %ymm2 + vmovdqu64 127(%eax), %ymm2 + + vmovdqa32 %ymm1, 128(%eax) + vmovdqa64 %ymm1, 128(%eax) + vmovdqu8 %ymm1, 128(%eax) + vmovdqu16 %ymm1, 128(%eax) + vmovdqu32 %ymm1, 128(%eax) + vmovdqu64 %ymm1, 128(%eax) + + vmovdqa32 %zmm1, %zmm2 + vmovdqa64 %zmm1, %zmm2 + vmovdqu8 %zmm1, %zmm2 + vmovdqu16 %zmm1, %zmm2 + vmovdqu32 %zmm1, %zmm2 + vmovdqu64 %zmm1, %zmm2 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 %ymm1, %ymm2{%k1} + vmovdqa64 %ymm1, %ymm2{%k1} + vmovdqu8 %xmm1, %xmm2{%k1} + vmovdqu16 %xmm1, %xmm2{%k1} + vmovdqu32 %xmm1, %xmm2{%k1} + vmovdqu64 %xmm1, %xmm2{%k1} + + vmovdqa32 (%eax), %ymm2{%k1} + vmovdqa64 (%eax), %ymm2{%k1} + vmovdqu8 (%eax), %xmm2{%k1} + vmovdqu16 (%eax), %xmm2{%k1} + vmovdqu32 (%eax), %xmm2{%k1} + vmovdqu64 (%eax), %xmm2{%k1} + + vmovdqa32 %ymm1, (%eax){%k1} + vmovdqa64 %ymm1, (%eax){%k1} + vmovdqu8 %xmm1, (%eax){%k1} + vmovdqu16 %xmm1, (%eax){%k1} + vmovdqu32 %xmm1, (%eax){%k1} + vmovdqu64 %xmm1, (%eax){%k1} + + vmovdqa32 %xmm1, %xmm2{%k1}{z} + vmovdqa64 %xmm1, %xmm2{%k1}{z} + vmovdqu8 %xmm1, %xmm2{%k1}{z} + vmovdqu16 %xmm1, %xmm2{%k1}{z} + vmovdqu32 %xmm1, %xmm2{%k1}{z} + vmovdqu64 %xmm1, %xmm2{%k1}{z} diff --git a/gas/testsuite/gas/i386/optimize-3.d b/gas/testsuite/gas/i386/optimize-3.d index f251a3626d..cd43243b49 100644 --- a/gas/testsuite/gas/i386/optimize-3.d +++ b/gas/testsuite/gas/i386/optimize-3.d @@ -9,4 +9,10 @@ Disassembly of section .text: 0+ <_start>: +[a-f0-9]+: a9 7f 00 00 00 test \$0x7f,%eax + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/optimize-3.s b/gas/testsuite/gas/i386/optimize-3.s index 536bf0cfb2..a70893c15d 100644 --- a/gas/testsuite/gas/i386/optimize-3.s +++ b/gas/testsuite/gas/i386/optimize-3.s @@ -4,3 +4,10 @@ .text _start: {nooptimize} testl $0x7f, %eax + + {nooptimize} vmovdqa32 %ymm1, %ymm2 + {nooptimize} vmovdqa64 %ymm1, %ymm2 + {nooptimize} vmovdqu8 %xmm1, %xmm2 + {nooptimize} vmovdqu16 %xmm1, %xmm2 + {nooptimize} vmovdqu32 %xmm1, %xmm2 + {nooptimize} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/optimize-4.d b/gas/testsuite/gas/i386/optimize-4.d index 9f99dadf34..2df84654d6 100644 --- a/gas/testsuite/gas/i386/optimize-4.d +++ b/gas/testsuite/gas/i386/optimize-4.d @@ -62,6 +62,42 @@ Disassembly of section .text: +[a-f0-9]+: c5 f4 47 e9 kxorw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 #pass diff --git a/gas/testsuite/gas/i386/optimize-5.d b/gas/testsuite/gas/i386/optimize-5.d index cfd0df04a4..ecc1ab139a 100644 --- a/gas/testsuite/gas/i386/optimize-5.d +++ b/gas/testsuite/gas/i386/optimize-5.d @@ -62,6 +62,48 @@ Disassembly of section .text: +[a-f0-9]+: c5 f4 47 e9 kxorw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 +[a-f0-9]+: c5 f4 42 e9 kandnw %k1,%k1,%k5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/optimize-5.s b/gas/testsuite/gas/i386/optimize-5.s index 66c762bd3b..77d60edb69 100644 --- a/gas/testsuite/gas/i386/optimize-5.s +++ b/gas/testsuite/gas/i386/optimize-5.s @@ -6,3 +6,10 @@ {evex} vandnpd %zmm1, %zmm1, %zmm5 {evex} vandnpd %ymm1, %ymm1, %ymm5 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/optimize-7.d b/gas/testsuite/gas/i386/optimize-7.d new file mode 100644 index 0000000000..92ca7a6c75 --- /dev/null +++ b/gas/testsuite/gas/i386/optimize-7.d @@ -0,0 +1,12 @@ +#as: -O2 -march=+noavx +#objdump: -drw +#name: optimized encoding 7 with -O2 + +.*: +file format .* + + +Disassembly of section .text: + +0+ <_start>: + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 +#pass diff --git a/gas/testsuite/gas/i386/optimize-7.s b/gas/testsuite/gas/i386/optimize-7.s new file mode 100644 index 0000000000..261b4afa27 --- /dev/null +++ b/gas/testsuite/gas/i386/optimize-7.s @@ -0,0 +1,6 @@ +# Check instructions with optimized encoding + + .allow_index_reg + .text +_start: + vmovdqa32 %ymm1, %ymm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.d b/gas/testsuite/gas/i386/x86-64-optimize-2.d index fa031e8893..7d7340fae0 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2.d @@ -106,4 +106,52 @@ Disassembly of section .text: +[a-f0-9]+: 62 e1 f5 08 fb c1 vpsubq %xmm1,%xmm1,%xmm16 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.s b/gas/testsuite/gas/i386/x86-64-optimize-2.s index 10ce788ffb..1275610e55 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-2.s @@ -114,3 +114,59 @@ _start: vpsubq %ymm1, %ymm1, %ymm16 vpsubq %zmm17, %zmm17, %zmm1 vpsubq %ymm17, %ymm17, %ymm1 + + vmovdqa32 %xmm1, %xmm2 + vmovdqa64 %xmm1, %xmm2 + vmovdqu8 %xmm1, %xmm2 + vmovdqu16 %xmm1, %xmm2 + vmovdqu32 %xmm1, %xmm2 + vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 %xmm11, %xmm12 + vmovdqa64 %xmm11, %xmm12 + vmovdqu8 %xmm11, %xmm12 + vmovdqu16 %xmm11, %xmm12 + vmovdqu32 %xmm11, %xmm12 + vmovdqu64 %xmm11, %xmm12 + + vmovdqa32 127(%rax), %xmm2 + vmovdqa64 127(%rax), %xmm2 + vmovdqu8 127(%rax), %xmm2 + vmovdqu16 127(%rax), %xmm2 + vmovdqu32 127(%rax), %xmm2 + vmovdqu64 127(%rax), %xmm2 + + vmovdqa32 %xmm1, 128(%rax) + vmovdqa64 %xmm1, 128(%rax) + vmovdqu8 %xmm1, 128(%rax) + vmovdqu16 %xmm1, 128(%rax) + vmovdqu32 %xmm1, 128(%rax) + vmovdqu64 %xmm1, 128(%rax) + + vmovdqa32 %ymm1, %ymm2 + vmovdqa64 %ymm1, %ymm2 + vmovdqu8 %ymm1, %ymm2 + vmovdqu16 %ymm1, %ymm2 + vmovdqu32 %ymm1, %ymm2 + vmovdqu64 %ymm1, %ymm2 + + vmovdqa32 %ymm11, %ymm12 + vmovdqa64 %ymm11, %ymm12 + vmovdqu8 %ymm11, %ymm12 + vmovdqu16 %ymm11, %ymm12 + vmovdqu32 %ymm11, %ymm12 + vmovdqu64 %ymm11, %ymm12 + + vmovdqa32 127(%rax), %ymm2 + vmovdqa64 127(%rax), %ymm2 + vmovdqu8 127(%rax), %ymm2 + vmovdqu16 127(%rax), %ymm2 + vmovdqu32 127(%rax), %ymm2 + vmovdqu64 127(%rax), %ymm2 + + vmovdqa32 %ymm1, 128(%rax) + vmovdqa64 %ymm1, 128(%rax) + vmovdqu8 %ymm1, 128(%rax) + vmovdqu16 %ymm1, 128(%rax) + vmovdqu32 %ymm1, 128(%rax) + vmovdqu64 %ymm1, 128(%rax) diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2a.d b/gas/testsuite/gas/i386/x86-64-optimize-2a.d index 9c6466d4ae..532a1458bc 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2a.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2a.d @@ -107,4 +107,52 @@ Disassembly of section .text: +[a-f0-9]+: 62 e1 f5 28 fb c1 vpsubq %ymm1,%ymm1,%ymm16 +[a-f0-9]+: 62 b1 f5 40 fb c9 vpsubq %zmm17,%zmm17,%zmm1 +[a-f0-9]+: 62 b1 f5 20 fb c9 vpsubq %ymm17,%ymm17,%ymm1 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.d b/gas/testsuite/gas/i386/x86-64-optimize-3.d index f85c0af05e..74336a4fe2 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-3.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-3.d @@ -25,4 +25,94 @@ Disassembly of section .text: +[a-f0-9]+: 41 f6 c1 7f test \$0x7f,%r9b +[a-f0-9]+: 41 f6 c1 7f test \$0x7f,%r9b +[a-f0-9]+: c5 f1 55 e9 vandnpd %xmm1,%xmm1,%xmm5 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 b1 7d 08 6f d5 vmovdqa32 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 fd 08 6f d5 vmovdqa64 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 7f 08 6f d5 vmovdqu8 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 ff 08 6f d5 vmovdqu16 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 7e 08 6f d5 vmovdqu32 %xmm21,%xmm2 + +[a-f0-9]+: 62 b1 fe 08 6f d5 vmovdqu64 %xmm21,%xmm2 + +[a-f0-9]+: 62 f1 7d 48 6f d1 vmovdqa32 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 fd 48 6f d1 vmovdqa64 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7f 48 6f d1 vmovdqu8 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 ff 48 6f d1 vmovdqu16 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7e 48 6f d1 vmovdqu32 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 fe 48 6f d1 vmovdqu64 %zmm1,%zmm2 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7d 29 6f d1 vmovdqa32 %ymm1,%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 6f d1 vmovdqa64 %ymm1,%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 6f d1 vmovdqu8 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 6f d1 vmovdqu16 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 6f d1 vmovdqu32 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 6f d1 vmovdqu64 %xmm1,%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7d 29 6f 10 vmovdqa32 \(%rax\),%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 6f 10 vmovdqa64 \(%rax\),%ymm2\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 6f 10 vmovdqu8 \(%rax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 6f 10 vmovdqu16 \(%rax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 6f 10 vmovdqu32 \(%rax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 6f 10 vmovdqu64 \(%rax\),%xmm2\{%k1\} + +[a-f0-9]+: 62 f1 7d 29 7f 08 vmovdqa32 %ymm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 fd 29 7f 08 vmovdqa64 %ymm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 7f 09 7f 08 vmovdqu8 %xmm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 ff 09 7f 08 vmovdqu16 %xmm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 7e 09 7f 08 vmovdqu32 %xmm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 fe 09 7f 08 vmovdqu64 %xmm1,\(%rax\)\{%k1\} + +[a-f0-9]+: 62 f1 7d 89 6f d1 vmovdqa32 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 fd 89 6f d1 vmovdqa64 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 7f 89 6f d1 vmovdqu8 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 ff 89 6f d1 vmovdqu16 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 7e 89 6f d1 vmovdqu32 %xmm1,%xmm2\{%k1\}\{z\} + +[a-f0-9]+: 62 f1 fe 89 6f d1 vmovdqu64 %xmm1,%xmm2\{%k1\}\{z\} #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.s b/gas/testsuite/gas/i386/x86-64-optimize-3.s index 4a52a25ddd..d9c2eb86cb 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-3.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-3.s @@ -21,3 +21,108 @@ _start: test $0x7f, %r9b vandnpd %zmm1, %zmm1, %zmm5 + + vmovdqa32 %xmm1, %xmm2 + vmovdqa64 %xmm1, %xmm2 + vmovdqu8 %xmm1, %xmm2 + vmovdqu16 %xmm1, %xmm2 + vmovdqu32 %xmm1, %xmm2 + vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 %xmm11, %xmm12 + vmovdqa64 %xmm11, %xmm12 + vmovdqu8 %xmm11, %xmm12 + vmovdqu16 %xmm11, %xmm12 + vmovdqu32 %xmm11, %xmm12 + vmovdqu64 %xmm11, %xmm12 + + vmovdqa32 127(%rax), %xmm2 + vmovdqa64 127(%rax), %xmm2 + vmovdqu8 127(%rax), %xmm2 + vmovdqu16 127(%rax), %xmm2 + vmovdqu32 127(%rax), %xmm2 + vmovdqu64 127(%rax), %xmm2 + + vmovdqa32 %xmm1, 128(%rax) + vmovdqa64 %xmm1, 128(%rax) + vmovdqu8 %xmm1, 128(%rax) + vmovdqu16 %xmm1, 128(%rax) + vmovdqu32 %xmm1, 128(%rax) + vmovdqu64 %xmm1, 128(%rax) + + vmovdqa32 %ymm1, %ymm2 + vmovdqa64 %ymm1, %ymm2 + vmovdqu8 %ymm1, %ymm2 + vmovdqu16 %ymm1, %ymm2 + vmovdqu32 %ymm1, %ymm2 + vmovdqu64 %ymm1, %ymm2 + + vmovdqa32 %ymm11, %ymm12 + vmovdqa64 %ymm11, %ymm12 + vmovdqu8 %ymm11, %ymm12 + vmovdqu16 %ymm11, %ymm12 + vmovdqu32 %ymm11, %ymm12 + vmovdqu64 %ymm11, %ymm12 + + vmovdqa32 127(%rax), %ymm2 + vmovdqa64 127(%rax), %ymm2 + vmovdqu8 127(%rax), %ymm2 + vmovdqu16 127(%rax), %ymm2 + vmovdqu32 127(%rax), %ymm2 + vmovdqu64 127(%rax), %ymm2 + + vmovdqa32 %ymm1, 128(%rax) + vmovdqa64 %ymm1, 128(%rax) + vmovdqu8 %ymm1, 128(%rax) + vmovdqu16 %ymm1, 128(%rax) + vmovdqu32 %ymm1, 128(%rax) + vmovdqu64 %ymm1, 128(%rax) + + vmovdqa32 %xmm21, %xmm2 + vmovdqa64 %xmm21, %xmm2 + vmovdqu8 %xmm21, %xmm2 + vmovdqu16 %xmm21, %xmm2 + vmovdqu32 %xmm21, %xmm2 + vmovdqu64 %xmm21, %xmm2 + + vmovdqa32 %zmm1, %zmm2 + vmovdqa64 %zmm1, %zmm2 + vmovdqu8 %zmm1, %zmm2 + vmovdqu16 %zmm1, %zmm2 + vmovdqu32 %zmm1, %zmm2 + vmovdqu64 %zmm1, %zmm2 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 + + vmovdqa32 %ymm1, %ymm2{%k1} + vmovdqa64 %ymm1, %ymm2{%k1} + vmovdqu8 %xmm1, %xmm2{%k1} + vmovdqu16 %xmm1, %xmm2{%k1} + vmovdqu32 %xmm1, %xmm2{%k1} + vmovdqu64 %xmm1, %xmm2{%k1} + + vmovdqa32 (%rax), %ymm2{%k1} + vmovdqa64 (%rax), %ymm2{%k1} + vmovdqu8 (%rax), %xmm2{%k1} + vmovdqu16 (%rax), %xmm2{%k1} + vmovdqu32 (%rax), %xmm2{%k1} + vmovdqu64 (%rax), %xmm2{%k1} + + vmovdqa32 %ymm1, (%rax){%k1} + vmovdqa64 %ymm1, (%rax){%k1} + vmovdqu8 %xmm1, (%rax){%k1} + vmovdqu16 %xmm1, (%rax){%k1} + vmovdqu32 %xmm1, (%rax){%k1} + vmovdqu64 %xmm1, (%rax){%k1} + + vmovdqa32 %xmm1, %xmm2{%k1}{z} + vmovdqa64 %xmm1, %xmm2{%k1}{z} + vmovdqu8 %xmm1, %xmm2{%k1}{z} + vmovdqu16 %xmm1, %xmm2{%k1}{z} + vmovdqu32 %xmm1, %xmm2{%k1}{z} + vmovdqu64 %xmm1, %xmm2{%k1}{z} diff --git a/gas/testsuite/gas/i386/x86-64-optimize-4.d b/gas/testsuite/gas/i386/x86-64-optimize-4.d index 10e7b02d3a..18fdeb1442 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-4.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-4.d @@ -9,4 +9,10 @@ Disassembly of section .text: 0+ <_start>: +[a-f0-9]+: a9 7f 00 00 00 test \$0x7f,%eax + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-4.s b/gas/testsuite/gas/i386/x86-64-optimize-4.s index 0c4fdcecc5..b6d872db2c 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-4.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-4.s @@ -4,3 +4,10 @@ .text _start: {nooptimize} testl $0x7f, %eax + + {nooptimize} vmovdqa32 %ymm1, %ymm2 + {nooptimize} vmovdqa64 %ymm1, %ymm2 + {nooptimize} vmovdqu8 %xmm1, %xmm2 + {nooptimize} vmovdqu16 %xmm1, %xmm2 + {nooptimize} vmovdqu32 %xmm1, %xmm2 + {nooptimize} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.d b/gas/testsuite/gas/i386/x86-64-optimize-5.d index 085f7f29f2..012237df57 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-5.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-5.d @@ -106,6 +106,60 @@ Disassembly of section .text: +[a-f0-9]+: 62 e1 f5 08 fb c1 vpsubq %xmm1,%xmm1,%xmm16 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.s b/gas/testsuite/gas/i386/x86-64-optimize-5.s index 6b4ff103ab..9756ae815c 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-5.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-5.s @@ -4,3 +4,10 @@ {evex} vandnpd %zmm1, %zmm1, %zmm5 {evex} vandnpd %ymm1, %ymm1, %ymm5 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.d b/gas/testsuite/gas/i386/x86-64-optimize-6.d index 0d52c8fcbb..aca119e4f9 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-6.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-6.d @@ -106,6 +106,60 @@ Disassembly of section .text: +[a-f0-9]+: 62 e1 f5 08 fb c1 vpsubq %xmm1,%xmm1,%xmm16 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 +[a-f0-9]+: 62 b1 f5 00 fb c9 vpsubq %xmm17,%xmm17,%xmm1 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 f9 6f d1 vmovdqa %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c5 fa 6f d1 vmovdqu %xmm1,%xmm2 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 79 6f e3 vmovdqa %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c4 41 7a 6f e3 vmovdqu %xmm11,%xmm12 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 6f 50 7f vmovdqa 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7d 6f e3 vmovdqa %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c4 41 7e 6f e3 vmovdqu %ymm11,%ymm12 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 6f 50 7f vmovdqa 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 fd 28 6f d1 vmovdqa64 %ymm1,%ymm2 + +[a-f0-9]+: 62 f1 7f 08 6f d1 vmovdqu8 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 ff 08 6f d1 vmovdqu16 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 7e 08 6f d1 vmovdqu32 %xmm1,%xmm2 + +[a-f0-9]+: 62 f1 fe 08 6f d1 vmovdqu64 %xmm1,%xmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.s b/gas/testsuite/gas/i386/x86-64-optimize-6.s index 70ccbc41be..7c403fcc86 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-6.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-6.s @@ -6,3 +6,10 @@ {evex} vandnpd %zmm1, %zmm1, %zmm5 {evex} vandnpd %ymm1, %ymm1, %ymm5 + + {evex} vmovdqa32 %ymm1, %ymm2 + {evex} vmovdqa64 %ymm1, %ymm2 + {evex} vmovdqu8 %xmm1, %xmm2 + {evex} vmovdqu16 %xmm1, %xmm2 + {evex} vmovdqu32 %xmm1, %xmm2 + {evex} vmovdqu64 %xmm1, %xmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-8.d b/gas/testsuite/gas/i386/x86-64-optimize-8.d new file mode 100644 index 0000000000..46efa5229d --- /dev/null +++ b/gas/testsuite/gas/i386/x86-64-optimize-8.d @@ -0,0 +1,12 @@ +#as: -O2 -march=+noavx +#objdump: -drw +#name: x86-64 optimized encoding 8 with -O2 + +.*: +file format .* + + +Disassembly of section .text: + +0+ <_start>: + +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 +#pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-8.s b/gas/testsuite/gas/i386/x86-64-optimize-8.s new file mode 100644 index 0000000000..4b9865a91b --- /dev/null +++ b/gas/testsuite/gas/i386/x86-64-optimize-8.s @@ -0,0 +1,6 @@ +# Check 64bit instructions with optimized encoding + + .allow_index_reg + .text +_start: + vmovdqa32 %ymm1, %ymm2 diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl index 1194dcd1c0..26a68d8cbe 100644 --- a/opcodes/i386-opc.tbl +++ b/opcodes/i386-opc.tbl @@ -3709,11 +3709,11 @@ vmovd, 2, 0x666E, None, 1, CpuAVX512F, D|Modrm|EVex=2|VexOpcode=0|Disp8MemShift= vmovddup, 2, 0xF212, None, 1, CpuAVX512F, Modrm|Masking=3|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM|RegZMM|Unspecified|BaseIndex, RegYMM|RegZMM } -vmovdqa64, 2, 0x666F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } -vmovdqa32, 2, 0x666F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqa64, 2, 0x666F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqa32, 2, 0x666F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vmovntdq, 2, 0x66E7, None, 1, CpuAVX512F, Modrm|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM, XMMword|YMMword|ZMMword|Unspecified|BaseIndex } -vmovdqu32, 2, 0xF36F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } -vmovdqu64, 2, 0xF36F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqu32, 2, 0xF36F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqu64, 2, 0xF36F, None, 1, CpuAVX512F, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vmovhlps, 3, 0x12, None, 1, CpuAVX512F, Modrm|EVex=4|VexOpcode=0|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM } vmovlhps, 3, 0x16, None, 1, CpuAVX512F, Modrm|EVex=4|VexOpcode=0|VexVVVV=1|VexW=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM } @@ -4190,8 +4190,8 @@ kshiftrq, 3, 0x6631, None, 1, CpuAVX512BW, Modrm|Vex=1|VexOpcode=2|VexW=2|No_bSu vdbpsadbw, 4, 0x6642, None, 1, CpuAVX512BW, Modrm|Masking=3|VexOpcode=2|VexVVVV=1|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Imm8, RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } -vmovdqu8, 2, 0xF26F, None, 1, CpuAVX512BW, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } -vmovdqu16, 2, 0xF26F, None, 1, CpuAVX512BW, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqu8, 2, 0xF26F, None, 1, CpuAVX512BW, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } +vmovdqu16, 2, 0xF26F, None, 1, CpuAVX512BW, D|Modrm|MaskingMorZ|VexOpcode=0|VexW=2|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|Optimize, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vpabsb, 2, 0x661C, None, 1, CpuAVX512BW, Modrm|Masking=3|VexOpcode=1|VexWIG|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM } vpmaxsb, 3, 0x663C, None, 1, CpuAVX512BW, Modrm|Masking=3|VexOpcode=1|VexWIG|VexVVVV=1|Disp8ShiftVL|CheckRegSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM } diff --git a/opcodes/i386-tbl.h b/opcodes/i386-tbl.h index 81575df3f2..bd33eb5ce5 100644 --- a/opcodes/i386-tbl.h +++ b/opcodes/i386-tbl.h @@ -60123,7 +60123,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -60139,7 +60139,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -60155,7 +60155,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -60171,7 +60171,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -63555,7 +63555,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, @@ -63571,7 +63571,7 @@ const insn_template i386_optab[] = 0, 0, 0, 0, 0, 0 } }, { 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, - 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0 }, + 2, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 7, 0, 0, 1, 0, 0, 0, 0, 0 }, { { { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0 } }, -- 2.20.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: V2 [PATCH] x86: Optimize EVEX vector load/store instructions 2019-03-17 20:47 ` V2 " H.J. Lu @ 2019-03-18 13:49 ` Jan Beulich 2019-03-19 6:21 ` [PATCH] x86: Correct EVEX vector load/store optimization H.J. Lu 0 siblings, 1 reply; 7+ messages in thread From: Jan Beulich @ 2019-03-18 13:49 UTC (permalink / raw) To: H.J. Lu; +Cc: binutils >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote: > --- a/gas/config/tc-i386.c > +++ b/gas/config/tc-i386.c > @@ -4075,6 +4075,56 @@ optimize_encoding (void) > i.types[j].bitfield.ymmword = 0; > } > } > + else if ((cpu_arch_flags.bitfield.cpuavx > + || cpu_arch_isa_flags.bitfield.cpuavx) Once again a questionable condition, as per earlier replies to other patches of yours. > + && i.vec_encoding != vex_encoding_evex > + && !i.types[0].bitfield.zmmword > + && !i.mask > + && is_evex_encoding (&i.tm) > + && (i.tm.base_opcode == 0x666f > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f > + || i.tm.base_opcode == 0xf36f > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f > + || i.tm.base_opcode == 0xf26f > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) All three of these can be expressed with just a single comparison, using & or | instead of ^ and (if necessary) adjusting the literal value compared against. > + && i.tm.extension_opcode == None) > + { > + /* Optimize: -O1: > + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, > + vmovdqu32 and vmovdqu64: > + EVEX VOP %xmmM, %xmmN > + -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16) > + EVEX VOP %ymmM, %ymmN > + -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16) > + EVEX VOP %xmmM, mem > + -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16) > + EVEX VOP %ymmM, mem > + -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16) > + EVEX VOP mem, %xmmN > + -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16) There's some confusion on this line. > + EVEX VOP mem, %ymmN > + -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16) > + */ For the variants with a memory operand I doubt the conversion is always a win, and it may be against the user request in case of -Os. This is because of the Disp8 scaling the EVEX encoding permits. > + if (i.tm.base_opcode == 0xf26f) > + i.tm.base_opcode = 0xf36f; > + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) > + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; This again can be expressed without "else if()" afaict. Jan ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH] x86: Correct EVEX vector load/store optimization 2019-03-18 13:49 ` Jan Beulich @ 2019-03-19 6:21 ` H.J. Lu 2019-03-19 8:30 ` Jan Beulich 0 siblings, 1 reply; 7+ messages in thread From: H.J. Lu @ 2019-03-19 6:21 UTC (permalink / raw) To: Jan Beulich; +Cc: Binutils [-- Attachment #1: Type: text/plain, Size: 2568 bytes --] On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote: > > >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote: > > --- a/gas/config/tc-i386.c > > +++ b/gas/config/tc-i386.c > > @@ -4075,6 +4075,56 @@ optimize_encoding (void) > > i.types[j].bitfield.ymmword = 0; > > } > > } > > + else if ((cpu_arch_flags.bitfield.cpuavx > > + || cpu_arch_isa_flags.bitfield.cpuavx) > > Once again a questionable condition, as per earlier replies to > other patches of yours. Fixed. > > + && i.vec_encoding != vex_encoding_evex > > + && !i.types[0].bitfield.zmmword > > + && !i.mask > > + && is_evex_encoding (&i.tm) > > + && (i.tm.base_opcode == 0x666f > > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f > > + || i.tm.base_opcode == 0xf36f > > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f > > + || i.tm.base_opcode == 0xf26f > > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) > > All three of these can be expressed with just a single comparison, > using & or | instead of ^ and (if necessary) adjusting the literal > value compared against. Fixed. > > + && i.tm.extension_opcode == None) > > + { > > + /* Optimize: -O1: > > + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, > > + vmovdqu32 and vmovdqu64: > > + EVEX VOP %xmmM, %xmmN > > + -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16) > > + EVEX VOP %ymmM, %ymmN > > + -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16) > > + EVEX VOP %xmmM, mem > > + -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16) > > + EVEX VOP %ymmM, mem > > + -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16) > > + EVEX VOP mem, %xmmN > > + -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16) > > There's some confusion on this line. > > > + EVEX VOP mem, %ymmN > > + -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16) > > + */ > > For the variants with a memory operand I doubt the conversion > is always a win, and it may be against the user request in case of > -Os. This is because of the Disp8 scaling the EVEX encoding permits. Fixed. > > + if (i.tm.base_opcode == 0xf26f) > > + i.tm.base_opcode = 0xf36f; > > + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) > > + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; > > This again can be expressed without "else if()" afaict. > Fixed. Here is the patch. Thanks. -- H.J. [-- Attachment #2: 0001-x86-Correct-EVEX-vector-load-store-optimization.patch --] [-- Type: text/x-patch, Size: 34028 bytes --] From 84ecabf0624411c1ab95bfadbd864aa4b226b2e8 Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Tue, 19 Mar 2019 10:56:39 +0800 Subject: [PATCH] x86: Correct EVEX vector load/store optimization Update EVEX vector load/store optimization: 1. There is no need to check AVX since AVX2 is required for AVX512F. 2. We need to check both operands for ZMM register since AT&T syntax may not set zmmword on the first operand. 3. Update Opcode_SIMD_IntD check and set. 4. Since the VEX prefix has 2 or 3 bytes, the EVEX prefix has 4 bytes, EVEX Disp8 has 1 byte and VEX Disp32 has 4 bytes, we choose EVEX Disp8 over VEX Disp32. * config/tc-i386.c (optimize_encoding): Don't check AVX for EVEX vector load/store optimization. Check both operands for ZMM register. Update EVEX vector load/store opcode check. Choose EVEX Disp8 over VEX Disp32. * testsuite/gas/i386/optimize-1.d: Updated. * testsuite/gas/i386/optimize-1a.d: Likewise. * testsuite/gas/i386/optimize-2.d: Likewise. * testsuite/gas/i386/optimize-4.d: Likewise. * testsuite/gas/i386/optimize-5.d: Likewise. * testsuite/gas/i386/x86-64-optimize-2.d: Likewise. * testsuite/gas/i386/x86-64-optimize-2a.d: Likewise. * testsuite/gas/i386/x86-64-optimize-2b.d: Likewise. * testsuite/gas/i386/x86-64-optimize-3.d: Likewise. * testsuite/gas/i386/x86-64-optimize-5.d: Likewise. * testsuite/gas/i386/x86-64-optimize-6.d: Likewise. * testsuite/gas/i386/optimize-1.s: Add ZMM register load test. * testsuite/gas/i386/x86-64-optimize-2.s: Likewise. --- gas/config/tc-i386.c | 46 +++++++++++++++------ gas/testsuite/gas/i386/optimize-1.d | 25 +++++------ gas/testsuite/gas/i386/optimize-1.s | 2 + gas/testsuite/gas/i386/optimize-1a.d | 25 +++++------ gas/testsuite/gas/i386/optimize-2.d | 24 +++++------ gas/testsuite/gas/i386/optimize-4.d | 25 +++++------ gas/testsuite/gas/i386/optimize-5.d | 25 +++++------ gas/testsuite/gas/i386/x86-64-optimize-2.d | 25 +++++------ gas/testsuite/gas/i386/x86-64-optimize-2.s | 2 + gas/testsuite/gas/i386/x86-64-optimize-2a.d | 25 +++++------ gas/testsuite/gas/i386/x86-64-optimize-2b.d | 25 +++++------ gas/testsuite/gas/i386/x86-64-optimize-3.d | 24 +++++------ gas/testsuite/gas/i386/x86-64-optimize-5.d | 25 +++++------ gas/testsuite/gas/i386/x86-64-optimize-6.d | 25 +++++------ 14 files changed, 178 insertions(+), 145 deletions(-) diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c index 3885728de7..3447fe0fa3 100644 --- a/gas/config/tc-i386.c +++ b/gas/config/tc-i386.c @@ -4068,18 +4068,14 @@ optimize_encoding (void) i.types[j].bitfield.ymmword = 0; } } - else if ((cpu_arch_flags.bitfield.cpuavx - || cpu_arch_isa_flags.bitfield.cpuavx) - && i.vec_encoding != vex_encoding_evex + else if (i.vec_encoding != vex_encoding_evex && !i.types[0].bitfield.zmmword + && !i.types[1].bitfield.zmmword && !i.mask && is_evex_encoding (&i.tm) - && (i.tm.base_opcode == 0x666f - || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f - || i.tm.base_opcode == 0xf36f - || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f - || i.tm.base_opcode == 0xf26f - || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) + && ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0x666f + || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf36f + || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) && i.tm.extension_opcode == None) { /* Optimize: -O1: @@ -4098,10 +4094,34 @@ optimize_encoding (void) EVEX VOP mem, %ymmN -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16) */ - if (i.tm.base_opcode == 0xf26f) - i.tm.base_opcode = 0xf36f; - else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) - i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; + for (j = 0; j < 2; j++) + if (operand_type_check (i.types[j], disp) + && i.op[j].disps->X_op == O_constant) + { + /* Since the VEX prefix has 2 or 3 bytes, the EVEX prefix + has 4 bytes, EVEX Disp8 has 1 byte and VEX Disp32 has 4 + bytes, we choose EVEX Disp8 over VEX Disp32. */ + int evex_disp8, vex_disp8; + unsigned int memshift = i.memshift; + offsetT n = i.op[j].disps->X_add_number; + + evex_disp8 = fits_in_disp8 (n); + i.memshift = 0; + vex_disp8 = fits_in_disp8 (n); + if (evex_disp8 != vex_disp8) + { + i.memshift = memshift; + return; + } + + i.types[j].bitfield.disp8 = vex_disp8; + break; + } + if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) + { + i.tm.base_opcode &= Opcode_SIMD_IntD; + i.tm.base_opcode |= 0xf36f; + } i.tm.opcode_modifier.vex = i.types[0].bitfield.ymmword ? VEX256 : VEX128; i.tm.opcode_modifier.vexw = VEXW0; diff --git a/gas/testsuite/gas/i386/optimize-1.d b/gas/testsuite/gas/i386/optimize-1.d index 70c802c002..2f40c72a4e 100644 --- a/gas/testsuite/gas/i386/optimize-1.d +++ b/gas/testsuite/gas/i386/optimize-1.d @@ -74,12 +74,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -92,10 +92,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/optimize-1.s b/gas/testsuite/gas/i386/optimize-1.s index 6dcfbc2799..4c15d16c2a 100644 --- a/gas/testsuite/gas/i386/optimize-1.s +++ b/gas/testsuite/gas/i386/optimize-1.s @@ -114,3 +114,5 @@ _start: vmovdqu16 %ymm1, 128(%eax) vmovdqu32 %ymm1, 128(%eax) vmovdqu64 %ymm1, 128(%eax) + + vmovdqa32 (%eax), %zmm2 diff --git a/gas/testsuite/gas/i386/optimize-1a.d b/gas/testsuite/gas/i386/optimize-1a.d index cee2383d84..d7c253a6fa 100644 --- a/gas/testsuite/gas/i386/optimize-1a.d +++ b/gas/testsuite/gas/i386/optimize-1a.d @@ -75,12 +75,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -93,10 +93,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/optimize-2.d b/gas/testsuite/gas/i386/optimize-2.d index 19467f5c01..ed61dec6fa 100644 --- a/gas/testsuite/gas/i386/optimize-2.d +++ b/gas/testsuite/gas/i386/optimize-2.d @@ -29,12 +29,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -47,12 +47,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) +[a-f0-9]+: 62 f1 7d 48 6f d1 vmovdqa32 %zmm1,%zmm2 +[a-f0-9]+: 62 f1 fd 48 6f d1 vmovdqa64 %zmm1,%zmm2 +[a-f0-9]+: 62 f1 7f 48 6f d1 vmovdqu8 %zmm1,%zmm2 diff --git a/gas/testsuite/gas/i386/optimize-4.d b/gas/testsuite/gas/i386/optimize-4.d index 2df84654d6..f062ad7717 100644 --- a/gas/testsuite/gas/i386/optimize-4.d +++ b/gas/testsuite/gas/i386/optimize-4.d @@ -74,12 +74,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -92,12 +92,13 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 #pass diff --git a/gas/testsuite/gas/i386/optimize-5.d b/gas/testsuite/gas/i386/optimize-5.d index ecc1ab139a..fdf5561af8 100644 --- a/gas/testsuite/gas/i386/optimize-5.d +++ b/gas/testsuite/gas/i386/optimize-5.d @@ -74,12 +74,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -92,12 +92,13 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.d b/gas/testsuite/gas/i386/x86-64-optimize-2.d index 067df076f7..45b98ae694 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2.d @@ -124,12 +124,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -148,10 +148,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.s b/gas/testsuite/gas/i386/x86-64-optimize-2.s index 1275610e55..e5d298225a 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-2.s @@ -170,3 +170,5 @@ _start: vmovdqu16 %ymm1, 128(%rax) vmovdqu32 %ymm1, 128(%rax) vmovdqu64 %ymm1, 128(%rax) + + vmovdqa32 (%rax), %zmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2a.d b/gas/testsuite/gas/i386/x86-64-optimize-2a.d index 532a1458bc..39385b96ec 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2a.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2a.d @@ -125,12 +125,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -149,10 +149,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2b.d b/gas/testsuite/gas/i386/x86-64-optimize-2b.d index 09474a1016..3eb3a59eac 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2b.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2b.d @@ -124,12 +124,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -148,10 +148,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.d b/gas/testsuite/gas/i386/x86-64-optimize-3.d index 74336a4fe2..5e2832df4c 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-3.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-3.d @@ -43,12 +43,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -67,12 +67,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) +[a-f0-9]+: 62 b1 7d 08 6f d5 vmovdqa32 %xmm21,%xmm2 +[a-f0-9]+: 62 b1 fd 08 6f d5 vmovdqa64 %xmm21,%xmm2 +[a-f0-9]+: 62 b1 7f 08 6f d5 vmovdqu8 %xmm21,%xmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.d b/gas/testsuite/gas/i386/x86-64-optimize-5.d index 012237df57..5065d650d4 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-5.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-5.d @@ -124,12 +124,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -148,12 +148,13 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.d b/gas/testsuite/gas/i386/x86-64-optimize-6.d index aca119e4f9..8ebd9b2475 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-6.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-6.d @@ -124,12 +124,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -148,12 +148,13 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 -- 2.20.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] x86: Correct EVEX vector load/store optimization 2019-03-19 6:21 ` [PATCH] x86: Correct EVEX vector load/store optimization H.J. Lu @ 2019-03-19 8:30 ` Jan Beulich 2019-03-19 8:48 ` H.J. Lu 0 siblings, 1 reply; 7+ messages in thread From: Jan Beulich @ 2019-03-19 8:30 UTC (permalink / raw) To: H.J. Lu; +Cc: binutils >>> On 19.03.19 at 07:20, <hjl.tools@gmail.com> wrote: > On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote: >> >> >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote: >> > --- a/gas/config/tc-i386.c >> > +++ b/gas/config/tc-i386.c >> > @@ -4075,6 +4075,56 @@ optimize_encoding (void) >> > i.types[j].bitfield.ymmword = 0; >> > } >> > } >> > + else if ((cpu_arch_flags.bitfield.cpuavx >> > + || cpu_arch_isa_flags.bitfield.cpuavx) >> >> Once again a questionable condition, as per earlier replies to >> other patches of yours. > > Fixed. > >> > + && i.vec_encoding != vex_encoding_evex >> > + && !i.types[0].bitfield.zmmword >> > + && !i.mask >> > + && is_evex_encoding (&i.tm) >> > + && (i.tm.base_opcode == 0x666f >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f >> > + || i.tm.base_opcode == 0xf36f >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f >> > + || i.tm.base_opcode == 0xf26f >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) >> >> All three of these can be expressed with just a single comparison, >> using & or | instead of ^ and (if necessary) adjusting the literal >> value compared against. > > Fixed. > >> > + && i.tm.extension_opcode == None) >> > + { >> > + /* Optimize: -O1: >> > + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, >> > + vmovdqu32 and vmovdqu64: >> > + EVEX VOP %xmmM, %xmmN >> > + -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16) >> > + EVEX VOP %ymmM, %ymmN >> > + -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16) >> > + EVEX VOP %xmmM, mem >> > + -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16) >> > + EVEX VOP %ymmM, mem >> > + -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16) >> > + EVEX VOP mem, %xmmN >> > + -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16) >> >> There's some confusion on this line. >> >> > + EVEX VOP mem, %ymmN >> > + -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16) >> > + */ >> >> For the variants with a memory operand I doubt the conversion >> is always a win, and it may be against the user request in case of >> -Os. This is because of the Disp8 scaling the EVEX encoding permits. > > Fixed. > >> > + if (i.tm.base_opcode == 0xf26f) >> > + i.tm.base_opcode = 0xf36f; >> > + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) >> > + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; >> >> This again can be expressed without "else if()" afaict. >> > > Fixed. > > Here is the patch. Thanks. >--- a/gas/config/tc-i386.c >+++ b/gas/config/tc-i386.c >@@ -4068,18 +4068,14 @@ optimize_encoding (void) > i.types[j].bitfield.ymmword = 0; > } > } >- else if ((cpu_arch_flags.bitfield.cpuavx >- || cpu_arch_isa_flags.bitfield.cpuavx) >- && i.vec_encoding != vex_encoding_evex >+ else if (i.vec_encoding != vex_encoding_evex > && !i.types[0].bitfield.zmmword Ah, here the remaining cpuavx goes away as well. >+ if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) >+ { >+ i.tm.base_opcode &= Opcode_SIMD_IntD; >+ i.tm.base_opcode |= 0xf36f; >+ } How about the even simpler if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) i.tm.base_opcode ^= 0xf36f ^ 0xf26f; ? Jan ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] x86: Correct EVEX vector load/store optimization 2019-03-19 8:30 ` Jan Beulich @ 2019-03-19 8:48 ` H.J. Lu 2019-03-19 8:52 ` Jan Beulich 0 siblings, 1 reply; 7+ messages in thread From: H.J. Lu @ 2019-03-19 8:48 UTC (permalink / raw) To: Jan Beulich; +Cc: Binutils [-- Attachment #1: Type: text/plain, Size: 3814 bytes --] On Tue, Mar 19, 2019 at 4:30 PM Jan Beulich <JBeulich@suse.com> wrote: > > >>> On 19.03.19 at 07:20, <hjl.tools@gmail.com> wrote: > > On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote: > >> > >> >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote: > >> > --- a/gas/config/tc-i386.c > >> > +++ b/gas/config/tc-i386.c > >> > @@ -4075,6 +4075,56 @@ optimize_encoding (void) > >> > i.types[j].bitfield.ymmword = 0; > >> > } > >> > } > >> > + else if ((cpu_arch_flags.bitfield.cpuavx > >> > + || cpu_arch_isa_flags.bitfield.cpuavx) > >> > >> Once again a questionable condition, as per earlier replies to > >> other patches of yours. > > > > Fixed. > > > >> > + && i.vec_encoding != vex_encoding_evex > >> > + && !i.types[0].bitfield.zmmword > >> > + && !i.mask > >> > + && is_evex_encoding (&i.tm) > >> > + && (i.tm.base_opcode == 0x666f > >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f > >> > + || i.tm.base_opcode == 0xf36f > >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f > >> > + || i.tm.base_opcode == 0xf26f > >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) > >> > >> All three of these can be expressed with just a single comparison, > >> using & or | instead of ^ and (if necessary) adjusting the literal > >> value compared against. > > > > Fixed. > > > >> > + && i.tm.extension_opcode == None) > >> > + { > >> > + /* Optimize: -O1: > >> > + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, > >> > + vmovdqu32 and vmovdqu64: > >> > + EVEX VOP %xmmM, %xmmN > >> > + -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16) > >> > + EVEX VOP %ymmM, %ymmN > >> > + -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16) > >> > + EVEX VOP %xmmM, mem > >> > + -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16) > >> > + EVEX VOP %ymmM, mem > >> > + -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16) > >> > + EVEX VOP mem, %xmmN > >> > + -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16) > >> > >> There's some confusion on this line. > >> > >> > + EVEX VOP mem, %ymmN > >> > + -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16) > >> > + */ > >> > >> For the variants with a memory operand I doubt the conversion > >> is always a win, and it may be against the user request in case of > >> -Os. This is because of the Disp8 scaling the EVEX encoding permits. > > > > Fixed. > > > >> > + if (i.tm.base_opcode == 0xf26f) > >> > + i.tm.base_opcode = 0xf36f; > >> > + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) > >> > + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; > >> > >> This again can be expressed without "else if()" afaict. > >> > > > > Fixed. > > > > Here is the patch. > > Thanks. > > >--- a/gas/config/tc-i386.c > >+++ b/gas/config/tc-i386.c > >@@ -4068,18 +4068,14 @@ optimize_encoding (void) > > i.types[j].bitfield.ymmword = 0; > > } > > } > >- else if ((cpu_arch_flags.bitfield.cpuavx > >- || cpu_arch_isa_flags.bitfield.cpuavx) > >- && i.vec_encoding != vex_encoding_evex > >+ else if (i.vec_encoding != vex_encoding_evex > > && !i.types[0].bitfield.zmmword > > Ah, here the remaining cpuavx goes away as well. > > >+ if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) > >+ { > >+ i.tm.base_opcode &= Opcode_SIMD_IntD; > >+ i.tm.base_opcode |= 0xf36f; > >+ } > > How about the even simpler > > if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) > i.tm.base_opcode ^= 0xf36f ^ 0xf26f; > It works. I am going to check in this patch together with other 2. Thanks. -- H.J. [-- Attachment #2: 0001-x86-Correct-EVEX-vector-load-store-optimization.patch --] [-- Type: text/x-patch, Size: 33996 bytes --] From 177fca87fa53139e3a409876c0d9333e6b33780c Mon Sep 17 00:00:00 2001 From: "H.J. Lu" <hjl.tools@gmail.com> Date: Tue, 19 Mar 2019 10:56:39 +0800 Subject: [PATCH] x86: Correct EVEX vector load/store optimization Update EVEX vector load/store optimization: 1. There is no need to check AVX since AVX2 is required for AVX512F. 2. We need to check both operands for ZMM register since AT&T syntax may not set zmmword on the first operand. 3. Update Opcode_SIMD_IntD check and set. 4. Since the VEX prefix has 2 or 3 bytes, the EVEX prefix has 4 bytes, EVEX Disp8 has 1 byte and VEX Disp32 has 4 bytes, we choose EVEX Disp8 over VEX Disp32. * config/tc-i386.c (optimize_encoding): Don't check AVX for EVEX vector load/store optimization. Check both operands for ZMM register. Update EVEX vector load/store opcode check. Choose EVEX Disp8 over VEX Disp32. * testsuite/gas/i386/optimize-1.d: Updated. * testsuite/gas/i386/optimize-1a.d: Likewise. * testsuite/gas/i386/optimize-2.d: Likewise. * testsuite/gas/i386/optimize-4.d: Likewise. * testsuite/gas/i386/optimize-5.d: Likewise. * testsuite/gas/i386/x86-64-optimize-2.d: Likewise. * testsuite/gas/i386/x86-64-optimize-2a.d: Likewise. * testsuite/gas/i386/x86-64-optimize-2b.d: Likewise. * testsuite/gas/i386/x86-64-optimize-3.d: Likewise. * testsuite/gas/i386/x86-64-optimize-5.d: Likewise. * testsuite/gas/i386/x86-64-optimize-6.d: Likewise. * testsuite/gas/i386/optimize-1.s: Add ZMM register load test. * testsuite/gas/i386/x86-64-optimize-2.s: Likewise. --- gas/config/tc-i386.c | 43 ++++++++++++++------- gas/testsuite/gas/i386/optimize-1.d | 25 ++++++------ gas/testsuite/gas/i386/optimize-1.s | 2 + gas/testsuite/gas/i386/optimize-1a.d | 25 ++++++------ gas/testsuite/gas/i386/optimize-2.d | 24 ++++++------ gas/testsuite/gas/i386/optimize-4.d | 25 ++++++------ gas/testsuite/gas/i386/optimize-5.d | 25 ++++++------ gas/testsuite/gas/i386/x86-64-optimize-2.d | 25 ++++++------ gas/testsuite/gas/i386/x86-64-optimize-2.s | 2 + gas/testsuite/gas/i386/x86-64-optimize-2a.d | 25 ++++++------ gas/testsuite/gas/i386/x86-64-optimize-2b.d | 25 ++++++------ gas/testsuite/gas/i386/x86-64-optimize-3.d | 24 ++++++------ gas/testsuite/gas/i386/x86-64-optimize-5.d | 25 ++++++------ gas/testsuite/gas/i386/x86-64-optimize-6.d | 25 ++++++------ 14 files changed, 175 insertions(+), 145 deletions(-) diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c index 3885728de7..690fd23ff0 100644 --- a/gas/config/tc-i386.c +++ b/gas/config/tc-i386.c @@ -4068,18 +4068,14 @@ optimize_encoding (void) i.types[j].bitfield.ymmword = 0; } } - else if ((cpu_arch_flags.bitfield.cpuavx - || cpu_arch_isa_flags.bitfield.cpuavx) - && i.vec_encoding != vex_encoding_evex + else if (i.vec_encoding != vex_encoding_evex && !i.types[0].bitfield.zmmword + && !i.types[1].bitfield.zmmword && !i.mask && is_evex_encoding (&i.tm) - && (i.tm.base_opcode == 0x666f - || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f - || i.tm.base_opcode == 0xf36f - || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f - || i.tm.base_opcode == 0xf26f - || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) + && ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0x666f + || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf36f + || (i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) && i.tm.extension_opcode == None) { /* Optimize: -O1: @@ -4098,10 +4094,31 @@ optimize_encoding (void) EVEX VOP mem, %ymmN -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16) */ - if (i.tm.base_opcode == 0xf26f) - i.tm.base_opcode = 0xf36f; - else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) - i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; + for (j = 0; j < 2; j++) + if (operand_type_check (i.types[j], disp) + && i.op[j].disps->X_op == O_constant) + { + /* Since the VEX prefix has 2 or 3 bytes, the EVEX prefix + has 4 bytes, EVEX Disp8 has 1 byte and VEX Disp32 has 4 + bytes, we choose EVEX Disp8 over VEX Disp32. */ + int evex_disp8, vex_disp8; + unsigned int memshift = i.memshift; + offsetT n = i.op[j].disps->X_add_number; + + evex_disp8 = fits_in_disp8 (n); + i.memshift = 0; + vex_disp8 = fits_in_disp8 (n); + if (evex_disp8 != vex_disp8) + { + i.memshift = memshift; + return; + } + + i.types[j].bitfield.disp8 = vex_disp8; + break; + } + if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) + i.tm.base_opcode ^= 0xf36f ^ 0xf26f; i.tm.opcode_modifier.vex = i.types[0].bitfield.ymmword ? VEX256 : VEX128; i.tm.opcode_modifier.vexw = VEXW0; diff --git a/gas/testsuite/gas/i386/optimize-1.d b/gas/testsuite/gas/i386/optimize-1.d index 70c802c002..2f40c72a4e 100644 --- a/gas/testsuite/gas/i386/optimize-1.d +++ b/gas/testsuite/gas/i386/optimize-1.d @@ -74,12 +74,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -92,10 +92,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/optimize-1.s b/gas/testsuite/gas/i386/optimize-1.s index 6dcfbc2799..4c15d16c2a 100644 --- a/gas/testsuite/gas/i386/optimize-1.s +++ b/gas/testsuite/gas/i386/optimize-1.s @@ -114,3 +114,5 @@ _start: vmovdqu16 %ymm1, 128(%eax) vmovdqu32 %ymm1, 128(%eax) vmovdqu64 %ymm1, 128(%eax) + + vmovdqa32 (%eax), %zmm2 diff --git a/gas/testsuite/gas/i386/optimize-1a.d b/gas/testsuite/gas/i386/optimize-1a.d index cee2383d84..d7c253a6fa 100644 --- a/gas/testsuite/gas/i386/optimize-1a.d +++ b/gas/testsuite/gas/i386/optimize-1a.d @@ -75,12 +75,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -93,10 +93,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/optimize-2.d b/gas/testsuite/gas/i386/optimize-2.d index 19467f5c01..ed61dec6fa 100644 --- a/gas/testsuite/gas/i386/optimize-2.d +++ b/gas/testsuite/gas/i386/optimize-2.d @@ -29,12 +29,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -47,12 +47,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) +[a-f0-9]+: 62 f1 7d 48 6f d1 vmovdqa32 %zmm1,%zmm2 +[a-f0-9]+: 62 f1 fd 48 6f d1 vmovdqa64 %zmm1,%zmm2 +[a-f0-9]+: 62 f1 7f 48 6f d1 vmovdqu8 %zmm1,%zmm2 diff --git a/gas/testsuite/gas/i386/optimize-4.d b/gas/testsuite/gas/i386/optimize-4.d index 2df84654d6..f062ad7717 100644 --- a/gas/testsuite/gas/i386/optimize-4.d +++ b/gas/testsuite/gas/i386/optimize-4.d @@ -74,12 +74,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -92,12 +92,13 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 #pass diff --git a/gas/testsuite/gas/i386/optimize-5.d b/gas/testsuite/gas/i386/optimize-5.d index ecc1ab139a..fdf5561af8 100644 --- a/gas/testsuite/gas/i386/optimize-5.d +++ b/gas/testsuite/gas/i386/optimize-5.d @@ -74,12 +74,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%eax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%eax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -92,12 +92,13 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%eax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%eax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%eax\),%zmm2 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.d b/gas/testsuite/gas/i386/x86-64-optimize-2.d index 067df076f7..45b98ae694 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2.d @@ -124,12 +124,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -148,10 +148,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2.s b/gas/testsuite/gas/i386/x86-64-optimize-2.s index 1275610e55..e5d298225a 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2.s +++ b/gas/testsuite/gas/i386/x86-64-optimize-2.s @@ -170,3 +170,5 @@ _start: vmovdqu16 %ymm1, 128(%rax) vmovdqu32 %ymm1, 128(%rax) vmovdqu64 %ymm1, 128(%rax) + + vmovdqa32 (%rax), %zmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2a.d b/gas/testsuite/gas/i386/x86-64-optimize-2a.d index 532a1458bc..39385b96ec 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2a.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2a.d @@ -125,12 +125,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -149,10 +149,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-2b.d b/gas/testsuite/gas/i386/x86-64-optimize-2b.d index 09474a1016..3eb3a59eac 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-2b.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-2b.d @@ -124,12 +124,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -148,10 +148,11 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 #pass diff --git a/gas/testsuite/gas/i386/x86-64-optimize-3.d b/gas/testsuite/gas/i386/x86-64-optimize-3.d index 74336a4fe2..5e2832df4c 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-3.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-3.d @@ -43,12 +43,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -67,12 +67,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) +[a-f0-9]+: 62 b1 7d 08 6f d5 vmovdqa32 %xmm21,%xmm2 +[a-f0-9]+: 62 b1 fd 08 6f d5 vmovdqa64 %xmm21,%xmm2 +[a-f0-9]+: 62 b1 7f 08 6f d5 vmovdqu8 %xmm21,%xmm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-5.d b/gas/testsuite/gas/i386/x86-64-optimize-5.d index 012237df57..5065d650d4 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-5.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-5.d @@ -124,12 +124,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -148,12 +148,13 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 diff --git a/gas/testsuite/gas/i386/x86-64-optimize-6.d b/gas/testsuite/gas/i386/x86-64-optimize-6.d index aca119e4f9..8ebd9b2475 100644 --- a/gas/testsuite/gas/i386/x86-64-optimize-6.d +++ b/gas/testsuite/gas/i386/x86-64-optimize-6.d @@ -124,12 +124,12 @@ Disassembly of section .text: +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 +[a-f0-9]+: c5 fa 6f 50 7f vmovdqu 0x7f\(%rax\),%xmm2 - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 f9 7f 88 80 00 00 00 vmovdqa %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) - +[a-f0-9]+: c5 fa 7f 88 80 00 00 00 vmovdqu %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 08 7f 48 08 vmovdqa32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 08 7f 48 08 vmovdqa64 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 08 7f 48 08 vmovdqu8 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 08 7f 48 08 vmovdqu16 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 08 7f 48 08 vmovdqu32 %xmm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 08 7f 48 08 vmovdqu64 %xmm1,0x80\(%rax\) +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fd 6f d1 vmovdqa %ymm1,%ymm2 +[a-f0-9]+: c5 fe 6f d1 vmovdqu %ymm1,%ymm2 @@ -148,12 +148,13 @@ Disassembly of section .text: +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 +[a-f0-9]+: c5 fe 6f 50 7f vmovdqu 0x7f\(%rax\),%ymm2 - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fd 7f 88 80 00 00 00 vmovdqa %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) - +[a-f0-9]+: c5 fe 7f 88 80 00 00 00 vmovdqu %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 28 7f 48 04 vmovdqa32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fd 28 7f 48 04 vmovdqa64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7f 28 7f 48 04 vmovdqu8 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 ff 28 7f 48 04 vmovdqu16 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7e 28 7f 48 04 vmovdqu32 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 fe 28 7f 48 04 vmovdqu64 %ymm1,0x80\(%rax\) + +[a-f0-9]+: 62 f1 7d 48 6f 10 vmovdqa32 \(%rax\),%zmm2 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 f5 08 55 e9 vandnpd %xmm1,%xmm1,%xmm5 +[a-f0-9]+: 62 f1 7d 28 6f d1 vmovdqa32 %ymm1,%ymm2 -- 2.20.1 ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] x86: Correct EVEX vector load/store optimization 2019-03-19 8:48 ` H.J. Lu @ 2019-03-19 8:52 ` Jan Beulich 0 siblings, 0 replies; 7+ messages in thread From: Jan Beulich @ 2019-03-19 8:52 UTC (permalink / raw) To: H.J. Lu; +Cc: binutils >>> On 19.03.19 at 09:48, <hjl.tools@gmail.com> wrote: > On Tue, Mar 19, 2019 at 4:30 PM Jan Beulich <JBeulich@suse.com> wrote: >> >> >>> On 19.03.19 at 07:20, <hjl.tools@gmail.com> wrote: >> > On Mon, Mar 18, 2019 at 9:49 PM Jan Beulich <JBeulich@suse.com> wrote: >> >> >> >> >>> On 17.03.19 at 21:47, <hjl.tools@gmail.com> wrote: >> >> > --- a/gas/config/tc-i386.c >> >> > +++ b/gas/config/tc-i386.c >> >> > @@ -4075,6 +4075,56 @@ optimize_encoding (void) >> >> > i.types[j].bitfield.ymmword = 0; >> >> > } >> >> > } >> >> > + else if ((cpu_arch_flags.bitfield.cpuavx >> >> > + || cpu_arch_isa_flags.bitfield.cpuavx) >> >> >> >> Once again a questionable condition, as per earlier replies to >> >> other patches of yours. >> > >> > Fixed. >> > >> >> > + && i.vec_encoding != vex_encoding_evex >> >> > + && !i.types[0].bitfield.zmmword >> >> > + && !i.mask >> >> > + && is_evex_encoding (&i.tm) >> >> > + && (i.tm.base_opcode == 0x666f >> >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0x666f >> >> > + || i.tm.base_opcode == 0xf36f >> >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf36f >> >> > + || i.tm.base_opcode == 0xf26f >> >> > + || (i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) >> >> >> >> All three of these can be expressed with just a single comparison, >> >> using & or | instead of ^ and (if necessary) adjusting the literal >> >> value compared against. >> > >> > Fixed. >> > >> >> > + && i.tm.extension_opcode == None) >> >> > + { >> >> > + /* Optimize: -O1: >> >> > + VOP, one of vmovdqa32, vmovdqa64, vmovdqu8, vmovdqu16, >> >> > + vmovdqu32 and vmovdqu64: >> >> > + EVEX VOP %xmmM, %xmmN >> >> > + -> VEX vmovdqa|vmovdqu %xmmM, %xmmN (M and N < 16) >> >> > + EVEX VOP %ymmM, %ymmN >> >> > + -> VEX vmovdqa|vmovdqu %ymmM, %ymmN (M and N < 16) >> >> > + EVEX VOP %xmmM, mem >> >> > + -> VEX vmovdqa|vmovdqu %xmmM, mem (M < 16) >> >> > + EVEX VOP %ymmM, mem >> >> > + -> VEX vmovdqa|vmovdqu %ymmM, mem (M < 16) >> >> > + EVEX VOP mem, %xmmN >> >> > + -> VEX mvmovdqa|vmovdquem, %xmmN (N < 16) >> >> >> >> There's some confusion on this line. >> >> >> >> > + EVEX VOP mem, %ymmN >> >> > + -> VEX vmovdqa|vmovdqu mem, %ymmN (N < 16) >> >> > + */ >> >> >> >> For the variants with a memory operand I doubt the conversion >> >> is always a win, and it may be against the user request in case of >> >> -Os. This is because of the Disp8 scaling the EVEX encoding permits. >> > >> > Fixed. >> > >> >> > + if (i.tm.base_opcode == 0xf26f) >> >> > + i.tm.base_opcode = 0xf36f; >> >> > + else if ((i.tm.base_opcode ^ Opcode_SIMD_IntD) == 0xf26f) >> >> > + i.tm.base_opcode = 0xf36f ^ Opcode_SIMD_IntD; >> >> >> >> This again can be expressed without "else if()" afaict. >> >> >> > >> > Fixed. >> > >> > Here is the patch. >> >> Thanks. >> >> >--- a/gas/config/tc-i386.c >> >+++ b/gas/config/tc-i386.c >> >@@ -4068,18 +4068,14 @@ optimize_encoding (void) >> > i.types[j].bitfield.ymmword = 0; >> > } >> > } >> >- else if ((cpu_arch_flags.bitfield.cpuavx >> >- || cpu_arch_isa_flags.bitfield.cpuavx) >> >- && i.vec_encoding != vex_encoding_evex >> >+ else if (i.vec_encoding != vex_encoding_evex >> > && !i.types[0].bitfield.zmmword >> >> Ah, here the remaining cpuavx goes away as well. >> >> >+ if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) >> >+ { >> >+ i.tm.base_opcode &= Opcode_SIMD_IntD; >> >+ i.tm.base_opcode |= 0xf36f; >> >+ } >> >> How about the even simpler >> >> if ((i.tm.base_opcode & ~Opcode_SIMD_IntD) == 0xf26f) >> i.tm.base_opcode ^= 0xf36f ^ 0xf26f; >> > > It works. > > I am going to check in this patch together with other 2. > > Thanks. Thank you as well. Jan ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-03-19 8:52 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-03-15 23:58 [PATCH] x86: Optimize EVEX vector load/store instructions H.J. Lu 2019-03-17 20:47 ` V2 " H.J. Lu 2019-03-18 13:49 ` Jan Beulich 2019-03-19 6:21 ` [PATCH] x86: Correct EVEX vector load/store optimization H.J. Lu 2019-03-19 8:30 ` Jan Beulich 2019-03-19 8:48 ` H.J. Lu 2019-03-19 8:52 ` Jan Beulich
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).