From: "Roger Sayle" <roger@nextmovesoftware.com>
To: <gcc-patches@gcc.gnu.org>
Cc: "'Hongtao Liu'" <crazylht@gmail.com>,
"'Uros Bizjak'" <ubizjak@gmail.com>
Subject: [x86 PATCH] PR target/106060: Improved SSE vector constant materialization.
Date: Tue, 16 Jan 2024 21:59:18 -0000
Message-ID: <031901da48c7$42c37b10$c84a7130$@nextmovesoftware.com>
[-- Attachment #1: Type: text/plain, Size: 3015 bytes --]
I thought I'd just missed the bug fixing season of stage3, but there
appears to be a little latitude in early stage4 (for vector patches), so
I'll post this now.
This patch resolves PR target/106060 by providing efficient methods for
materializing/synthesizing special "vector" constants on x86. Currently
there are three methods of materializing a vector constant; the most
general is to load a vector from the constant pool; secondly, "duplicated"
constants can be synthesized by moving an integer between units and
broadcasting (or shuffling) it; and finally the special cases of the
all-zeros and all-ones vectors can each be loaded via a single SSE
instruction. This patch handles additional cases that can be synthesized
in two instructions, loading an all-ones vector followed by another SSE
instruction. Following my recent patch for PR target/112992, there's
conveniently a single place in i386-expand.cc where these special cases
can be handled.
Two examples are given in the original bugzilla PR for 106060.
__m256i
should_be_cmpeq_abs ()
{
  return _mm256_set1_epi8 (1);
}

is now generated (with -O3 -march=x86-64-v3) as:

        vpcmpeqd        %ymm0, %ymm0, %ymm0
        vpabsb  %ymm0, %ymm0
        ret

and
__m256i
should_be_cmpeq_add ()
{
  return _mm256_set1_epi8 (-2);
}

is now generated as:

        vpcmpeqd        %ymm0, %ymm0, %ymm0
        vpaddb  %ymm0, %ymm0, %ymm0
        ret
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures. Ok for mainline?
2024-01-16 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
PR target/106060
* config/i386/i386-expand.cc (enum ix86_vec_bcast_alg): New.
(struct ix86_vec_bcast_map_simode_t): New type for table below.
(ix86_vec_bcast_map_simode): Table of SImode constants that may
be efficiently synthesized by a ix86_vec_bcast_alg method.
(ix86_vec_bcast_map_simode_cmp): New comparator for bsearch.
(ix86_vector_duplicate_simode_const): Efficiently synthesize
V4SImode and V8SImode constants that duplicate special constants.
(ix86_vector_duplicate_value): Attempt to synthesize "special"
vector constants using ix86_vector_duplicate_simode_const.
* config/i386/i386.cc (ix86_rtx_costs) <case ABS>: ABS of a
vector integer mode costs a single SSE instruction.
gcc/testsuite/ChangeLog
PR target/106060
* gcc.target/i386/auto-init-8.c: Update test case.
* gcc.target/i386/avx512fp16-13.c: Likewise.
* gcc.target/i386/pr100865-9a.c: Likewise.
* gcc.target/i386/pr106060-1.c: New test case.
* gcc.target/i386/pr106060-2.c: Likewise.
* gcc.target/i386/pr106060-3.c: Likewise.
* gcc.target/i386/pr70314.c: Update test case.
* gcc.target/i386/vect-shiftv4qi.c: Likewise.
* gcc.target/i386/vect-shiftv8qi.c: Likewise.
Thanks in advance,
Roger
--
[-- Attachment #2: patchvc.txt --]
[-- Type: text/plain, Size: 15583 bytes --]
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 52754e1..f8f8af6 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -15638,6 +15638,288 @@ s4fma_expand:
gcc_unreachable ();
}
+/* See below where shifts are handled for explanation of this enum. */
+enum ix86_vec_bcast_alg
+{
+ VEC_BCAST_PXOR,
+ VEC_BCAST_PCMPEQ,
+ VEC_BCAST_PABSB,
+ VEC_BCAST_PADDB,
+ VEC_BCAST_PSRLW,
+ VEC_BCAST_PSRLD,
+ VEC_BCAST_PSLLW,
+ VEC_BCAST_PSLLD
+};
+
+struct ix86_vec_bcast_map_simode_t
+{
+ unsigned int key;
+ enum ix86_vec_bcast_alg alg;
+ unsigned int arg;
+};
+
+/* This table must be kept sorted as values are looked-up using bsearch. */
+static const ix86_vec_bcast_map_simode_t ix86_vec_bcast_map_simode[] = {
+ { 0x00000000, VEC_BCAST_PXOR, 0 },
+ { 0x00000001, VEC_BCAST_PSRLD, 31 },
+ { 0x00000003, VEC_BCAST_PSRLD, 30 },
+ { 0x00000007, VEC_BCAST_PSRLD, 29 },
+ { 0x0000000f, VEC_BCAST_PSRLD, 28 },
+ { 0x0000001f, VEC_BCAST_PSRLD, 27 },
+ { 0x0000003f, VEC_BCAST_PSRLD, 26 },
+ { 0x0000007f, VEC_BCAST_PSRLD, 25 },
+ { 0x000000ff, VEC_BCAST_PSRLD, 24 },
+ { 0x000001ff, VEC_BCAST_PSRLD, 23 },
+ { 0x000003ff, VEC_BCAST_PSRLD, 22 },
+ { 0x000007ff, VEC_BCAST_PSRLD, 21 },
+ { 0x00000fff, VEC_BCAST_PSRLD, 20 },
+ { 0x00001fff, VEC_BCAST_PSRLD, 19 },
+ { 0x00003fff, VEC_BCAST_PSRLD, 18 },
+ { 0x00007fff, VEC_BCAST_PSRLD, 17 },
+ { 0x0000ffff, VEC_BCAST_PSRLD, 16 },
+ { 0x00010001, VEC_BCAST_PSRLW, 15 },
+ { 0x0001ffff, VEC_BCAST_PSRLD, 15 },
+ { 0x00030003, VEC_BCAST_PSRLW, 14 },
+ { 0x0003ffff, VEC_BCAST_PSRLD, 14 },
+ { 0x00070007, VEC_BCAST_PSRLW, 13 },
+ { 0x0007ffff, VEC_BCAST_PSRLD, 13 },
+ { 0x000f000f, VEC_BCAST_PSRLW, 12 },
+ { 0x000fffff, VEC_BCAST_PSRLD, 12 },
+ { 0x001f001f, VEC_BCAST_PSRLW, 11 },
+ { 0x001fffff, VEC_BCAST_PSRLD, 11 },
+ { 0x003f003f, VEC_BCAST_PSRLW, 10 },
+ { 0x003fffff, VEC_BCAST_PSRLD, 10 },
+ { 0x007f007f, VEC_BCAST_PSRLW, 9 },
+ { 0x007fffff, VEC_BCAST_PSRLD, 9 },
+ { 0x00ff00ff, VEC_BCAST_PSRLW, 8 },
+ { 0x00ffffff, VEC_BCAST_PSRLD, 8 },
+ { 0x01010101, VEC_BCAST_PABSB, 0 },
+ { 0x01ff01ff, VEC_BCAST_PSRLW, 7 },
+ { 0x01ffffff, VEC_BCAST_PSRLD, 7 },
+ { 0x03ff03ff, VEC_BCAST_PSRLW, 6 },
+ { 0x03ffffff, VEC_BCAST_PSRLD, 6 },
+ { 0x07ff07ff, VEC_BCAST_PSRLW, 5 },
+ { 0x07ffffff, VEC_BCAST_PSRLD, 5 },
+ { 0x0fff0fff, VEC_BCAST_PSRLW, 4 },
+ { 0x0fffffff, VEC_BCAST_PSRLD, 4 },
+ { 0x1fff1fff, VEC_BCAST_PSRLW, 3 },
+ { 0x1fffffff, VEC_BCAST_PSRLD, 3 },
+ { 0x3fff3fff, VEC_BCAST_PSRLW, 2 },
+ { 0x3fffffff, VEC_BCAST_PSRLD, 2 },
+ { 0x7fff7fff, VEC_BCAST_PSRLW, 1 },
+ { 0x7fffffff, VEC_BCAST_PSRLD, 1 },
+ { 0x80000000, VEC_BCAST_PSLLD, 31 },
+ { 0x80008000, VEC_BCAST_PSLLW, 15 },
+ { 0xc0000000, VEC_BCAST_PSLLD, 30 },
+ { 0xc000c000, VEC_BCAST_PSLLW, 14 },
+ { 0xe0000000, VEC_BCAST_PSLLD, 29 },
+ { 0xe000e000, VEC_BCAST_PSLLW, 13 },
+ { 0xf0000000, VEC_BCAST_PSLLD, 28 },
+ { 0xf000f000, VEC_BCAST_PSLLW, 12 },
+ { 0xf8000000, VEC_BCAST_PSLLD, 27 },
+ { 0xf800f800, VEC_BCAST_PSLLW, 11 },
+ { 0xfc000000, VEC_BCAST_PSLLD, 26 },
+ { 0xfc00fc00, VEC_BCAST_PSLLW, 10 },
+ { 0xfe000000, VEC_BCAST_PSLLD, 25 },
+ { 0xfe00fe00, VEC_BCAST_PSLLW, 9 },
+ { 0xfefefefe, VEC_BCAST_PADDB, 0 },
+ { 0xff000000, VEC_BCAST_PSLLD, 24 },
+ { 0xff00ff00, VEC_BCAST_PSLLW, 8 },
+ { 0xff800000, VEC_BCAST_PSLLD, 23 },
+ { 0xff80ff80, VEC_BCAST_PSLLW, 7 },
+ { 0xffc00000, VEC_BCAST_PSLLD, 22 },
+ { 0xffc0ffc0, VEC_BCAST_PSLLW, 6 },
+ { 0xffe00000, VEC_BCAST_PSLLD, 21 },
+ { 0xffe0ffe0, VEC_BCAST_PSLLW, 5 },
+ { 0xfff00000, VEC_BCAST_PSLLD, 20 },
+ { 0xfff0fff0, VEC_BCAST_PSLLW, 4 },
+ { 0xfff80000, VEC_BCAST_PSLLD, 19 },
+ { 0xfff8fff8, VEC_BCAST_PSLLW, 3 },
+ { 0xfffc0000, VEC_BCAST_PSLLD, 18 },
+ { 0xfffcfffc, VEC_BCAST_PSLLW, 2 },
+ { 0xfffe0000, VEC_BCAST_PSLLD, 17 },
+ { 0xfffefffe, VEC_BCAST_PSLLW, 1 },
+ { 0xffff0000, VEC_BCAST_PSLLD, 16 },
+ { 0xffff8000, VEC_BCAST_PSLLD, 15 },
+ { 0xffffc000, VEC_BCAST_PSLLD, 14 },
+ { 0xffffe000, VEC_BCAST_PSLLD, 13 },
+ { 0xfffff000, VEC_BCAST_PSLLD, 12 },
+ { 0xfffff800, VEC_BCAST_PSLLD, 11 },
+ { 0xfffffc00, VEC_BCAST_PSLLD, 10 },
+ { 0xfffffe00, VEC_BCAST_PSLLD, 9 },
+ { 0xffffff00, VEC_BCAST_PSLLD, 8 },
+ { 0xffffff80, VEC_BCAST_PSLLD, 7 },
+ { 0xffffffc0, VEC_BCAST_PSLLD, 6 },
+ { 0xffffffe0, VEC_BCAST_PSLLD, 5 },
+ { 0xfffffff0, VEC_BCAST_PSLLD, 4 },
+ { 0xfffffff8, VEC_BCAST_PSLLD, 3 },
+ { 0xfffffffc, VEC_BCAST_PSLLD, 2 },
+ { 0xfffffffe, VEC_BCAST_PSLLD, 1 },
+ { 0xffffffff, VEC_BCAST_PCMPEQ, 0 }
+};
+
+/* Comparator for bsearch on ix86_vec_bcast_map_simode. */
+static int
+ix86_vec_bcast_map_simode_cmp (const void *key, const void *entry)
+{
+ return (*(const unsigned int*)key)
+ - ((const ix86_vec_bcast_map_simode_t*)entry)->key;
+}
+
+/* A subroutine of ix86_vector_duplicate_value. Tries to efficiently
+ materialize V4SImode and V8SImode vectors from SImode integer
+ constants. */
+static bool
+ix86_vector_duplicate_simode_const (machine_mode mode, rtx target,
+ unsigned int val)
+{
+ const ix86_vec_bcast_map_simode_t *entry;
+ rtx tmp1, tmp2;
+
+ entry = (const ix86_vec_bcast_map_simode_t*)
+ bsearch(&val, ix86_vec_bcast_map_simode,
+ ARRAY_SIZE (ix86_vec_bcast_map_simode),
+ sizeof (ix86_vec_bcast_map_simode_t),
+ ix86_vec_bcast_map_simode_cmp);
+ if (!entry)
+ return false;
+
+ switch (entry->alg)
+ {
+ case VEC_BCAST_PXOR:
+ if (mode == V8SImode && !TARGET_AVX2)
+ return false;
+ emit_move_insn (target, CONST0_RTX (mode));
+ return true;
+
+ case VEC_BCAST_PCMPEQ:
+ if ((mode == V4SImode && !TARGET_SSE2)
+ || (mode == V8SImode && !TARGET_AVX2))
+ return false;
+ emit_move_insn (target, CONSTM1_RTX (mode));
+ return true;
+
+ case VEC_BCAST_PABSB:
+ if (mode == V4SImode)
+ {
+ tmp1 = gen_reg_rtx (V16QImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V16QImode));
+ tmp2 = gen_reg_rtx (V16QImode);
+ emit_insn (gen_absv16qi2 (tmp2, tmp1));
+ }
+ else if (mode == V8SImode && TARGET_AVX2)
+ {
+ tmp1 = gen_reg_rtx (V32QImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V32QImode));
+ tmp2 = gen_reg_rtx (V32QImode);
+ emit_insn (gen_absv32qi2 (tmp2, tmp1));
+ }
+ else
+ return false;
+ break;
+
+ case VEC_BCAST_PADDB:
+ if (mode == V4SImode)
+ {
+ tmp1 = gen_reg_rtx (V16QImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V16QImode));
+ tmp2 = gen_reg_rtx (V16QImode);
+ emit_insn (gen_addv16qi3 (tmp2, tmp1, tmp1));
+ }
+ else if (mode == V8SImode && TARGET_AVX2)
+ {
+ tmp1 = gen_reg_rtx (V32QImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V32QImode));
+ tmp2 = gen_reg_rtx (V32QImode);
+ emit_insn (gen_addv32qi3 (tmp2, tmp1, tmp1));
+ }
+ else
+ return false;
+ break;
+
+ case VEC_BCAST_PSRLW:
+ if (mode == V4SImode)
+ {
+ tmp1 = gen_reg_rtx (V8HImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V8HImode));
+ tmp2 = gen_reg_rtx (V8HImode);
+ emit_insn (gen_lshrv8hi3 (tmp2, tmp1, GEN_INT (entry->arg)));
+ }
+ else if (mode == V8SImode && TARGET_AVX2)
+ {
+ tmp1 = gen_reg_rtx (V16HImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V16HImode));
+ tmp2 = gen_reg_rtx (V16HImode);
+ emit_insn (gen_lshrv16hi3 (tmp2, tmp1, GEN_INT (entry->arg)));
+ }
+ else
+ return false;
+ break;
+
+ case VEC_BCAST_PSRLD:
+ if (mode == V4SImode)
+ {
+ tmp1 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V4SImode));
+ emit_insn (gen_lshrv4si3 (target, tmp1, GEN_INT (entry->arg)));
+ return true;
+ }
+ else if (mode == V8SImode && TARGET_AVX2)
+ {
+ tmp1 = gen_reg_rtx (V8SImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V8SImode));
+ emit_insn (gen_lshrv8si3 (target, tmp1, GEN_INT (entry->arg)));
+ return true;
+ }
+ else
+ return false;
+ break;
+
+ case VEC_BCAST_PSLLW:
+ if (mode == V4SImode)
+ {
+ tmp1 = gen_reg_rtx (V8HImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V8HImode));
+ tmp2 = gen_reg_rtx (V8HImode);
+ emit_insn (gen_ashlv8hi3 (tmp2, tmp1, GEN_INT (entry->arg)));
+ }
+ else if (mode == V8SImode && TARGET_AVX2)
+ {
+ tmp1 = gen_reg_rtx (V16HImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V16HImode));
+ tmp2 = gen_reg_rtx (V16HImode);
+ emit_insn (gen_ashlv16hi3 (tmp2, tmp1, GEN_INT (entry->arg)));
+ }
+ else
+ return false;
+ break;
+
+ case VEC_BCAST_PSLLD:
+ if (mode == V4SImode)
+ {
+ tmp1 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V4SImode));
+ emit_insn (gen_ashlv4si3 (target, tmp1, GEN_INT (entry->arg)));
+ return true;
+ }
+ else if (mode == V8SImode && TARGET_AVX2)
+ {
+ tmp1 = gen_reg_rtx (V8SImode);
+ emit_move_insn (tmp1, CONSTM1_RTX (V8SImode));
+ emit_insn (gen_ashlv8si3 (target, tmp1, GEN_INT (entry->arg)));
+ return true;
+ }
+ else
+ return false;
+
+ default:
+ return false;
+ }
+
+ emit_move_insn (target, gen_lowpart (mode, tmp2));
+ return true;
+}
+
/* A subroutine of ix86_expand_vector_init_duplicate. Tries to
fill target with val via vec_duplicate. */
@@ -15647,6 +15929,12 @@ ix86_vector_duplicate_value (machine_mode mode, rtx target, rtx val)
bool ok;
rtx_insn *insn;
rtx dup;
+
+ if ((mode == V4SImode || mode == V8SImode)
+ && CONST_INT_P (val)
+ && ix86_vector_duplicate_simode_const (mode, target, INTVAL (val)))
+ return true;
+
/* Save/restore recog_data in case this is called from splitters
or other routines where recog_data needs to stay valid across
force_reg. See PR106577. */
diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 8010532..da4a6dd 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -22076,6 +22076,8 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
*total = cost->fabs;
else if (FLOAT_MODE_P (mode))
*total = ix86_vec_cost (mode, cost->sse_op);
+ else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ *total = cost->sse_op;
return false;
case SQRT:
diff --git a/gcc/testsuite/gcc.target/i386/auto-init-8.c b/gcc/testsuite/gcc.target/i386/auto-init-8.c
index 7023d72..666ee14 100644
--- a/gcc/testsuite/gcc.target/i386/auto-init-8.c
+++ b/gcc/testsuite/gcc.target/i386/auto-init-8.c
@@ -29,7 +29,7 @@ double foo()
return result;
}
-/* { dg-final { scan-rtl-dump-times "0xfffffffffefefefe" 3 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "0xfffffffffefefefe" 1 "expand" } } */
/* { dg-final { scan-rtl-dump-times "\\\[0xfefefefefefefefe\\\]" 2 "expand" } } */
/* { dg-final { scan-rtl-dump-times "0xfffffffffffffffe\\\]\\\) repeated x16" 2 "expand" } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
index f431b8a..9902c81 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-13.c
@@ -126,7 +126,6 @@ abs256_ph (__m256h a)
return _mm256_abs_ph (a);
}
-/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%ymm\[0-9\]+" 1 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vpand\[^\n\]*%ymm\[0-9\]+" 1 } } */
__m128h
@@ -136,5 +135,4 @@ abs_ph (__m128h a)
return _mm_abs_ph (a);
}
-/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%xmm\[0-9\]+" 1 { target { ! ia32 } } } } */
/* { dg-final { scan-assembler-times "vpand\[^\n\]*%xmm\[0-9\]+" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr100865-9a.c b/gcc/testsuite/gcc.target/i386/pr100865-9a.c
index f2ac1bd..91cfeda 100644
--- a/gcc/testsuite/gcc.target/i386/pr100865-9a.c
+++ b/gcc/testsuite/gcc.target/i386/pr100865-9a.c
@@ -18,7 +18,7 @@ foo (void)
{
int i;
for (i = 0; i < sizeof (array) / sizeof (array[0]); i++)
- array[i] = MK_CONST128_BROADCAST (0x1fff);
+ array[i] = MK_CONST128_BROADCAST (0x1234);
}
/* { dg-final { scan-assembler-times "vpbroadcastd\[\\t \]+%xmm\[0-9\]+, %xmm\[0-9\]+" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106060-1.c b/gcc/testsuite/gcc.target/i386/pr106060-1.c
new file mode 100644
index 0000000..a734d56
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106060-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=x86-64-v3" } */
+#include <immintrin.h>
+
+__m256i
+foo ()
+{
+ /* shouldnt_have_movabs */
+ return _mm256_set1_epi8 (123);
+}
+
+/* { dg-final { scan-assembler-not "movabs" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106060-2.c b/gcc/testsuite/gcc.target/i386/pr106060-2.c
new file mode 100644
index 0000000..23933ab
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106060-2.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=x86-64-v3" } */
+#include <immintrin.h>
+
+__m256i
+foo ()
+{
+ /* should_be_cmpeq_abs */
+ return _mm256_set1_epi8 (1);
+}
+
+/* { dg-final { scan-assembler "pcmpeq" } } */
+/* { dg-final { scan-assembler "pabsb" } } */
diff --git a/gcc/testsuite/gcc.target/i386/pr106060-3.c b/gcc/testsuite/gcc.target/i386/pr106060-3.c
new file mode 100644
index 0000000..59c128c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr106060-3.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -march=x86-64-v3" } */
+#include <immintrin.h>
+
+__m256i
+foo ()
+{
+ /* should_be_cmpeq_add */
+ return _mm256_set1_epi8 (-2);
+}
+
+/* { dg-final { scan-assembler "pcmpeq" } } */
+/* { dg-final { scan-assembler "paddb" } } */
+
diff --git a/gcc/testsuite/gcc.target/i386/pr70314.c b/gcc/testsuite/gcc.target/i386/pr70314.c
index aad8dd9..181d2b4 100644
--- a/gcc/testsuite/gcc.target/i386/pr70314.c
+++ b/gcc/testsuite/gcc.target/i386/pr70314.c
@@ -1,6 +1,6 @@
/* { dg-do compile } */
/* { dg-options "-march=skylake-avx512 -O2" } */
-/* { dg-final { scan-assembler-times "cmp" 2 } } */
+/* { dg-final { scan-assembler-times "cmp\[dq\]" 2 } } */
/* { dg-final { scan-assembler-not "and" } } */
typedef long vec __attribute__((vector_size(16)));
diff --git a/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c b/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c
index c6a6390..b7e45c2 100644
--- a/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c
+++ b/gcc/testsuite/gcc.target/i386/vect-shiftv4qi.c
@@ -28,7 +28,7 @@ __vu srl_c (__vu a)
return a >> 5;
}
-/* { dg-final { scan-assembler-times "psrlw" 2 } } */
+/* { dg-final { scan-assembler-times "psrlw" 5 } } */
__vi sra (__vi a, int n)
{
diff --git a/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c b/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c
index 244b0db..2471e6e 100644
--- a/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c
+++ b/gcc/testsuite/gcc.target/i386/vect-shiftv8qi.c
@@ -28,7 +28,7 @@ __vu srl_c (__vu a)
return a >> 5;
}
-/* { dg-final { scan-assembler-times "psrlw" 2 } } */
+/* { dg-final { scan-assembler-times "psrlw" 5 } } */
__vi sra (__vi a, int n)
{