* [PATCH] Fix and improve avx2 broadcasts (PR target/63594)
@ 2014-10-21 16:13 Jakub Jelinek
2014-10-22 6:17 ` Uros Bizjak
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: Jakub Jelinek @ 2014-10-21 16:13 UTC
To: Uros Bizjak, Kirill Yukhin; +Cc: gcc-patches
Hi!
This patch fixes a bunch of recent regressions:
FAIL: gcc.target/i386/avx-1.c (internal compiler error)
FAIL: gcc.target/i386/avx-1.c (test for excess errors)
FAIL: gcc.target/i386/avx-2.c (internal compiler error)
FAIL: gcc.target/i386/avx-2.c (test for excess errors)
FAIL: gcc.target/i386/avx512f-vec-init.c (internal compiler error)
FAIL: gcc.target/i386/avx512f-vec-init.c (test for excess errors)
UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vbroadcastsd 1
UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vbroadcastss 1
UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\\\t]+%zmm 2
UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastb 2
UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastd 1
UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastq 1
UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastw 2
FAIL: gcc.target/i386/sse-14.c (internal compiler error)
FAIL: gcc.target/i386/sse-14.c (test for excess errors)
FAIL: gcc.target/i386/sse-22.c (internal compiler error)
FAIL: gcc.target/i386/sse-22.c (test for excess errors)
FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
FAIL: gcc.target/i386/sse-23.c (internal compiler error)
FAIL: gcc.target/i386/sse-23.c (test for excess errors)
FAIL: gcc.target/i386/sse-24.c (internal compiler error)
FAIL: gcc.target/i386/sse-24.c (test for excess errors)
and improves the quality of code generated for AVX2 and AVX512F broadcasts.
AVX2 broadcast instructions can take their source from memory or a vector
register (only AVX512F can take it from a GPR), so the patch adds a splitter
for the GPR case and marks that alternative with !, so that RA can choose
what is best. If a broadcast from a GPR is desirable, the splitter first
performs vmovd from the GPR into the dest register and then
vpbroadcast{b,w,d}s it.
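As an illustration only (this is not part of the patch), here is a minimal
sketch of the kind of source that exercises these patterns. It uses the same
GCC generic vector extension as the new tests. The function and helper names
are made up for the example. With -O2 -mavx2 one would expect the register
variant to become vmovd followed by vpbroadcastd and the memory variant to
broadcast straight from memory, but the actual output depends on the compiler
version and on what RA decides.

```c
#include <assert.h>

/* Four ints in a 128-bit vector (V4SImode on x86).  */
typedef int v4si __attribute__ ((__vector_size__ (4 * sizeof (int))));

/* The scalar arrives in a general purpose register.  */
__attribute__ ((noinline)) v4si
splat_from_reg (int c)
{
  v4si v = { c, c, c, c };
  return v;
}

/* The scalar is loaded from memory.  */
__attribute__ ((noinline)) v4si
splat_from_mem (const int *p)
{
  int c = *p;
  v4si v = { c, c, c, c };
  return v;
}

/* Hypothetical helper: return 1 if every lane of V equals C.  */
int
all_lanes_equal_v4si (v4si v, int c)
{
  for (int i = 0; i < 4; i++)
    if (v[i] != c)
      return 0;
  return 1;
}

/* Check both variants; aborts through assert on a mismatch.  */
void
check_splat (void)
{
  int c = 17;
  assert (all_lanes_equal_v4si (splat_from_reg (c), 17));
  assert (all_lanes_equal_v4si (splat_from_mem (&c), 17));
}
```

Calling check_splat () from a small main is enough to run it. Nothing in the
sketch needs AVX2 to execute, only the instruction selection changes with
the -m options.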
The AVX512* patterns should IMHO be merged, so that GPR versus MEM sources
are just alternatives of the same define_insn rather than different define_insns,
but I am not changing that right now and will leave it to Kirill as a follow-up.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
2014-10-21 Jakub Jelinek <jakub@redhat.com>
PR target/63594
* config/i386/i386.c (ix86_expand_vector_init_duplicate): For
V{8HI,16QI,16HI,32QI}mode call ix86_vector_duplicate_value
even for just TARGET_AVX2, not only for
TARGET_AVX512VL && TARGET_AVX512BW. For V{32HI,64QI}mode,
call ix86_vector_duplicate_value only if TARGET_AVX512BW,
otherwise build it using concatenation of 256-bit
broadcast.
* config/i386/sse.md (AVX_VEC_DUP_MODE): Moved after
avx512 broadcast patterns.
(vec_dup<mode>): Likewise. For avx2 use
v<sseintprefix>broadcast<bcstscalarsuff> instead of
vbroadcast<ssescalarmodesuffix>.
(AVX2_VEC_DUP_MODE): New mode iterator.
(*vec_dup<mode>): New TARGET_AVX2 define_insn with
AVX2_VEC_DUP_MODE iterator, add a splitter for that.
* gcc.dg/pr63594-1.c: New test.
* gcc.dg/pr63594-2.c: New test.
* gcc.target/i386/sse2-pr63594-1.c: New test.
* gcc.target/i386/sse2-pr63594-2.c: New test.
* gcc.target/i386/avx-pr63594-1.c: New test.
* gcc.target/i386/avx-pr63594-2.c: New test.
* gcc.target/i386/avx2-pr63594-1.c: New test.
* gcc.target/i386/avx2-pr63594-2.c: New test.
* gcc.target/i386/avx512f-pr63594-1.c: New test.
* gcc.target/i386/avx512f-pr63594-2.c: New test.
* gcc.target/i386/avx512f-vec-init.c: Adjust expected
insn counts.
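The V{32HI,64QI}mode fallback described in the ChangeLog above (no
TARGET_AVX512BW, so a 256-bit broadcast is built and concatenated with
itself) can be pictured with the following plain C model. It is only a
sketch of the data flow, not the RTL expander code. All names in it are
hypothetical, and the vectors are passed through pointers so that no
particular ISA is needed to compile it.

```c
#include <assert.h>
#include <string.h>

typedef char v32qi __attribute__ ((__vector_size__ (32)));
typedef char v64qi __attribute__ ((__vector_size__ (64)));

/* Model of the fallback: splat C into a 256-bit half, then place two
   copies of that half side by side to form the 512-bit result.  */
void
splat_by_concat (v64qi *out, char c)
{
  v32qi half = { 0 };

  /* Stands in for the 256-bit broadcast.  */
  for (int i = 0; i < 32; i++)
    half[i] = c;

  /* Stands in for the VEC_CONCAT of the half with itself.  */
  memcpy ((char *) out, &half, sizeof half);
  memcpy ((char *) out + sizeof half, &half, sizeof half);
}

/* Hypothetical helper: return 1 if every lane of *V equals C.  */
int
all_lanes_equal_v64qi (const v64qi *v, char c)
{
  for (int i = 0; i < 64; i++)
    if ((*v)[i] != c)
      return 0;
  return 1;
}

/* Check the model; aborts through assert on a mismatch.  */
void
check_concat (void)
{
  v64qi w;
  splat_by_concat (&w, 17);
  assert (all_lanes_equal_v64qi (&w, 17));
}
```

The same shape applies one level down, where V16HImode and V32QImode
without AVX2 are built from a 128-bit splat concatenated with itself.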
--- gcc/config/i386/i386.c.jj 2014-10-21 13:59:39.102650495 +0200
+++ gcc/config/i386/i386.c 2014-10-21 14:35:54.941980175 +0200
@@ -39855,8 +39855,6 @@ ix86_expand_vector_init_duplicate (bool
case V8SFmode:
case V8SImode:
case V2DFmode:
- case V64QImode:
- case V32HImode:
case V2DImode:
case V4SFmode:
case V4SImode:
@@ -39887,8 +39885,8 @@ ix86_expand_vector_init_duplicate (bool
goto widen;
case V8HImode:
- if (TARGET_AVX512VL && TARGET_AVX512BW)
- return ix86_vector_duplicate_value (mode, target, val);
+ if (TARGET_AVX2)
+ return ix86_vector_duplicate_value (mode, target, val);
if (TARGET_SSE2)
{
@@ -39920,8 +39918,8 @@ ix86_expand_vector_init_duplicate (bool
goto widen;
case V16QImode:
- if (TARGET_AVX512VL && TARGET_AVX512BW)
- return ix86_vector_duplicate_value (mode, target, val);
+ if (TARGET_AVX2)
+ return ix86_vector_duplicate_value (mode, target, val);
if (TARGET_SSE2)
goto permute;
@@ -39952,14 +39950,31 @@ ix86_expand_vector_init_duplicate (bool
case V16HImode:
case V32QImode:
- if (TARGET_AVX512VL && TARGET_AVX512BW)
- return ix86_vector_duplicate_value (mode, target, val);
+ if (TARGET_AVX2)
+ return ix86_vector_duplicate_value (mode, target, val);
else
{
enum machine_mode hvmode = (mode == V16HImode ? V8HImode : V16QImode);
rtx x = gen_reg_rtx (hvmode);
ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val);
+ gcc_assert (ok);
+
+ x = gen_rtx_VEC_CONCAT (mode, x, x);
+ emit_insn (gen_rtx_SET (VOIDmode, target, x));
+ }
+ return true;
+
+ case V64QImode:
+ case V32HImode:
+ if (TARGET_AVX512BW)
+ return ix86_vector_duplicate_value (mode, target, val);
+ else
+ {
+ enum machine_mode hvmode = (mode == V32HImode ? V16HImode : V32QImode);
+ rtx x = gen_reg_rtx (hvmode);
+
+ ok = ix86_expand_vector_init_duplicate (false, hvmode, x, val);
gcc_assert (ok);
x = gen_rtx_VEC_CONCAT (mode, x, x);
--- gcc/config/i386/sse.md.jj 2014-10-21 11:51:30.976626802 +0200
+++ gcc/config/i386/sse.md 2014-10-21 14:38:20.690228844 +0200
@@ -16523,25 +16523,6 @@ (define_insn "avx2_vec_dupv4df"
(set_attr "prefix" "vex")
(set_attr "mode" "V4DF")])
-;; Modes handled by AVX vec_dup patterns.
-(define_mode_iterator AVX_VEC_DUP_MODE
- [V8SI V8SF V4DI V4DF])
-
-(define_insn "vec_dup<mode>"
- [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,v,x")
- (vec_duplicate:AVX_VEC_DUP_MODE
- (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "m,v,?x")))]
- "TARGET_AVX"
- "@
- vbroadcast<ssescalarmodesuffix>\t{%1, %0|%0, %1}
- vbroadcast<ssescalarmodesuffix>\t{%x1, %0|%0, %x1}
- #"
- [(set_attr "type" "ssemov")
- (set_attr "prefix_extra" "1")
- (set_attr "prefix" "maybe_evex")
- (set_attr "isa" "*,avx2,noavx2")
- (set_attr "mode" "V8SF")])
-
(define_insn "<avx512>_vec_dup<mode><mask_name>"
[(set (match_operand:V48_AVX512VL 0 "register_operand" "=v")
(vec_duplicate:V48_AVX512VL
@@ -16644,6 +16625,59 @@ (define_insn "avx2_vbroadcasti128_<mode>
(set_attr "prefix" "vex")
(set_attr "mode" "OI")])
+;; Modes handled by AVX vec_dup patterns.
+(define_mode_iterator AVX_VEC_DUP_MODE
+ [V8SI V8SF V4DI V4DF])
+;; Modes handled by AVX2 vec_dup patterns.
+(define_mode_iterator AVX2_VEC_DUP_MODE
+ [V32QI V16QI V16HI V8HI V8SI V4SI])
+
+(define_insn "*vec_dup<mode>"
+ [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand" "=x,x,x")
+ (vec_duplicate:AVX2_VEC_DUP_MODE
+ (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "m,x,!r")))]
+ "TARGET_AVX2"
+ "@
+ v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0|%0, %1}
+ v<sseintprefix>broadcast<bcstscalarsuff>\t{%x1, %0|%0, %x1}
+ #"
+ [(set_attr "type" "ssemov")
+ (set_attr "prefix_extra" "1")
+ (set_attr "prefix" "maybe_evex")
+ (set_attr "mode" "<sseinsnmode>")])
+
+(define_insn "vec_dup<mode>"
+ [(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand" "=x,x,v,x")
+ (vec_duplicate:AVX_VEC_DUP_MODE
+ (match_operand:<ssescalarmode> 1 "nonimmediate_operand" "m,m,v,?x")))]
+ "TARGET_AVX"
+ "@
+ v<sseintprefix>broadcast<bcstscalarsuff>\t{%1, %0|%0, %1}
+ vbroadcast<ssescalarmodesuffix>\t{%1, %0|%0, %1}
+ v<sseintprefix>broadcast<bcstscalarsuff>\t{%x1, %0|%0, %x1}
+ #"
+ [(set_attr "type" "ssemov")
+ (set_attr "prefix_extra" "1")
+ (set_attr "prefix" "maybe_evex")
+ (set_attr "isa" "avx2,noavx2,avx2,noavx2")
+ (set_attr "mode" "<sseinsnmode>,V8SF,<sseinsnmode>,V8SF")])
+
+(define_split
+ [(set (match_operand:AVX2_VEC_DUP_MODE 0 "register_operand")
+ (vec_duplicate:AVX2_VEC_DUP_MODE
+ (match_operand:<ssescalarmode> 1 "register_operand")))]
+ "TARGET_AVX2 && reload_completed && GENERAL_REG_P (operands[1])"
+ [(const_int 0)]
+{
+ emit_insn (gen_vec_setv4si_0 (gen_lowpart (V4SImode, operands[0]),
+ CONST0_RTX (V4SImode),
+ gen_lowpart (SImode, operands[1])));
+ emit_insn (gen_avx2_pbroadcast<mode> (operands[0],
+ gen_lowpart (<ssexmmmode>mode,
+ operands[0])));
+ DONE;
+})
+
(define_split
[(set (match_operand:AVX_VEC_DUP_MODE 0 "register_operand")
(vec_duplicate:AVX_VEC_DUP_MODE
--- gcc/testsuite/gcc.dg/pr63594-1.c.jj 2014-10-21 14:49:41.756393903 +0200
+++ gcc/testsuite/gcc.dg/pr63594-1.c 2014-10-21 15:35:16.556274687 +0200
@@ -0,0 +1,65 @@
+/* PR target/63594 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -Wno-psabi" } */
+/* { dg-additional-options "-mno-mmx" { target i?86-*-linux* x86_64-*-linux* } } */
+
+#define C1 c
+#define C2 C1, C1
+#define C4 C2, C2
+#define C8 C4, C4
+#define C16 C8, C8
+#define C32 C16, C16
+#define C64 C32, C32
+#define C_(n) n
+#define C(n) C_(C##n)
+
+#define T(t,s) \
+typedef t v##t##s __attribute__ ((__vector_size__ (s * sizeof (t)))); \
+v##t##s \
+test1##t##s (t c) \
+{ \
+ v##t##s v = { C(s) }; \
+ return v; \
+} \
+ \
+v##t##s \
+test2##t##s (t *p) \
+{ \
+ t c = *p; \
+ v##t##s v = { C(s) }; \
+ return v; \
+}
+
+typedef long long llong;
+
+T(char, 64)
+T(char, 32)
+T(char, 16)
+T(char, 8)
+T(char, 4)
+T(char, 2)
+T(char, 1)
+T(short, 32)
+T(short, 16)
+T(short, 8)
+T(short, 4)
+T(short, 2)
+T(short, 1)
+T(int, 16)
+T(int, 8)
+T(int, 4)
+T(int, 2)
+T(int, 1)
+T(float, 16)
+T(float, 8)
+T(float, 4)
+T(float, 2)
+T(float, 1)
+T(llong, 8)
+T(llong, 4)
+T(llong, 2)
+T(llong, 1)
+T(double, 8)
+T(double, 4)
+T(double, 2)
+T(double, 1)
--- gcc/testsuite/gcc.dg/pr63594-2.c.jj 2014-10-21 14:51:30.562343449 +0200
+++ gcc/testsuite/gcc.dg/pr63594-2.c 2014-10-21 15:36:31.532843201 +0200
@@ -0,0 +1,92 @@
+/* PR target/63594 */
+/* { dg-do run } */
+/* { dg-options "-O2 -Wno-psabi" } */
+/* { dg-additional-options "-mno-mmx" { target i?86-*-linux* x86_64-*-linux* } } */
+
+#define C1 c
+#define C2 C1, C1
+#define C4 C2, C2
+#define C8 C4, C4
+#define C16 C8, C8
+#define C32 C16, C16
+#define C64 C32, C32
+#define C_(n) n
+#define C(n) C_(C##n)
+
+#define T(t,s) \
+typedef t v##t##s __attribute__ ((__vector_size__ (s * sizeof (t)))); \
+__attribute__((noinline, noclone)) v##t##s \
+test1##t##s (t c) \
+{ \
+ v##t##s v = { C(s) }; \
+ return v; \
+} \
+ \
+__attribute__((noinline, noclone)) v##t##s \
+test2##t##s (t *p) \
+{ \
+ t c = *p; \
+ v##t##s v = { C(s) }; \
+ return v; \
+} \
+ \
+void \
+test3##t##s (void) \
+{ \
+ t c = 17; \
+ int i; \
+ v##t##s a = test1##t##s (c); \
+ for (i = 0; i < s; i++) \
+ if (a[i] != 17) \
+ __builtin_abort (); \
+ v##t##s b = test2##t##s (&c); \
+ for (i = 0; i < s; i++) \
+ if (b[i] != 17) \
+ __builtin_abort (); \
+}
+
+typedef long long llong;
+
+#define TESTS \
+T(char, 64) \
+T(char, 32) \
+T(char, 16) \
+T(char, 8) \
+T(char, 4) \
+T(char, 2) \
+T(char, 1) \
+T(short, 32) \
+T(short, 16) \
+T(short, 8) \
+T(short, 4) \
+T(short, 2) \
+T(short, 1) \
+T(int, 16) \
+T(int, 8) \
+T(int, 4) \
+T(int, 2) \
+T(int, 1) \
+T(float, 16) \
+T(float, 8) \
+T(float, 4) \
+T(float, 2) \
+T(float, 1) \
+T(llong, 8) \
+T(llong, 4) \
+T(llong, 2) \
+T(llong, 1) \
+T(double, 8) \
+T(double, 4) \
+T(double, 2) \
+T(double, 1)
+
+TESTS
+
+int
+main ()
+{
+#undef T
+#define T(t,s) test3##t##s ();
+ TESTS
+ return 0;
+}
--- gcc/testsuite/gcc.target/i386/sse2-pr63594-1.c.jj 2014-10-21 15:41:08.081652929 +0200
+++ gcc/testsuite/gcc.target/i386/sse2-pr63594-1.c 2014-10-21 15:41:49.322893733 +0200
@@ -0,0 +1,5 @@
+/* PR target/63594 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mno-mmx -Wno-psabi" } */
+
+#include "../../gcc.dg/pr63594-1.c"
--- gcc/testsuite/gcc.target/i386/sse2-pr63594-2.c.jj 2014-10-21 15:40:13.361676458 +0200
+++ gcc/testsuite/gcc.target/i386/sse2-pr63594-2.c 2014-10-21 15:41:27.480287985 +0200
@@ -0,0 +1,18 @@
+/* PR target/63594 */
+/* { dg-do run { target sse2 } } */
+/* { dg-options "-O2 -msse2 -mno-mmx -Wno-psabi" } */
+
+#include "sse2-check.h"
+
+int do_main (void);
+
+static void
+sse2_test (void)
+{
+ do_main ();
+}
+
+#undef main
+#define main() do_main ()
+
+#include "../../gcc.dg/pr63594-2.c"
--- gcc/testsuite/gcc.target/i386/avx-pr63594-1.c.jj 2014-10-21 15:41:08.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx-pr63594-1.c 2014-10-21 15:43:16.577240468 +0200
@@ -0,0 +1,5 @@
+/* PR target/63594 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx -mno-mmx -Wno-psabi" } */
+
+#include "../../gcc.dg/pr63594-1.c"
--- gcc/testsuite/gcc.target/i386/avx-pr63594-2.c.jj 2014-10-21 15:40:13.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx-pr63594-2.c 2014-10-21 15:43:25.527072754 +0200
@@ -0,0 +1,18 @@
+/* PR target/63594 */
+/* { dg-do run { target avx } } */
+/* { dg-options "-O2 -mavx -mno-mmx -Wno-psabi" } */
+
+#include "avx-check.h"
+
+int do_main (void);
+
+static void
+avx_test (void)
+{
+ do_main ();
+}
+
+#undef main
+#define main() do_main ()
+
+#include "../../gcc.dg/pr63594-2.c"
--- gcc/testsuite/gcc.target/i386/avx2-pr63594-1.c.jj 2014-10-21 15:41:08.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx2-pr63594-1.c 2014-10-21 15:44:04.167347796 +0200
@@ -0,0 +1,5 @@
+/* PR target/63594 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx2 -mno-mmx -Wno-psabi" } */
+
+#include "../../gcc.dg/pr63594-1.c"
--- gcc/testsuite/gcc.target/i386/avx2-pr63594-2.c.jj 2014-10-21 15:40:13.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx2-pr63594-2.c 2014-10-21 15:44:11.155210402 +0200
@@ -0,0 +1,18 @@
+/* PR target/63594 */
+/* { dg-do run { target avx2 } } */
+/* { dg-options "-O2 -mavx2 -mno-mmx -Wno-psabi" } */
+
+#include "avx2-check.h"
+
+int do_main (void);
+
+static void
+avx2_test (void)
+{
+ do_main ();
+}
+
+#undef main
+#define main() do_main ()
+
+#include "../../gcc.dg/pr63594-2.c"
--- gcc/testsuite/gcc.target/i386/avx512f-pr63594-1.c.jj 2014-10-21 15:41:08.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx512f-pr63594-1.c 2014-10-21 15:45:26.997790887 +0200
@@ -0,0 +1,5 @@
+/* PR target/63594 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512f -mno-mmx -Wno-psabi" } */
+
+#include "../../gcc.dg/pr63594-1.c"
--- gcc/testsuite/gcc.target/i386/avx512f-pr63594-2.c.jj 2014-10-21 15:40:13.000000000 +0200
+++ gcc/testsuite/gcc.target/i386/avx512f-pr63594-2.c 2014-10-21 15:45:45.048455116 +0200
@@ -0,0 +1,18 @@
+/* PR target/63594 */
+/* { dg-do run { target avx512f } } */
+/* { dg-options "-O2 -mavx512f -mno-mmx -Wno-psabi" } */
+
+#include "avx512f-check.h"
+
+int do_main (void);
+
+static void
+avx512f_test (void)
+{
+ do_main ();
+}
+
+#undef main
+#define main() do_main ()
+
+#include "../../gcc.dg/pr63594-2.c"
--- gcc/testsuite/gcc.target/i386/avx512f-vec-init.c.jj 2014-01-14 09:59:05.000000000 +0100
+++ gcc/testsuite/gcc.target/i386/avx512f-vec-init.c 2014-10-21 17:43:03.000000000 +0200
@@ -1,12 +1,12 @@
/* { dg-do compile } */
/* { dg-options "-O3 -mavx512f" } */
-/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+%zmm" 2 } } */
-/* { dg-final { scan-assembler-times "vpbroadcastd" 1 } } */
-/* { dg-final { scan-assembler-times "vpbroadcastq" 1 } } */
-/* { dg-final { scan-assembler-times "vpbroadcastb" 2 } } */
-/* { dg-final { scan-assembler-times "vpbroadcastw" 2 } } */
-/* { dg-final { scan-assembler-times "vbroadcastss" 1 } } */
-/* { dg-final { scan-assembler-times "vbroadcastsd" 1 } } */
+/* { dg-final { scan-assembler-times "vmovdqa64\[ \\t\]+%zmm" 0 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastd" 2 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastq" 2 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastb" 3 } } */
+/* { dg-final { scan-assembler-times "vpbroadcastw" 3 } } */
+/* { dg-final { scan-assembler-times "vbroadcastss" 0 } } */
+/* { dg-final { scan-assembler-times "vbroadcastsd" 0 } } */
#include <x86intrin.h>
Jakub
* Re: [PATCH] Fix and improve avx2 broadcasts (PR target/63594)
From: Uros Bizjak @ 2014-10-22 6:17 UTC
To: Jakub Jelinek; +Cc: Kirill Yukhin, gcc-patches
On Tue, Oct 21, 2014 at 6:10 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch fixes a bunch of recent regressions:
> FAIL: gcc.target/i386/avx-1.c (internal compiler error)
> FAIL: gcc.target/i386/avx-1.c (test for excess errors)
> FAIL: gcc.target/i386/avx-2.c (internal compiler error)
> FAIL: gcc.target/i386/avx-2.c (test for excess errors)
> FAIL: gcc.target/i386/avx512f-vec-init.c (internal compiler error)
> FAIL: gcc.target/i386/avx512f-vec-init.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vbroadcastsd 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vbroadcastss 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\\\t]+%zmm 2
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastb 2
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastd 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastq 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastw 2
> FAIL: gcc.target/i386/sse-14.c (internal compiler error)
> FAIL: gcc.target/i386/sse-14.c (test for excess errors)
> FAIL: gcc.target/i386/sse-22.c (internal compiler error)
> FAIL: gcc.target/i386/sse-22.c (test for excess errors)
> FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
> FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
> FAIL: gcc.target/i386/sse-23.c (internal compiler error)
> FAIL: gcc.target/i386/sse-23.c (test for excess errors)
> FAIL: gcc.target/i386/sse-24.c (internal compiler error)
> FAIL: gcc.target/i386/sse-24.c (test for excess errors)
> and improves quality of code generated for AVX2 and AVX512F broadcasts;
> as AVX2 broadcast instructions can have source in memory or vector register
> (but only AVX512F can have it in GPRs), the patch adds splitter for the
> GPR case and adds ! for that, so that RA can choose what is best and if
> broadcast from GPR is desirable, it first performs vmovd from GPR into
> the dest register and then vpbroadcast{b,w,d} it.
>
> The AVX512* patterns should be IMHO merged, so that whether GPR or MEM is used
> are just alternatives of the same define_insn rather than different define_insns,
> but am not changing that right now, will leave that to Kirill as a follow-up.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-10-21 Jakub Jelinek <jakub@redhat.com>
>
> PR target/63594
> * config/i386/i386.c (ix86_expand_vector_init_duplicate): For
> V{8HI,16QI,16HI,32QI}mode call ix86_vector_duplicate_value
> even for just TARGET_AVX2, not only for
> TARGET_AVX512VL && TARGET_AVX512BW. For V{32HI,64QI}mode,
> call ix86_vector_duplicate_value only if TARGET_AVX512BW,
> otherwise build it using concatenation of 256-bit
> broadcast.
> * config/i386/sse.md (AVX_VEC_DUP_MODE): Moved after
> avx512 broadcast patterns.
> (vec_dup<mode>): Likewise. For avx2 use
> v<sseintprefix>broadcast<bcstscalarsuff> instead of
> vbroadcast<ssescalarmodesuffix>.
> (AVX2_VEC_DUP_MODE): New mode iterator.
> (*vec_dup<mode>): New TARGET_AVX2 define_insn with
> AVX2_VEC_DUP_MODE iterator, add a splitter for that.
>
> * gcc.dg/pr63594-1.c: New test.
> * gcc.dg/pr63594-2.c: New test.
> * gcc.target/i386/sse2-pr63594-1.c: New test.
> * gcc.target/i386/sse2-pr63594-2.c: New test.
> * gcc.target/i386/avx-pr63594-1.c: New test.
> * gcc.target/i386/avx-pr63594-2.c: New test.
> * gcc.target/i386/avx2-pr63594-1.c: New test.
> * gcc.target/i386/avx2-pr63594-2.c: New test.
> * gcc.target/i386/avx512f-pr63594-1.c: New test.
> * gcc.target/i386/avx512f-pr63594-2.c: New test.
> * gcc.target/i386/avx512f-vec-init.c: Adjust expected
> insn counts.
OK.
Thanks,
Uros.
* Re: [PATCH] Fix and improve avx2 broadcasts (PR target/63594)
From: Rainer Orth @ 2014-10-23 13:00 UTC
To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, gcc-patches
Hi Jakub,
> This patch fixes a bunch of recent regressions:
> FAIL: gcc.target/i386/avx-1.c (internal compiler error)
> FAIL: gcc.target/i386/avx-1.c (test for excess errors)
> FAIL: gcc.target/i386/avx-2.c (internal compiler error)
> FAIL: gcc.target/i386/avx-2.c (test for excess errors)
> FAIL: gcc.target/i386/avx512f-vec-init.c (internal compiler error)
> FAIL: gcc.target/i386/avx512f-vec-init.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vbroadcastsd 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vbroadcastss 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\\\t]+%zmm 2
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastb 2
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastd 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastq 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastw 2
> FAIL: gcc.target/i386/sse-14.c (internal compiler error)
> FAIL: gcc.target/i386/sse-14.c (test for excess errors)
> FAIL: gcc.target/i386/sse-22.c (internal compiler error)
> FAIL: gcc.target/i386/sse-22.c (test for excess errors)
> FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
> FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
> FAIL: gcc.target/i386/sse-23.c (internal compiler error)
> FAIL: gcc.target/i386/sse-23.c (test for excess errors)
> FAIL: gcc.target/i386/sse-24.c (internal compiler error)
> FAIL: gcc.target/i386/sse-24.c (test for excess errors)
> and improves quality of code generated for AVX2 and AVX512F broadcasts;
> as AVX2 broadcast instructions can have source in memory or vector register
> (but only AVX512F can have it in GPRs), the patch adds splitter for the
> GPR case and adds ! for that, so that RA can choose what is best and if
> broadcast from GPR is desirable, it first performs vmovd from GPR into
> the dest register and then vpbroadcast{b,w,d} it.
>
> The AVX512* patterns should be IMHO merged, so that whether GPR or MEM is used
> are just alternatives of the same define_insn rather than different define_insns,
> but am not changing that right now, will leave that to Kirill as a follow-up.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
[...]
> * gcc.dg/pr63594-1.c: New test.
> * gcc.dg/pr63594-2.c: New test.
Unfortunately, I see some problems with those tests on Solaris:
* On Solaris/x86, I get
FAIL: gcc.dg/pr63594-2.c execution test
for 32-bit. Any particular reason to restrict -mno-mmx to Linux/x86?
Manually building the testcase with -mno-mmx on Solaris/x86 seems to
cure the failure.
* On 64-bit Solaris/SPARC, I get
FAIL: gcc.dg/pr63594-1.c (internal compiler error)
FAIL: gcc.dg/pr63594-1.c (test for excess errors)
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c: In function 'test1float1':
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:19:1: internal compiler error: Bus Error
/vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:57:1: note: in expansion of macro 'T'
0x751c03 crash_signal
/vol/gcc/src/hg/trunk/local/gcc/toplev.c:349
0x44ffb4 gen_group_rtx(rtx_def*)
/vol/gcc/src/hg/trunk/local/gcc/expr.c:1624
0x4f8167 expand_function_start(tree_node*)
/vol/gcc/src/hg/trunk/local/gcc/function.c:4803
0x36278f execute
/vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:5709
In gdb, I see a SEGV instead:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1 (LWP 1)]
gen_group_rtx (orig=0xfb5a3690) at /vol/gcc/src/hg/trunk/local/gcc/expr.c:1624
1624 if (i)
(gdb) where
#0 gen_group_rtx (orig=0xfb5a3690)
at /vol/gcc/src/hg/trunk/local/gcc/expr.c:1624
#1 0x004f8168 in expand_function_start (subr=0xfb497680)
at /vol/gcc/src/hg/trunk/local/gcc/function.c:4803
#2 0x00362790 in (anonymous namespace)::pass_expand::execute (
this=<optimized out>, fun=0xfb4a11e0)
at /vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:5709
#3 0x006819b8 in execute_one_pass (pass=pass@entry=0x112aab0)
at /vol/gcc/src/hg/trunk/local/gcc/passes.c:2156
#4 0x00682020 in execute_pass_list_1 (pass=0x112aab0, pass@entry=0x1128610)
at /vol/gcc/src/hg/trunk/local/gcc/passes.c:2208
#5 0x00682088 in execute_pass_list (fn=0xfb4a11e0, pass=0x1128610)
at /vol/gcc/src/hg/trunk/local/gcc/passes.c:2219
#6 0x0038fda4 in cgraph_node::expand (this=this@entry=0xfb4b2700)
at /vol/gcc/src/hg/trunk/local/gcc/cgraphunit.c:1742
#7 0x003918c4 in expand_all_functions ()
at /vol/gcc/src/hg/trunk/local/gcc/cgraphunit.c:1878
#8 symbol_table::compile (this=0xfb410000)
at /vol/gcc/src/hg/trunk/local/gcc/cgraphunit.c:2213
#9 0x003935f0 in symbol_table::finalize_compilation_unit (this=0xfb410000)
at /vol/gcc/src/hg/trunk/local/gcc/cgraphunit.c:2290
#10 0x002205dc in c_write_global_declarations ()
at /vol/gcc/src/hg/trunk/local/gcc/c/c-decl.c:10640
#11 0x00751cc4 in compile_file ()
at /vol/gcc/src/hg/trunk/local/gcc/toplev.c:574
#12 0x00e20b10 in toplev::main(int, char**) ()
#13 0x00e21344 in main (argc=20, argv=0xffbff43c)
at /vol/gcc/src/hg/trunk/local/gcc/main.c:38
FAIL: gcc.dg/pr63594-2.c (internal compiler error)
FAIL: gcc.dg/pr63594-2.c (test for excess errors)
WARNING: gcc.dg/pr63594-2.c compilation failed to produce executable
Rainer
--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
* Re: [PATCH] Fix and improve avx2 broadcasts (PR target/63594)
From: Jakub Jelinek @ 2014-10-23 13:14 UTC
To: Rainer Orth; +Cc: Uros Bizjak, Kirill Yukhin, gcc-patches
On Thu, Oct 23, 2014 at 02:58:06PM +0200, Rainer Orth wrote:
> Unfortunately, I see some problems with those tests on Solaris:
>
> * On Solaris/x86, I get
>
> FAIL: gcc.dg/pr63594-2.c execution test
>
> for 32-bit. Any particular reason to restrict -mno-mmx to Linux/x86?
> Manually building the testcase with -mno-mmx on Solaris/x86 seems to
> cure the failure.
No reason, probably finger memory without lots of thinking.
The reason for -mno-mmx is that the functions use floating point vectors
and scalar floating point arithmetic in the same function.
Feel free to change both pr63594-{1,2}.c with s/linux//g .
>
> * On 64-bit Solaris/SPARC, I get
>
> FAIL: gcc.dg/pr63594-1.c (internal compiler error)
> FAIL: gcc.dg/pr63594-1.c (test for excess errors)
>
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c: In function 'test1float1':
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:19:1: internal compiler error: Bus Error
> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:57:1: note: in expansion of macro 'T'
> 0x751c03 crash_signal
> /vol/gcc/src/hg/trunk/local/gcc/toplev.c:349
> 0x44ffb4 gen_group_rtx(rtx_def*)
> /vol/gcc/src/hg/trunk/local/gcc/expr.c:1624
> 0x4f8167 expand_function_start(tree_node*)
> /vol/gcc/src/hg/trunk/local/gcc/function.c:4803
> 0x36278f execute
> /vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:5709
Works fine on x86_64, and doesn't seem to be related to the fix in any way,
it seems the ICE is related to returning or passing the vectors, so
supposedly some latent Solaris/SPARC issue?
Jakub
* Re: [PATCH] Fix and improve avx2 broadcasts (PR target/63594)
From: Rainer Orth @ 2014-10-23 13:24 UTC
To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, gcc-patches
Jakub Jelinek <jakub@redhat.com> writes:
> On Thu, Oct 23, 2014 at 02:58:06PM +0200, Rainer Orth wrote:
>> Unfortunately, I see some problems with those tests on Solaris:
>>
>> * On Solaris/x86, I get
>>
>> FAIL: gcc.dg/pr63594-2.c execution test
>>
>> for 32-bit. Any particular reason to restrict -mno-mmx to Linux/x86?
>> Manually building the testcase with -mno-mmx on Solaris/x86 seems to
>> cure the failure.
>
> No reason, probably finger memory without lots of thinking.
> The reason for -mno-mmx is that the functions use floating point vectors
> and scalar floating point arithmetics in the same function.
> Feel free to change both pr63594-{1,2}.c with s/linux//g .
Ok, will do and commit after Linux and Solaris testing.
>> * On 64-bit Solaris/SPARC, I get
>>
>> FAIL: gcc.dg/pr63594-1.c (internal compiler error)
>> FAIL: gcc.dg/pr63594-1.c (test for excess errors)
>>
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c: In function
>> 'test1float1':
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:19:1:
>> internal compiler error: Bus Error
>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:57:1: note:
>> in expansion of macro 'T'
>> 0x751c03 crash_signal
>> /vol/gcc/src/hg/trunk/local/gcc/toplev.c:349
>> 0x44ffb4 gen_group_rtx(rtx_def*)
>> /vol/gcc/src/hg/trunk/local/gcc/expr.c:1624
>> 0x4f8167 expand_function_start(tree_node*)
>> /vol/gcc/src/hg/trunk/local/gcc/function.c:4803
>> 0x36278f execute
>> /vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:5709
>
> Works fine on x86_64, and doesn't seem to be related to the fix in any way,
> it seems the ICE is related to returning or passing the vectors, so
> supposedly some latent Solaris/SPARC issue?
Ok, I'll file a PR and Cc Eric.
Thanks.
Rainer
--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
* Re: [PATCH] Fix and improve avx2 broadcasts (PR target/63594)
From: Rainer Orth @ 2014-10-24 9:35 UTC
To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, gcc-patches, Eric Botcazou
[-- Attachment #1: Type: text/plain, Size: 1123 bytes --]
Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE> writes:
> Jakub Jelinek <jakub@redhat.com> writes:
>
>> On Thu, Oct 23, 2014 at 02:58:06PM +0200, Rainer Orth wrote:
>>> Unfortunately, I see some problems with those tests on Solaris:
>>>
>>> * On Solaris/x86, I get
>>>
>>> FAIL: gcc.dg/pr63594-2.c execution test
>>>
>>> for 32-bit. Any particular reason to restrict -mno-mmx to Linux/x86?
>>> Manually building the testcase with -mno-mmx on Solaris/x86 seems to
>>> cure the failure.
>>
>> No reason, probably finger memory without lots of thinking.
>> The reason for -mno-mmx is that the functions use floating point vectors
>> and scalar floating point arithmetics in the same function.
>> Feel free to change both pr63594-{1,2}.c with s/linux//g .
>
> Ok, will do and commit after Linux and Solaris testing.
Here's what I've checked in after i686-unknown-linux-gnu,
x86_64-unknown-linux-gnu, and i386-pc-solaris2.11 testing:
2014-10-24 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
* gcc.dg/pr63594-1.c: Apply -mno-mmx to all i?86-*-* and x86_64-*-*
targets.
* gcc.dg/pr63594-2.c: Likewise.
[-- Attachment #2: pr63594.patch --]
[-- Type: text/x-patch, Size: 911 bytes --]
diff --git a/gcc/testsuite/gcc.dg/pr63594-1.c b/gcc/testsuite/gcc.dg/pr63594-1.c
--- a/gcc/testsuite/gcc.dg/pr63594-1.c
+++ b/gcc/testsuite/gcc.dg/pr63594-1.c
@@ -1,7 +1,7 @@
/* PR target/63594 */
/* { dg-do compile } */
/* { dg-options "-O2 -Wno-psabi" } */
-/* { dg-additional-options "-mno-mmx" { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-additional-options "-mno-mmx" { target i?86-*-* x86_64-*-* } } */
#define C1 c
#define C2 C1, C1
diff --git a/gcc/testsuite/gcc.dg/pr63594-2.c b/gcc/testsuite/gcc.dg/pr63594-2.c
--- a/gcc/testsuite/gcc.dg/pr63594-2.c
+++ b/gcc/testsuite/gcc.dg/pr63594-2.c
@@ -1,7 +1,7 @@
/* PR target/63594 */
/* { dg-do run } */
/* { dg-options "-O2 -Wno-psabi" } */
-/* { dg-additional-options "-mno-mmx" { target i?86-*-linux* x86_64-*-linux* } } */
+/* { dg-additional-options "-mno-mmx" { target i?86-*-* x86_64-*-* } } */
#define C1 c
#define C2 C1, C1
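For context, here is a minimal sketch of the kind of function the -mno-mmx directive is meant to cover: it mixes a floating-point vector with scalar floating-point arithmetic in a single function. This is not the content of pr63594-2.c; `v4sf` and `sum_scaled` are hypothetical names chosen for illustration, and the dg- comments only mirror the directives shown in the patch above.

```c
/* { dg-do run } */
/* { dg-options "-O2 -Wno-psabi" } */
/* { dg-additional-options "-mno-mmx" { target i?86-*-* x86_64-*-* } } */

/* Generic GCC vector extension type: four floats (128 bits).
   Hypothetical name, not taken from the real testcase.  */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical function: builds a float vector from the scalar C, then
   uses scalar floating-point arithmetic on its lanes in the same
   function.  */
float
sum_scaled (float c, float scale)
{
  v4sf v = { c, c, c, c };	/* floating-point vector */
  float s = 0.0f;

  for (int i = 0; i < 4; i++)
    s += v[i] * scale;		/* scalar floating-point arithmetic */

  return s;
}
```

Since the directive no longer names linux, any 32-bit or 64-bit x86 target, Solaris/x86 included, would pick up -mno-mmx for a file like this.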
[-- Attachment #3: Type: text/plain, Size: 1277 bytes --]
>>> * On 64-bit Solaris/SPARC, I get
>>>
>>> FAIL: gcc.dg/pr63594-1.c (internal compiler error)
>>> FAIL: gcc.dg/pr63594-1.c (test for excess errors)
>>>
>>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c: In function
>>> 'test1float1':
>>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:19:1:
>>> internal compiler error: Bus Error
>>> /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/pr63594-1.c:57:1: note:
>>> in expansion of macro 'T'
>>> 0x751c03 crash_signal
>>> /vol/gcc/src/hg/trunk/local/gcc/toplev.c:349
>>> 0x44ffb4 gen_group_rtx(rtx_def*)
>>> /vol/gcc/src/hg/trunk/local/gcc/expr.c:1624
>>> 0x4f8167 expand_function_start(tree_node*)
>>> /vol/gcc/src/hg/trunk/local/gcc/function.c:4803
>>> 0x36278f execute
>>> /vol/gcc/src/hg/trunk/local/gcc/cfgexpand.c:5709
>>
>> Works fine on x86_64, and doesn't seem to be related to the fix in any way,
>> it seems the ICE is related to returning or passing the vectors, so
>> supposedly some latent Solaris/SPARC issue?
>
> Ok, I'll file a PR and Cc Eric.
This seems to be the same issue as PR target/61535.
Rainer
--
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
* Re: [PATCH] Fix and improve avx2 broadcasts (PR target/63594)
2014-10-21 16:13 [PATCH] Fix and improve avx2 broadcasts (PR target/63594) Jakub Jelinek
2014-10-22 6:17 ` Uros Bizjak
2014-10-23 13:00 ` Rainer Orth
@ 2014-11-29 22:21 ` H.J. Lu
2 siblings, 0 replies; 7+ messages in thread
From: H.J. Lu @ 2014-11-29 22:21 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Uros Bizjak, Kirill Yukhin, GCC Patches
On Tue, Oct 21, 2014 at 9:10 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> Hi!
>
> This patch fixes a bunch of recent regressions:
> FAIL: gcc.target/i386/avx-1.c (internal compiler error)
> FAIL: gcc.target/i386/avx-1.c (test for excess errors)
> FAIL: gcc.target/i386/avx-2.c (internal compiler error)
> FAIL: gcc.target/i386/avx-2.c (test for excess errors)
> FAIL: gcc.target/i386/avx512f-vec-init.c (internal compiler error)
> FAIL: gcc.target/i386/avx512f-vec-init.c (test for excess errors)
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vbroadcastsd 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vbroadcastss 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vmovdqa64[ \\\\t]+%zmm 2
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastb 2
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastd 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastq 1
> UNRESOLVED: gcc.target/i386/avx512f-vec-init.c scan-assembler-times vpbroadcastw 2
> FAIL: gcc.target/i386/sse-14.c (internal compiler error)
> FAIL: gcc.target/i386/sse-14.c (test for excess errors)
> FAIL: gcc.target/i386/sse-22.c (internal compiler error)
> FAIL: gcc.target/i386/sse-22.c (test for excess errors)
> FAIL: gcc.target/i386/sse-22a.c (internal compiler error)
> FAIL: gcc.target/i386/sse-22a.c (test for excess errors)
> FAIL: gcc.target/i386/sse-23.c (internal compiler error)
> FAIL: gcc.target/i386/sse-23.c (test for excess errors)
> FAIL: gcc.target/i386/sse-24.c (internal compiler error)
> FAIL: gcc.target/i386/sse-24.c (test for excess errors)
> and improves quality of code generated for AVX2 and AVX512F broadcasts;
> as AVX2 broadcast instructions can have source in memory or vector register
> (but only AVX512F can have it in GPRs), the patch adds splitter for the
> GPR case and adds ! for that, so that RA can choose what is best and if
> broadcast from GPR is desirable, it first performs vmovd from GPR into
> the dest register and then vpbroadcast{b,w,d} it.
>
> The AVX512* patterns should be IMHO merged, so that whether GPR or MEM is used
> are just alternatives of the same define_insn rather than different define_insns,
> but am not changing that right now, will leave that to Kirill as a follow-up.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2014-10-21 Jakub Jelinek <jakub@redhat.com>
>
> PR target/63594
> * config/i386/i386.c (ix86_expand_vector_init_duplicate): For
> V{8HI,16QI,16HI,32QI}mode call ix86_vector_duplicate_value
> even for just TARGET_AVX2, not only for
> TARGET_AVX512VL && TARGET_AVX512BW. For V{32HI,64QI}mode,
> call ix86_vector_duplicate_value only if TARGET_AVX512BW,
> otherwise build it using concatenation of 256-bit
> broadcast.
> * config/i386/sse.md (AVX_VEC_DUP_MODE): Moved after
> avx512 broadcast patterns.
> (vec_dup<mode>): Likewise. For avx2 use
> v<sseintprefix>broadcast<bcstscalarsuff> instead of
> vbroadcast<ssescalarmodesuffix>.
> (AVX2_VEC_DUP_MODE): New mode iterator.
> (*vec_dup<mode>): New TARGET_AVX2 define_insn with
> AVX2_VEC_DUP_MODE iterator, add a splitter for that.
>
> * gcc.dg/pr63594-1.c: New test.
> * gcc.dg/pr63594-2.c: New test.
> * gcc.target/i386/sse2-pr63594-1.c: New test.
> * gcc.target/i386/sse2-pr63594-2.c: New test.
> * gcc.target/i386/avx-pr63594-1.c: New test.
> * gcc.target/i386/avx-pr63594-2.c: New test.
> * gcc.target/i386/avx2-pr63594-1.c: New test.
> * gcc.target/i386/avx2-pr63594-2.c: New test.
> * gcc.target/i386/avx512f-pr63594-1.c: New test.
> * gcc.target/i386/avx512f-pr63594-2.c: New test.
> * gcc.target/i386/avx512f-vec-init.c: Adjust expected
> insn counts.
>
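As a rough illustration of the GPR broadcast case described in the quoted patch description, the sketch below uses the generic GCC vector extension rather than intrinsics. `v8si`, `broadcast_int`, `all_lanes_equal` and `check_broadcast` are hypothetical names, not taken from the patch or its tests. When built with -mavx2, a compiler containing the patch could plausibly handle a function like this with vmovd from the GPR followed by vpbroadcastd, but that choice is left to the register allocator and is an assumption here, not something this code verifies.

```c
#include <assert.h>

/* Generic GCC vector extension type: eight 32-bit ints (256 bits).  */
typedef int v8si __attribute__ ((vector_size (32)));

/* Hypothetical helper: copy the scalar X into all eight lanes of *OUT.
   The result goes through a pointer so that no vector is passed or
   returned by value.  */
void
broadcast_int (int x, v8si *out)
{
  v8si v = { x, x, x, x, x, x, x, x };
  *out = v;
}

/* Hypothetical check: return 1 if every lane of *V equals X, else 0.  */
int
all_lanes_equal (const v8si *v, int x)
{
  for (int i = 0; i < 8; i++)
    if ((*v)[i] != x)
      return 0;
  return 1;
}

/* Hypothetical self-check combining the two helpers above.  */
int
check_broadcast (void)
{
  v8si v;

  broadcast_int (42, &v);
  assert (all_lanes_equal (&v, 42));
  return all_lanes_equal (&v, 42);
}
```

The vector extension keeps the sketch target-independent, so it compiles without AVX2 as well; inspecting the assembly with and without -mavx2 is one way to see which broadcast sequence a given compiler picks.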
This caused:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64110
--
H.J.
Thread overview: 7+ messages
2014-10-21 16:13 [PATCH] Fix and improve avx2 broadcasts (PR target/63594) Jakub Jelinek
2014-10-22 6:17 ` Uros Bizjak
2014-10-23 13:00 ` Rainer Orth
2014-10-23 13:14 ` Jakub Jelinek
2014-10-23 13:24 ` Rainer Orth
2014-10-24 9:35 ` Rainer Orth
2014-11-29 22:21 ` H.J. Lu