From: Thomas Schwinge <thomas@codesourcery.com>
To: Andrew Stubbs <ams@codesourcery.com>, <gcc-patches@gcc.gnu.org>
Subject: Re: [committed] amdgcn: minimal V64TImode vector support
Date: Tue, 20 Jun 2023 22:47:03 +0200
Message-ID: <877crxx3nc.fsf@euler.schwinge.homeip.net>
In-Reply-To: <afd440fa-a7b9-a539-9056-7f9e85060ef0@codesourcery.com>
Hi!
On 2023-06-19T12:37:52+0100, Andrew Stubbs <ams@codesourcery.com> wrote:
> This patch adds just enough TImode vector support to use them for moving
> data about.
Andrew tells me this need not be worried about, but -- for my future self
searching email archives for FAILs/ICEs -- I'd like to at least document
here that commit 8aeabd9f63d8a54a5fa0b038ad4425a999e1cc75
"amdgcn: minimal V64TImode vector support" in '-march=gfx90a' testing
does regress:
[-PASS:-]{+FAIL: gcc.dg/pr78526.c (internal compiler error: in extract_insn, at recog.cc:2791)+}
{+FAIL:+} gcc.dg/pr78526.c (test for excess errors)
[...]/gcc/testsuite/gcc.dg/pr78526.c: In function 'foo':
[...]/gcc/testsuite/gcc.dg/pr78526.c:21:1: error: unrecognizable insn:
(insn 41 40 42 8 (set (reg:V4TI 443)
(vec_merge:V4TI (vec_duplicate:V4TI (reg:TI 433))
(reg:V4TI 443)
(ashift (const_int 1 [0x1])
(const_int 0 [0])))) "[...]/gcc/testsuite/gcc.dg/pr78526.c":13:11 -1
(nil))
during RTL pass: vregs
[...]/gcc/testsuite/gcc.dg/pr78526.c:21:1: internal compiler error: in extract_insn, at recog.cc:2791
0x73d9f9 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
[...]/gcc/rtl-error.cc:108
0x73da7a _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
[...]/gcc/rtl-error.cc:116
0xeb019e extract_insn(rtx_insn*)
[...]/gcc/recog.cc:2791
0xb2683c instantiate_virtual_regs_in_insn
[...]/gcc/function.cc:1611
0xb2683c instantiate_virtual_regs
[...]/gcc/function.cc:1984
0xb2683c execute
[...]/gcc/function.cc:2033
Similarly:
[-PASS:-]{+FAIL: gcc.dg/pr78540.c (internal compiler error: in extract_insn, at recog.cc:2791)+}
{+FAIL:+} gcc.dg/pr78540.c (test for excess errors)
[...]/gcc/testsuite/gcc.dg/pr78540.c: In function 'bar':
[...]/gcc/testsuite/gcc.dg/pr78540.c:27:1: error: unrecognizable insn:
(insn 68 67 69 2 (set (reg:V4TI 472)
(vec_merge:V4TI (vec_duplicate:V4TI (reg:TI 464))
(reg:V4TI 472)
(ashift (const_int 1 [0x1])
(reg:SI 474)))) "[...]/gcc/testsuite/gcc.dg/pr78540.c":25:21 discrim 1 -1
(nil))
during RTL pass: vregs
[...]/gcc/testsuite/gcc.dg/pr78540.c:27:1: internal compiler error: in extract_insn, at recog.cc:2791
0x73d9f9 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
[...]/gcc/rtl-error.cc:108
0x73da7a _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
[...]/gcc/rtl-error.cc:116
0xeb019e extract_insn(rtx_insn*)
[...]/gcc/recog.cc:2791
0xb2683c instantiate_virtual_regs_in_insn
[...]/gcc/function.cc:1611
0xb2683c instantiate_virtual_regs
[...]/gcc/function.cc:1984
0xb2683c execute
[...]/gcc/function.cc:2033
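As a reading aid for the two unrecognizable insns above: both have the same
shape as the RTL template of the 'vec_set<mode>' expander in the quoted patch
below, that is, a vec_merge of a vec_duplicate with the destination itself,
selected by a one-bit mask '1 << lane'. The following is only a rough C sketch
of what that RTL expresses, not GCC code; 'ti_t', 'NLANES', 'vec_merge' and
'vec_set' are hypothetical names made up for illustration, and a TImode lane
is modelled as two 64-bit halves.

```c
#include <assert.h>
#include <stdint.h>

#define NLANES 4 /* V4TI: four lanes (illustrative constant) */

/* Hypothetical stand-in for one TImode (128-bit) lane.  */
typedef struct { uint64_t lo, hi; } ti_t;

/* vec_merge (a, b, mask): lane i is taken from A if bit i of MASK is set,
   otherwise from B.  */
static void
vec_merge (ti_t *dst, const ti_t *a, const ti_t *b, unsigned mask)
{
  for (int i = 0; i < NLANES; i++)
    dst[i] = ((mask >> i) & 1) ? a[i] : b[i];
}

/* vec_set: vec_merge (vec_duplicate (x), v, 1 << lane), i.e. overwrite
   exactly one lane of V with X and keep all the others.  */
static void
vec_set (ti_t *v, ti_t x, unsigned lane)
{
  assert (lane < NLANES);

  ti_t dup[NLANES];
  for (int i = 0; i < NLANES; i++)
    dup[i] = x; /* vec_duplicate: broadcast X to every lane */

  vec_merge (v, dup, v, 1u << lane);
}
```

In these terms, the pr78526.c insn corresponds to 'vec_set (v, x, 0)' (the
shift count is 'const_int 0'), and the pr78540.c one to a 'vec_set' whose lane
number is only known at run time (the shift count is '(reg:SI 474)').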
Differently:
[-PASS:-]{+FAIL: gcc.dg/pr78575.c (internal compiler error: in gen_ds_bpermutevNm, at config/gcn/gcn.cc:1377)+}
{+FAIL:+} gcc.dg/pr78575.c (test for excess errors)
during RTL pass: expand
[...]/gcc/testsuite/gcc.dg/pr78575.c: In function 'foo':
[...]/gcc/testsuite/gcc.dg/pr78575.c:10:1: internal compiler error: in gen_ds_bpermutevNm, at config/gcn/gcn.cc:1377
0x1390c33 gen_ds_bpermutevNm
[...]/gcc/config/gcn/gcn.cc:1376
0x13a0f3a gcn_vectorize_vec_perm_const
[...]/gcc/config/gcn/gcn.cc:4867
0xded44b expand_vec_perm_const(machine_mode, rtx_def*, rtx_def*, int_vector_builder<poly_int<1u, long> > const&, machine_mode, rtx_def*)
[...]/gcc/optabs.cc:6456
0xaae98d expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
[...]/gcc/expr.cc:10446
0x941803 expand_gimple_stmt_1
[...]/gcc/cfgexpand.cc:3984
0x941803 expand_gimple_stmt
[...]/gcc/cfgexpand.cc:4044
0x942eba expand_gimple_basic_block
[...]/gcc/cfgexpand.cc:6096
0x9453d3 execute
[...]/gcc/cfgexpand.cc:6831
That's all. ;-)
Regards
Thomas
> This is primarily for the use of divmodv64di4, which will
> use TImode to return a pair of DImode values.
>
> The TImode vectors have no other operators defined, and there are no
> hardware instructions to support this mode, beyond load and store.
>
> Committed to mainline, and OG13 will follow shortly.
>
> Andrew
> amdgcn: minimal V64TImode vector support
>
> Just enough support for TImode vectors to exist, load, store, move,
> without any real instructions available.
>
> This is primarily for the use of divmodv64di4, which uses TImode to
> return a pair of DImode values.
>
> gcc/ChangeLog:
>
> * config/gcn/gcn-protos.h (vgpr_4reg_mode_p): New function.
> * config/gcn/gcn-valu.md (V_4REG, V_4REG_ALT): New iterators.
> (V_MOV, V_MOV_ALT): Likewise.
> (scalar_mode, SCALAR_MODE): Add TImode.
> (vnsi, VnSI, vndi, VnDI): Likewise.
> (vec_merge, vec_merge_with_clobber, vec_merge_with_vcc): Use V_MOV.
> (mov<mode>, mov<mode>_unspec): Use V_MOV.
> (*mov<mode>_4reg): New insn.
> (mov<mode>_exec): New 4reg variant.
> (mov<mode>_sgprbase): Likewise.
> (reload_in<mode>, reload_out<mode>): Use V_MOV.
> (vec_set<mode>): Likewise.
> (vec_duplicate<mode><exec>): New 4reg variant.
> (vec_extract<mode><scalar_mode>): Likewise.
> (vec_extract<V_ALL:mode><V_ALL_ALT:mode>): Rename to ...
> (vec_extract<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV.
> (vec_extract<V_4REG:mode><V_4REG_ALT:mode>_nop): New 4reg variant.
> (fold_extract_last_<mode>): Use V_MOV.
> (vec_init<V_ALL:mode><V_ALL_ALT:mode>): Rename to ...
> (vec_init<V_MOV:mode><V_MOV_ALT:mode>): ... this, and use V_MOV.
> (gather_load<mode><vnsi>, gather<mode>_expr<exec>,
> gather<mode>_insn_1offset<exec>, gather<mode>_insn_1offset_ds<exec>,
> gather<mode>_insn_2offsets<exec>): Use V_MOV.
> (scatter_store<mode><vnsi>, scatter<mode>_expr<exec_scatter>,
> scatter<mode>_insn_1offset<exec_scatter>,
> scatter<mode>_insn_1offset_ds<exec_scatter>,
> scatter<mode>_insn_2offsets<exec_scatter>): Likewise.
> (maskload<mode>di, maskstore<mode>di, mask_gather_load<mode><vnsi>,
> mask_scatter_store<mode><vnsi>): Likewise.
> * config/gcn/gcn.cc (gcn_class_max_nregs): Use vgpr_4reg_mode_p.
> (gcn_hard_regno_mode_ok): Likewise.
> (GEN_VNM): Add TImode support.
> (USE_TI): New macro. Separate TImode operations from non-TImode ones.
> (gcn_vector_mode_supported_p): Add V64TImode, V32TImode, V16TImode,
> V8TImode, and V2TImode.
> (print_operand): Add 'J' and 'K' print codes.
>
> diff --git a/gcc/config/gcn/gcn-protos.h b/gcc/config/gcn/gcn-protos.h
> index 287ce17d422..3befb2b7caa 100644
> --- a/gcc/config/gcn/gcn-protos.h
> +++ b/gcc/config/gcn/gcn-protos.h
> @@ -136,6 +136,17 @@ vgpr_2reg_mode_p (machine_mode mode)
> return (mode == DImode || mode == DFmode);
> }
>
> +/* Return true if MODE is valid for four VGPR registers. */
> +
> +inline bool
> +vgpr_4reg_mode_p (machine_mode mode)
> +{
> + if (VECTOR_MODE_P (mode))
> + mode = GET_MODE_INNER (mode);
> +
> + return (mode == TImode);
> +}
> +
> /* Return true if MODE can be handled directly by VGPR operations. */
>
> inline bool
> diff --git a/gcc/config/gcn/gcn-valu.md b/gcc/config/gcn/gcn-valu.md
> index 7290cdc2fd0..284dda73da9 100644
> --- a/gcc/config/gcn/gcn-valu.md
> +++ b/gcc/config/gcn/gcn-valu.md
> @@ -96,6 +96,10 @@ (define_mode_iterator V_2REG_ALT
> V32DI V32DF
> V64DI V64DF])
>
> +; Vector modes for four vector registers
> +(define_mode_iterator V_4REG [V2TI V4TI V8TI V16TI V32TI V64TI])
> +(define_mode_iterator V_4REG_ALT [V2TI V4TI V8TI V16TI V32TI V64TI])
> +
> ; Vector modes with native support
> (define_mode_iterator V_noQI
> [V2HI V2HF V2SI V2SF V2DI V2DF
> @@ -136,7 +140,7 @@ (define_mode_iterator SV_SFDF
> V32SF V32DF
> V64SF V64DF])
>
> -; All of above
> +; All modes in which we want to do more than just moves.
> (define_mode_iterator V_ALL
> [V2QI V2HI V2HF V2SI V2SF V2DI V2DF
> V4QI V4HI V4HF V4SI V4SF V4DI V4DF
> @@ -175,97 +179,113 @@ (define_mode_iterator SV_FP
> V32HF V32SF V32DF
> V64HF V64SF V64DF])
>
> +; All modes that need moves, including those without many insns.
> +(define_mode_iterator V_MOV
> + [V2QI V2HI V2HF V2SI V2SF V2DI V2DF V2TI
> + V4QI V4HI V4HF V4SI V4SF V4DI V4DF V4TI
> + V8QI V8HI V8HF V8SI V8SF V8DI V8DF V8TI
> + V16QI V16HI V16HF V16SI V16SF V16DI V16DF V16TI
> + V32QI V32HI V32HF V32SI V32SF V32DI V32DF V32TI
> + V64QI V64HI V64HF V64SI V64SF V64DI V64DF V64TI])
> +(define_mode_iterator V_MOV_ALT
> + [V2QI V2HI V2HF V2SI V2SF V2DI V2DF V2TI
> + V4QI V4HI V4HF V4SI V4SF V4DI V4DF V4TI
> + V8QI V8HI V8HF V8SI V8SF V8DI V8DF V8TI
> + V16QI V16HI V16HF V16SI V16SF V16DI V16DF V16TI
> + V32QI V32HI V32HF V32SI V32SF V32DI V32DF V32TI
> + V64QI V64HI V64HF V64SI V64SF V64DI V64DF V64TI])
> +
> (define_mode_attr scalar_mode
> - [(QI "qi") (HI "hi") (SI "si")
> + [(QI "qi") (HI "hi") (SI "si") (TI "ti")
> (HF "hf") (SF "sf") (DI "di") (DF "df")
> - (V2QI "qi") (V2HI "hi") (V2SI "si")
> + (V2QI "qi") (V2HI "hi") (V2SI "si") (V2TI "ti")
> (V2HF "hf") (V2SF "sf") (V2DI "di") (V2DF "df")
> - (V4QI "qi") (V4HI "hi") (V4SI "si")
> + (V4QI "qi") (V4HI "hi") (V4SI "si") (V4TI "ti")
> (V4HF "hf") (V4SF "sf") (V4DI "di") (V4DF "df")
> - (V8QI "qi") (V8HI "hi") (V8SI "si")
> + (V8QI "qi") (V8HI "hi") (V8SI "si") (V8TI "ti")
> (V8HF "hf") (V8SF "sf") (V8DI "di") (V8DF "df")
> - (V16QI "qi") (V16HI "hi") (V16SI "si")
> + (V16QI "qi") (V16HI "hi") (V16SI "si") (V16TI "ti")
> (V16HF "hf") (V16SF "sf") (V16DI "di") (V16DF "df")
> - (V32QI "qi") (V32HI "hi") (V32SI "si")
> + (V32QI "qi") (V32HI "hi") (V32SI "si") (V32TI "ti")
> (V32HF "hf") (V32SF "sf") (V32DI "di") (V32DF "df")
> - (V64QI "qi") (V64HI "hi") (V64SI "si")
> + (V64QI "qi") (V64HI "hi") (V64SI "si") (V64TI "ti")
> (V64HF "hf") (V64SF "sf") (V64DI "di") (V64DF "df")])
>
> (define_mode_attr SCALAR_MODE
> - [(QI "QI") (HI "HI") (SI "SI")
> + [(QI "QI") (HI "HI") (SI "SI") (TI "TI")
> (HF "HF") (SF "SF") (DI "DI") (DF "DF")
> - (V2QI "QI") (V2HI "HI") (V2SI "SI")
> + (V2QI "QI") (V2HI "HI") (V2SI "SI") (V2TI "TI")
> (V2HF "HF") (V2SF "SF") (V2DI "DI") (V2DF "DF")
> - (V4QI "QI") (V4HI "HI") (V4SI "SI")
> + (V4QI "QI") (V4HI "HI") (V4SI "SI") (V4TI "TI")
> (V4HF "HF") (V4SF "SF") (V4DI "DI") (V4DF "DF")
> - (V8QI "QI") (V8HI "HI") (V8SI "SI")
> + (V8QI "QI") (V8HI "HI") (V8SI "SI") (V8TI "TI")
> (V8HF "HF") (V8SF "SF") (V8DI "DI") (V8DF "DF")
> - (V16QI "QI") (V16HI "HI") (V16SI "SI")
> + (V16QI "QI") (V16HI "HI") (V16SI "SI") (V16TI "TI")
> (V16HF "HF") (V16SF "SF") (V16DI "DI") (V16DF "DF")
> - (V32QI "QI") (V32HI "HI") (V32SI "SI")
> + (V32QI "QI") (V32HI "HI") (V32SI "SI") (V32TI "TI")
> (V32HF "HF") (V32SF "SF") (V32DI "DI") (V32DF "DF")
> - (V64QI "QI") (V64HI "HI") (V64SI "SI")
> + (V64QI "QI") (V64HI "HI") (V64SI "SI") (V64TI "TI")
> (V64HF "HF") (V64SF "SF") (V64DI "DI") (V64DF "DF")])
>
> (define_mode_attr vnsi
> - [(QI "si") (HI "si") (SI "si")
> + [(QI "si") (HI "si") (SI "si") (TI "si")
> (HF "si") (SF "si") (DI "si") (DF "si")
> (V2QI "v2si") (V2HI "v2si") (V2HF "v2si") (V2SI "v2si")
> - (V2SF "v2si") (V2DI "v2si") (V2DF "v2si")
> + (V2SF "v2si") (V2DI "v2si") (V2DF "v2si") (V2TI "v2si")
> (V4QI "v4si") (V4HI "v4si") (V4HF "v4si") (V4SI "v4si")
> - (V4SF "v4si") (V4DI "v4si") (V4DF "v4si")
> + (V4SF "v4si") (V4DI "v4si") (V4DF "v4si") (V4TI "v4si")
> (V8QI "v8si") (V8HI "v8si") (V8HF "v8si") (V8SI "v8si")
> - (V8SF "v8si") (V8DI "v8si") (V8DF "v8si")
> + (V8SF "v8si") (V8DI "v8si") (V8DF "v8si") (V8TI "v8si")
> (V16QI "v16si") (V16HI "v16si") (V16HF "v16si") (V16SI "v16si")
> - (V16SF "v16si") (V16DI "v16si") (V16DF "v16si")
> + (V16SF "v16si") (V16DI "v16si") (V16DF "v16si") (V16TI "v16si")
> (V32QI "v32si") (V32HI "v32si") (V32HF "v32si") (V32SI "v32si")
> - (V32SF "v32si") (V32DI "v32si") (V32DF "v32si")
> + (V32SF "v32si") (V32DI "v32si") (V32DF "v32si") (V32TI "v32si")
> (V64QI "v64si") (V64HI "v64si") (V64HF "v64si") (V64SI "v64si")
> - (V64SF "v64si") (V64DI "v64si") (V64DF "v64si")])
> + (V64SF "v64si") (V64DI "v64si") (V64DF "v64si") (V64TI "v64si")])
>
> (define_mode_attr VnSI
> - [(QI "SI") (HI "SI") (SI "SI")
> + [(QI "SI") (HI "SI") (SI "SI") (TI "SI")
> (HF "SI") (SF "SI") (DI "SI") (DF "SI")
> (V2QI "V2SI") (V2HI "V2SI") (V2HF "V2SI") (V2SI "V2SI")
> - (V2SF "V2SI") (V2DI "V2SI") (V2DF "V2SI")
> + (V2SF "V2SI") (V2DI "V2SI") (V2DF "V2SI") (V2TI "V2SI")
> (V4QI "V4SI") (V4HI "V4SI") (V4HF "V4SI") (V4SI "V4SI")
> - (V4SF "V4SI") (V4DI "V4SI") (V4DF "V4SI")
> + (V4SF "V4SI") (V4DI "V4SI") (V4DF "V4SI") (V4TI "V4SI")
> (V8QI "V8SI") (V8HI "V8SI") (V8HF "V8SI") (V8SI "V8SI")
> - (V8SF "V8SI") (V8DI "V8SI") (V8DF "V8SI")
> + (V8SF "V8SI") (V8DI "V8SI") (V8DF "V8SI") (V8TI "V8SI")
> (V16QI "V16SI") (V16HI "V16SI") (V16HF "V16SI") (V16SI "V16SI")
> - (V16SF "V16SI") (V16DI "V16SI") (V16DF "V16SI")
> + (V16SF "V16SI") (V16DI "V16SI") (V16DF "V16SI") (V16TI "V16SI")
> (V32QI "V32SI") (V32HI "V32SI") (V32HF "V32SI") (V32SI "V32SI")
> - (V32SF "V32SI") (V32DI "V32SI") (V32DF "V32SI")
> + (V32SF "V32SI") (V32DI "V32SI") (V32DF "V32SI") (V32TI "V32SI")
> (V64QI "V64SI") (V64HI "V64SI") (V64HF "V64SI") (V64SI "V64SI")
> - (V64SF "V64SI") (V64DI "V64SI") (V64DF "V64SI")])
> + (V64SF "V64SI") (V64DI "V64SI") (V64DF "V64SI") (V64TI "V64SI")])
>
> (define_mode_attr vndi
> [(V2QI "v2di") (V2HI "v2di") (V2HF "v2di") (V2SI "v2di")
> - (V2SF "v2di") (V2DI "v2di") (V2DF "v2di")
> + (V2SF "v2di") (V2DI "v2di") (V2DF "v2di") (V2TI "v2di")
> (V4QI "v4di") (V4HI "v4di") (V4HF "v4di") (V4SI "v4di")
> - (V4SF "v4di") (V4DI "v4di") (V4DF "v4di")
> + (V4SF "v4di") (V4DI "v4di") (V4DF "v4di") (V4TI "v4di")
> (V8QI "v8di") (V8HI "v8di") (V8HF "v8di") (V8SI "v8di")
> - (V8SF "v8di") (V8DI "v8di") (V8DF "v8di")
> + (V8SF "v8di") (V8DI "v8di") (V8DF "v8di") (V8TI "v8di")
> (V16QI "v16di") (V16HI "v16di") (V16HF "v16di") (V16SI "v16di")
> - (V16SF "v16di") (V16DI "v16di") (V16DF "v16di")
> + (V16SF "v16di") (V16DI "v16di") (V16DF "v16di") (V16TI "v16di")
> (V32QI "v32di") (V32HI "v32di") (V32HF "v32di") (V32SI "v32di")
> - (V32SF "v32di") (V32DI "v32di") (V32DF "v32di")
> + (V32SF "v32di") (V32DI "v32di") (V32DF "v32di") (V32TI "v32di")
> (V64QI "v64di") (V64HI "v64di") (V64HF "v64di") (V64SI "v64di")
> - (V64SF "v64di") (V64DI "v64di") (V64DF "v64di")])
> + (V64SF "v64di") (V64DI "v64di") (V64DF "v64di") (V64TI "v64di")])
>
> (define_mode_attr VnDI
> [(V2QI "V2DI") (V2HI "V2DI") (V2HF "V2DI") (V2SI "V2DI")
> - (V2SF "V2DI") (V2DI "V2DI") (V2DF "V2DI")
> + (V2SF "V2DI") (V2DI "V2DI") (V2DF "V2DI") (V2TI "V2DI")
> (V4QI "V4DI") (V4HI "V4DI") (V4HF "V4DI") (V4SI "V4DI")
> - (V4SF "V4DI") (V4DI "V4DI") (V4DF "V4DI")
> + (V4SF "V4DI") (V4DI "V4DI") (V4DF "V4DI") (V4TI "V4DI")
> (V8QI "V8DI") (V8HI "V8DI") (V8HF "V8DI") (V8SI "V8DI")
> - (V8SF "V8DI") (V8DI "V8DI") (V8DF "V8DI")
> + (V8SF "V8DI") (V8DI "V8DI") (V8DF "V8DI") (V8TI "V8DI")
> (V16QI "V16DI") (V16HI "V16DI") (V16HF "V16DI") (V16SI "V16DI")
> - (V16SF "V16DI") (V16DI "V16DI") (V16DF "V16DI")
> + (V16SF "V16DI") (V16DI "V16DI") (V16DF "V16DI") (V16TI "V16DI")
> (V32QI "V32DI") (V32HI "V32DI") (V32HF "V32DI") (V32SI "V32DI")
> - (V32SF "V32DI") (V32DI "V32DI") (V32DF "V32DI")
> + (V32SF "V32DI") (V32DI "V32DI") (V32DF "V32DI") (V32TI "V32DI")
> (V64QI "V64DI") (V64HI "V64DI") (V64HF "V64DI") (V64SI "V64DI")
> - (V64SF "V64DI") (V64DI "V64DI") (V64DF "V64DI")])
> + (V64SF "V64DI") (V64DI "V64DI") (V64DF "V64DI") (V64TI "V64DI")])
>
> (define_mode_attr sdwa
> [(V2QI "BYTE_0") (V2HI "WORD_0") (V2SI "DWORD")
> @@ -288,38 +308,38 @@ (define_subst_attr "exec_scatter" "scatter_store"
> "" "_exec")
>
> (define_subst "vec_merge"
> - [(set (match_operand:V_ALL 0)
> - (match_operand:V_ALL 1))]
> + [(set (match_operand:V_MOV 0)
> + (match_operand:V_MOV 1))]
> ""
> [(set (match_dup 0)
> - (vec_merge:V_ALL
> + (vec_merge:V_MOV
> (match_dup 1)
> - (match_operand:V_ALL 3 "gcn_register_or_unspec_operand" "U0")
> + (match_operand:V_MOV 3 "gcn_register_or_unspec_operand" "U0")
> (match_operand:DI 4 "gcn_exec_reg_operand" "e")))])
>
> (define_subst "vec_merge_with_clobber"
> - [(set (match_operand:V_ALL 0)
> - (match_operand:V_ALL 1))
> + [(set (match_operand:V_MOV 0)
> + (match_operand:V_MOV 1))
> (clobber (match_operand 2))]
> ""
> [(set (match_dup 0)
> - (vec_merge:V_ALL
> + (vec_merge:V_MOV
> (match_dup 1)
> - (match_operand:V_ALL 3 "gcn_register_or_unspec_operand" "U0")
> + (match_operand:V_MOV 3 "gcn_register_or_unspec_operand" "U0")
> (match_operand:DI 4 "gcn_exec_reg_operand" "e")))
> (clobber (match_dup 2))])
>
> (define_subst "vec_merge_with_vcc"
> - [(set (match_operand:V_ALL 0)
> - (match_operand:V_ALL 1))
> + [(set (match_operand:V_MOV 0)
> + (match_operand:V_MOV 1))
> (set (match_operand:DI 2)
> (match_operand:DI 3))]
> ""
> [(parallel
> [(set (match_dup 0)
> - (vec_merge:V_ALL
> + (vec_merge:V_MOV
> (match_dup 1)
> - (match_operand:V_ALL 4 "gcn_register_or_unspec_operand" "U0")
> + (match_operand:V_MOV 4 "gcn_register_or_unspec_operand" "U0")
> (match_operand:DI 5 "gcn_exec_reg_operand" "e")))
> (set (match_dup 2)
> (and:DI (match_dup 3)
> @@ -351,8 +371,8 @@ (define_subst "scatter_store"
> ; gather/scatter, maskload/store, etc.
>
> (define_expand "mov<mode>"
> - [(set (match_operand:V_ALL 0 "nonimmediate_operand")
> - (match_operand:V_ALL 1 "general_operand"))]
> + [(set (match_operand:V_MOV 0 "nonimmediate_operand")
> + (match_operand:V_MOV 1 "general_operand"))]
> ""
> {
> /* Bitwise reinterpret casts via SUBREG don't work with GCN vector
> @@ -421,8 +441,8 @@ (define_expand "mov<mode>"
> ; A pseudo instruction that helps LRA use the "U0" constraint.
>
> (define_insn "mov<mode>_unspec"
> - [(set (match_operand:V_ALL 0 "nonimmediate_operand" "=v")
> - (match_operand:V_ALL 1 "gcn_unspec_operand" " U"))]
> + [(set (match_operand:V_MOV 0 "nonimmediate_operand" "=v")
> + (match_operand:V_MOV 1 "gcn_unspec_operand" " U"))]
> ""
> ""
> [(set_attr "type" "unknown")
> @@ -527,6 +547,69 @@ (define_insn "mov<mode>_exec"
> [(set_attr "type" "vmult,vmult,vmult,*,*")
> (set_attr "length" "16,16,16,16,16")])
>
> +(define_insn "*mov<mode>_4reg"
> + [(set (match_operand:V_4REG 0 "nonimmediate_operand" "=v")
> + (match_operand:V_4REG 1 "general_operand" "vDB"))]
> + ""
> + {
> + return "v_mov_b32\t%L0, %L1\;"
> + "v_mov_b32\t%H0, %H1\;"
> + "v_mov_b32\t%J0, %J1\;"
> + "v_mov_b32\t%K0, %K1\;";
> + }
> + [(set_attr "type" "vmult")
> + (set_attr "length" "16")])
> +
> +(define_insn "mov<mode>_exec"
> + [(set (match_operand:V_4REG 0 "nonimmediate_operand" "= v, v, v, v, m")
> + (vec_merge:V_4REG
> + (match_operand:V_4REG 1 "general_operand" "vDB, v0, v0, m, v")
> + (match_operand:V_4REG 2 "gcn_alu_or_unspec_operand"
> + " U0,vDA0,vDA0,U0,U0")
> + (match_operand:DI 3 "register_operand" " e, cV, Sv, e, e")))
> + (clobber (match_scratch:<VnDI> 4 "= X, X, X,&v,&v"))]
> + "!MEM_P (operands[0]) || REG_P (operands[1])"
> + {
> + if (!REG_P (operands[1]) || REGNO (operands[0]) <= REGNO (operands[1]))
> + switch (which_alternative)
> + {
> + case 0:
> + return "v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;"
> + "v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1";
> + case 1:
> + return "v_cndmask_b32\t%L0, %L2, %L1, vcc\;"
> + "v_cndmask_b32\t%H0, %H2, %H1, vcc\;"
> + "v_cndmask_b32\t%J0, %J2, %J1, vcc\;"
> + "v_cndmask_b32\t%K0, %K2, %K1, vcc";
> + case 2:
> + return "v_cndmask_b32\t%L0, %L2, %L1, %3\;"
> + "v_cndmask_b32\t%H0, %H2, %H1, %3\;"
> + "v_cndmask_b32\t%J0, %J2, %J1, %3\;"
> + "v_cndmask_b32\t%K0, %K2, %K1, %3";
> + }
> + else
> + switch (which_alternative)
> + {
> + case 0:
> + return "v_mov_b32\t%H0, %H1\;v_mov_b32\t%L0, %L1\;"
> + "v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1";
> + case 1:
> + return "v_cndmask_b32\t%H0, %H2, %H1, vcc\;"
> + "v_cndmask_b32\t%L0, %L2, %L1, vcc\;"
> + "v_cndmask_b32\t%J0, %J2, %J1, vcc\;"
> + "v_cndmask_b32\t%K0, %K2, %K1, vcc";
> + case 2:
> + return "v_cndmask_b32\t%H0, %H2, %H1, %3\;"
> + "v_cndmask_b32\t%L0, %L2, %L1, %3\;"
> + "v_cndmask_b32\t%J0, %J2, %J1, %3\;"
> + "v_cndmask_b32\t%K0, %K2, %K1, %3";
> + }
> +
> + return "#";
> + }
> + [(set_attr "type" "vmult,vmult,vmult,*,*")
> + (set_attr "length" "32")])
> +
> ; This variant does not accept an unspec, but does permit MEM
> ; read/modify/write which is necessary for maskstore.
>
> @@ -592,12 +675,25 @@ (define_insn "mov<mode>_sgprbase"
> [(set_attr "type" "vmult,*,*")
> (set_attr "length" "8,12,12")])
>
> +(define_insn "mov<mode>_sgprbase"
> + [(set (match_operand:V_4REG 0 "nonimmediate_operand" "= v, v, m")
> + (unspec:V_4REG
> + [(match_operand:V_4REG 1 "general_operand" "vDB, m, v")]
> + UNSPEC_SGPRBASE))
> + (clobber (match_operand:<VnDI> 2 "register_operand" "=&v,&v,&v"))]
> + "lra_in_progress || reload_completed"
> + "v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1
> + #
> + #"
> + [(set_attr "type" "vmult,*,*")
> + (set_attr "length" "8,12,12")])
> +
> ; reload_in was once a standard name, but here it's only referenced by
> ; gcn_secondary_reload. It allows a reload with a scratch register.
>
> (define_expand "reload_in<mode>"
> - [(set (match_operand:V_ALL 0 "register_operand" "= v")
> - (match_operand:V_ALL 1 "memory_operand" " m"))
> + [(set (match_operand:V_MOV 0 "register_operand" "= v")
> + (match_operand:V_MOV 1 "memory_operand" " m"))
> (clobber (match_operand:<VnDI> 2 "register_operand" "=&v"))]
> ""
> {
> @@ -608,8 +704,8 @@ (define_expand "reload_in<mode>"
> ; reload_out is similar to reload_in, above.
>
> (define_expand "reload_out<mode>"
> - [(set (match_operand:V_ALL 0 "memory_operand" "= m")
> - (match_operand:V_ALL 1 "register_operand" " v"))
> + [(set (match_operand:V_MOV 0 "memory_operand" "= m")
> + (match_operand:V_MOV 1 "register_operand" " v"))
> (clobber (match_operand:<VnDI> 2 "register_operand" "=&v"))]
> ""
> {
> @@ -620,9 +716,9 @@ (define_expand "reload_out<mode>"
> ; Expand scalar addresses into gather/scatter patterns
>
> (define_split
> - [(set (match_operand:V_ALL 0 "memory_operand")
> - (unspec:V_ALL
> - [(match_operand:V_ALL 1 "general_operand")]
> + [(set (match_operand:V_MOV 0 "memory_operand")
> + (unspec:V_MOV
> + [(match_operand:V_MOV 1 "general_operand")]
> UNSPEC_SGPRBASE))
> (clobber (match_scratch:<VnDI> 2))]
> ""
> @@ -638,10 +734,10 @@ (define_split
> })
>
> (define_split
> - [(set (match_operand:V_ALL 0 "memory_operand")
> - (vec_merge:V_ALL
> - (match_operand:V_ALL 1 "general_operand")
> - (match_operand:V_ALL 2 "")
> + [(set (match_operand:V_MOV 0 "memory_operand")
> + (vec_merge:V_MOV
> + (match_operand:V_MOV 1 "general_operand")
> + (match_operand:V_MOV 2 "")
> (match_operand:DI 3 "gcn_exec_reg_operand")))
> (clobber (match_scratch:<VnDI> 4))]
> ""
> @@ -659,14 +755,14 @@ (define_split
> })
>
> (define_split
> - [(set (match_operand:V_ALL 0 "nonimmediate_operand")
> - (unspec:V_ALL
> - [(match_operand:V_ALL 1 "memory_operand")]
> + [(set (match_operand:V_MOV 0 "nonimmediate_operand")
> + (unspec:V_MOV
> + [(match_operand:V_MOV 1 "memory_operand")]
> UNSPEC_SGPRBASE))
> (clobber (match_scratch:<VnDI> 2))]
> ""
> [(set (match_dup 0)
> - (unspec:V_ALL [(match_dup 5) (match_dup 6) (match_dup 7)
> + (unspec:V_MOV [(match_dup 5) (match_dup 6) (match_dup 7)
> (mem:BLK (scratch))]
> UNSPEC_GATHER))]
> {
> @@ -678,16 +774,16 @@ (define_split
> })
>
> (define_split
> - [(set (match_operand:V_ALL 0 "nonimmediate_operand")
> - (vec_merge:V_ALL
> - (match_operand:V_ALL 1 "memory_operand")
> - (match_operand:V_ALL 2 "")
> + [(set (match_operand:V_MOV 0 "nonimmediate_operand")
> + (vec_merge:V_MOV
> + (match_operand:V_MOV 1 "memory_operand")
> + (match_operand:V_MOV 2 "")
> (match_operand:DI 3 "gcn_exec_reg_operand")))
> (clobber (match_scratch:<VnDI> 4))]
> ""
> [(set (match_dup 0)
> - (vec_merge:V_ALL
> - (unspec:V_ALL [(match_dup 5) (match_dup 6) (match_dup 7)
> + (vec_merge:V_MOV
> + (unspec:V_MOV [(match_dup 5) (match_dup 6) (match_dup 7)
> (mem:BLK (scratch))]
> UNSPEC_GATHER)
> (match_dup 2)
> @@ -744,9 +840,9 @@ (define_insn "*vec_set<mode>"
> (set_attr "laneselect" "yes")])
>
> (define_expand "vec_set<mode>"
> - [(set (match_operand:V_ALL 0 "register_operand")
> - (vec_merge:V_ALL
> - (vec_duplicate:V_ALL
> + [(set (match_operand:V_MOV 0 "register_operand")
> + (vec_merge:V_MOV
> + (vec_duplicate:V_MOV
> (match_operand:<SCALAR_MODE> 1 "register_operand"))
> (match_dup 0)
> (ashift (const_int 1) (match_operand:SI 2 "gcn_alu_operand"))))]
> @@ -804,6 +900,15 @@ (define_insn "vec_duplicate<mode><exec>"
> [(set_attr "type" "vop3a")
> (set_attr "length" "16")])
>
> +(define_insn "vec_duplicate<mode><exec>"
> + [(set (match_operand:V_4REG 0 "register_operand" "= v")
> + (vec_duplicate:V_4REG
> + (match_operand:<SCALAR_MODE> 1 "gcn_alu_operand" "SvDB")))]
> + ""
> + "v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1"
> + [(set_attr "type" "mult")
> + (set_attr "length" "32")])
> +
> (define_insn "vec_extract<mode><scalar_mode>"
> [(set (match_operand:<SCALAR_MODE> 0 "register_operand" "=Sg")
> (vec_select:<SCALAR_MODE>
> @@ -828,6 +933,18 @@ (define_insn "vec_extract<mode><scalar_mode>"
> (set_attr "exec" "none")
> (set_attr "laneselect" "yes")])
>
> +(define_insn "vec_extract<mode><scalar_mode>"
> + [(set (match_operand:<SCALAR_MODE> 0 "register_operand" "=&Sg")
> + (vec_select:<SCALAR_MODE>
> + (match_operand:V_4REG 1 "register_operand" " v")
> + (parallel [(match_operand:SI 2 "gcn_alu_operand" " SvB")])))]
> + ""
> + "v_readlane_b32 %L0, %L1, %2\;v_readlane_b32 %H0, %H1, %2\;v_readlane_b32 %J0, %J1, %2\;v_readlane_b32 %K0, %K1, %2"
> + [(set_attr "type" "vmult")
> + (set_attr "length" "32")
> + (set_attr "exec" "none")
> + (set_attr "laneselect" "yes")])
> +
> (define_insn "vec_extract<V_1REG:mode><V_1REG_ALT:mode>_nop"
> [(set (match_operand:V_1REG_ALT 0 "register_operand" "=v,v")
> (vec_select:V_1REG_ALT
> @@ -854,39 +971,52 @@ (define_insn "vec_extract<V_2REG:mode><V_2REG_ALT:mode>_nop"
> [(set_attr "type" "vmult")
> (set_attr "length" "0,8")])
>
> -(define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
> - [(match_operand:V_ALL_ALT 0 "register_operand")
> - (match_operand:V_ALL 1 "register_operand")
> +(define_insn "vec_extract<V_4REG:mode><V_4REG_ALT:mode>_nop"
> + [(set (match_operand:V_4REG_ALT 0 "register_operand" "=v,v")
> + (vec_select:V_4REG_ALT
> + (match_operand:V_4REG 1 "register_operand" " 0,v")
> + (match_operand 2 "ascending_zero_int_parallel" "")))]
> + "MODE_VF (<V_4REG_ALT:MODE>mode) < MODE_VF (<V_4REG:MODE>mode)
> + && <V_4REG_ALT:SCALAR_MODE>mode == <V_4REG:SCALAR_MODE>mode"
> + "@
> + ; in-place extract %0
> + v_mov_b32\t%L0, %L1\;v_mov_b32\t%H0, %H1\;v_mov_b32\t%J0, %J1\;v_mov_b32\t%K0, %K1"
> + [(set_attr "type" "vmult")
> + (set_attr "length" "0,16")])
> +
> +(define_expand "vec_extract<V_MOV:mode><V_MOV_ALT:mode>"
> + [(match_operand:V_MOV_ALT 0 "register_operand")
> + (match_operand:V_MOV 1 "register_operand")
> (match_operand 2 "immediate_operand")]
> - "MODE_VF (<V_ALL_ALT:MODE>mode) < MODE_VF (<V_ALL:MODE>mode)
> - && <V_ALL_ALT:SCALAR_MODE>mode == <V_ALL:SCALAR_MODE>mode"
> + "MODE_VF (<V_MOV_ALT:MODE>mode) < MODE_VF (<V_MOV:MODE>mode)
> + && <V_MOV_ALT:SCALAR_MODE>mode == <V_MOV:SCALAR_MODE>mode"
> {
> - int numlanes = GET_MODE_NUNITS (<V_ALL_ALT:MODE>mode);
> + int numlanes = GET_MODE_NUNITS (<V_MOV_ALT:MODE>mode);
> int firstlane = INTVAL (operands[2]) * numlanes;
> rtx tmp;
>
> if (firstlane == 0)
> {
> - rtx parallel = gen_rtx_PARALLEL (<V_ALL:MODE>mode,
> + rtx parallel = gen_rtx_PARALLEL (<V_MOV:MODE>mode,
> rtvec_alloc (numlanes));
> for (int i = 0; i < numlanes; i++)
> XVECEXP (parallel, 0, i) = GEN_INT (i);
> - emit_insn (gen_vec_extract<V_ALL:mode><V_ALL_ALT:mode>_nop
> + emit_insn (gen_vec_extract<V_MOV:mode><V_MOV_ALT:mode>_nop
> (operands[0], operands[1], parallel));
> } else {
> /* FIXME: optimize this by using DPP where available. */
>
> - rtx permutation = gen_reg_rtx (<V_ALL:VnSI>mode);
> - emit_insn (gen_vec_series<V_ALL:vnsi> (permutation,
> + rtx permutation = gen_reg_rtx (<V_MOV:VnSI>mode);
> + emit_insn (gen_vec_series<V_MOV:vnsi> (permutation,
> GEN_INT (firstlane*4),
> GEN_INT (4)));
>
> - tmp = gen_reg_rtx (<V_ALL:MODE>mode);
> - emit_insn (gen_ds_bpermute<V_ALL:mode> (tmp, permutation, operands[1],
> - get_exec (<V_ALL:MODE>mode)));
> + tmp = gen_reg_rtx (<V_MOV:MODE>mode);
> + emit_insn (gen_ds_bpermute<V_MOV:mode> (tmp, permutation, operands[1],
> + get_exec (<V_MOV:MODE>mode)));
>
> emit_move_insn (operands[0],
> - gen_rtx_SUBREG (<V_ALL_ALT:MODE>mode, tmp, 0));
> + gen_rtx_SUBREG (<V_MOV_ALT:MODE>mode, tmp, 0));
> }
> DONE;
> })
> @@ -894,7 +1024,7 @@ (define_expand "vec_extract<V_ALL:mode><V_ALL_ALT:mode>"
> (define_expand "extract_last_<mode>"
> [(match_operand:<SCALAR_MODE> 0 "register_operand")
> (match_operand:DI 1 "gcn_alu_operand")
> - (match_operand:V_ALL 2 "register_operand")]
> + (match_operand:V_MOV 2 "register_operand")]
> "can_create_pseudo_p ()"
> {
> rtx dst = operands[0];
> @@ -912,7 +1042,7 @@ (define_expand "fold_extract_last_<mode>"
> [(match_operand:<SCALAR_MODE> 0 "register_operand")
> (match_operand:<SCALAR_MODE> 1 "gcn_alu_operand")
> (match_operand:DI 2 "gcn_alu_operand")
> - (match_operand:V_ALL 3 "register_operand")]
> + (match_operand:V_MOV 3 "register_operand")]
> "can_create_pseudo_p ()"
> {
> rtx dst = operands[0];
> @@ -934,7 +1064,7 @@ (define_expand "fold_extract_last_<mode>"
> })
>
> (define_expand "vec_init<mode><scalar_mode>"
> - [(match_operand:V_ALL 0 "register_operand")
> + [(match_operand:V_MOV 0 "register_operand")
> (match_operand 1)]
> ""
> {
> @@ -942,11 +1072,11 @@ (define_expand "vec_init<mode><scalar_mode>"
> DONE;
> })
>
> -(define_expand "vec_init<V_ALL:mode><V_ALL_ALT:mode>"
> - [(match_operand:V_ALL 0 "register_operand")
> - (match_operand:V_ALL_ALT 1)]
> - "<V_ALL:SCALAR_MODE>mode == <V_ALL_ALT:SCALAR_MODE>mode
> - && MODE_VF (<V_ALL_ALT:MODE>mode) < MODE_VF (<V_ALL:MODE>mode)"
> +(define_expand "vec_init<V_MOV:mode><V_MOV_ALT:mode>"
> + [(match_operand:V_MOV 0 "register_operand")
> + (match_operand:V_MOV_ALT 1)]
> + "<V_MOV:SCALAR_MODE>mode == <V_MOV_ALT:SCALAR_MODE>mode
> + && MODE_VF (<V_MOV_ALT:MODE>mode) < MODE_VF (<V_MOV:MODE>mode)"
> {
> gcn_expand_vector_init (operands[0], operands[1]);
> DONE;
> @@ -988,7 +1118,7 @@ (define_expand "vec_init<V_ALL:mode><V_ALL_ALT:mode>"
> ;; TODO: implement combined gather and zero_extend, but only for -msram-ecc=on
>
> (define_expand "gather_load<mode><vnsi>"
> - [(match_operand:V_ALL 0 "register_operand")
> + [(match_operand:V_MOV 0 "register_operand")
> (match_operand:DI 1 "register_operand")
> (match_operand:<VnSI> 2 "register_operand")
> (match_operand 3 "immediate_operand")
> @@ -1011,8 +1141,8 @@ (define_expand "gather_load<mode><vnsi>"
>
> ; Allow any address expression
> (define_expand "gather<mode>_expr<exec>"
> - [(set (match_operand:V_ALL 0 "register_operand")
> - (unspec:V_ALL
> + [(set (match_operand:V_MOV 0 "register_operand")
> + (unspec:V_MOV
> [(match_operand 1 "")
> (match_operand 2 "immediate_operand")
> (match_operand 3 "immediate_operand")
> @@ -1022,8 +1152,8 @@ (define_expand "gather<mode>_expr<exec>"
> {})
>
> (define_insn "gather<mode>_insn_1offset<exec>"
> - [(set (match_operand:V_ALL 0 "register_operand" "=v")
> - (unspec:V_ALL
> + [(set (match_operand:V_MOV 0 "register_operand" "=v")
> + (unspec:V_MOV
> [(plus:<VnDI> (match_operand:<VnDI> 1 "register_operand" " v")
> (vec_duplicate:<VnDI>
> (match_operand 2 "immediate_operand" " n")))
> @@ -1061,8 +1191,8 @@ (define_insn "gather<mode>_insn_1offset<exec>"
> (set_attr "length" "12")])
>
> (define_insn "gather<mode>_insn_1offset_ds<exec>"
> - [(set (match_operand:V_ALL 0 "register_operand" "=v")
> - (unspec:V_ALL
> + [(set (match_operand:V_MOV 0 "register_operand" "=v")
> + (unspec:V_MOV
> [(plus:<VnSI> (match_operand:<VnSI> 1 "register_operand" " v")
> (vec_duplicate:<VnSI>
> (match_operand 2 "immediate_operand" " n")))
> @@ -1083,8 +1213,8 @@ (define_insn "gather<mode>_insn_1offset_ds<exec>"
> (set_attr "length" "12")])
>
> (define_insn "gather<mode>_insn_2offsets<exec>"
> - [(set (match_operand:V_ALL 0 "register_operand" "=v")
> - (unspec:V_ALL
> + [(set (match_operand:V_MOV 0 "register_operand" "=v")
> + (unspec:V_MOV
> [(plus:<VnDI>
> (plus:<VnDI>
> (vec_duplicate:<VnDI>
> @@ -1119,7 +1249,7 @@ (define_expand "scatter_store<mode><vnsi>"
> (match_operand:<VnSI> 1 "register_operand")
> (match_operand 2 "immediate_operand")
> (match_operand:SI 3 "gcn_alu_operand")
> - (match_operand:V_ALL 4 "register_operand")]
> + (match_operand:V_MOV 4 "register_operand")]
> ""
> {
> rtx addr = gcn_expand_scaled_offsets (DEFAULT_ADDR_SPACE, operands[0],
> @@ -1141,7 +1271,7 @@ (define_expand "scatter<mode>_expr<exec_scatter>"
> [(set (mem:BLK (scratch))
> (unspec:BLK
> [(match_operand:<VnDI> 0 "")
> - (match_operand:V_ALL 1 "register_operand")
> + (match_operand:V_MOV 1 "register_operand")
> (match_operand 2 "immediate_operand")
> (match_operand 3 "immediate_operand")]
> UNSPEC_SCATTER))]
> @@ -1154,7 +1284,7 @@ (define_insn "scatter<mode>_insn_1offset<exec_scatter>"
> [(plus:<VnDI> (match_operand:<VnDI> 0 "register_operand" "v")
> (vec_duplicate:<VnDI>
> (match_operand 1 "immediate_operand" "n")))
> - (match_operand:V_ALL 2 "register_operand" "v")
> + (match_operand:V_MOV 2 "register_operand" "v")
> (match_operand 3 "immediate_operand" "n")
> (match_operand 4 "immediate_operand" "n")]
> UNSPEC_SCATTER))]
> @@ -1192,7 +1322,7 @@ (define_insn "scatter<mode>_insn_1offset_ds<exec_scatter>"
> [(plus:<VnSI> (match_operand:<VnSI> 0 "register_operand" "v")
> (vec_duplicate:<VnSI>
> (match_operand 1 "immediate_operand" "n")))
> - (match_operand:V_ALL 2 "register_operand" "v")
> + (match_operand:V_MOV 2 "register_operand" "v")
> (match_operand 3 "immediate_operand" "n")
> (match_operand 4 "immediate_operand" "n")]
> UNSPEC_SCATTER))]
> @@ -1218,7 +1348,7 @@ (define_insn "scatter<mode>_insn_2offsets<exec_scatter>"
> (sign_extend:<VnDI>
> (match_operand:<VnSI> 1 "register_operand" " v")))
> (vec_duplicate:<VnDI> (match_operand 2 "immediate_operand" " n")))
> - (match_operand:V_ALL 3 "register_operand" " v")
> + (match_operand:V_MOV 3 "register_operand" " v")
> (match_operand 4 "immediate_operand" " n")
> (match_operand 5 "immediate_operand" " n")]
> UNSPEC_SCATTER))]
> @@ -3804,8 +3934,8 @@ (define_expand "while_ultsidi"
> })
>
> (define_expand "maskload<mode>di"
> - [(match_operand:V_ALL 0 "register_operand")
> - (match_operand:V_ALL 1 "memory_operand")
> + [(match_operand:V_MOV 0 "register_operand")
> + (match_operand:V_MOV 1 "memory_operand")
> (match_operand 2 "")]
> ""
> {
> @@ -3824,8 +3954,8 @@ (define_expand "maskload<mode>di"
> })
>
> (define_expand "maskstore<mode>di"
> - [(match_operand:V_ALL 0 "memory_operand")
> - (match_operand:V_ALL 1 "register_operand")
> + [(match_operand:V_MOV 0 "memory_operand")
> + (match_operand:V_MOV 1 "register_operand")
> (match_operand 2 "")]
> ""
> {
> @@ -3839,7 +3969,7 @@ (define_expand "maskstore<mode>di"
> })
>
> (define_expand "mask_gather_load<mode><vnsi>"
> - [(match_operand:V_ALL 0 "register_operand")
> + [(match_operand:V_MOV 0 "register_operand")
> (match_operand:DI 1 "register_operand")
> (match_operand:<VnSI> 2 "register_operand")
> (match_operand 3 "immediate_operand")
> @@ -3874,7 +4004,7 @@ (define_expand "mask_scatter_store<mode><vnsi>"
> (match_operand:<VnSI> 1 "register_operand")
> (match_operand 2 "immediate_operand")
> (match_operand:SI 3 "gcn_alu_operand")
> - (match_operand:V_ALL 4 "register_operand")
> + (match_operand:V_MOV 4 "register_operand")
> (match_operand:DI 5 "")]
> ""
> {
> diff --git a/gcc/config/gcn/gcn.cc b/gcc/config/gcn/gcn.cc
> index efb7211d54e..ead89a9fbaf 100644
> --- a/gcc/config/gcn/gcn.cc
> +++ b/gcc/config/gcn/gcn.cc
> @@ -489,7 +489,7 @@ gcn_class_max_nregs (reg_class_t rclass, machine_mode mode)
> if (vgpr_2reg_mode_p (mode))
> return 2;
> /* TImode is used by DImode compare_and_swap. */
> - if (mode == TImode)
> + if (vgpr_4reg_mode_p (mode))
> return 4;
> }
> else if (rclass == VCC_CONDITIONAL_REG && mode == BImode)
> @@ -592,9 +592,9 @@ gcn_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
> Therefore, we restrict ourselved to aligned registers. */
> return (vgpr_1reg_mode_p (mode)
> || (!((regno - FIRST_VGPR_REG) & 1) && vgpr_2reg_mode_p (mode))
> - /* TImode is used by DImode compare_and_swap. */
> - || (mode == TImode
> - && !((regno - FIRST_VGPR_REG) & 3)));
> + /* TImode is used by DImode compare_and_swap,
> + and by DIVMOD V64DImode libfuncs. */
> + || (!((regno - FIRST_VGPR_REG) & 3) && vgpr_4reg_mode_p (mode)));
> return false;
> }
>
> @@ -1326,6 +1326,7 @@ GEN_VN (PREFIX, si##SUFFIX, A(PARAMS), A(ARGS)) \
> GEN_VN (PREFIX, sf##SUFFIX, A(PARAMS), A(ARGS)) \
> GEN_VN (PREFIX, di##SUFFIX, A(PARAMS), A(ARGS)) \
> GEN_VN (PREFIX, df##SUFFIX, A(PARAMS), A(ARGS)) \
> +USE_TI (GEN_VN (PREFIX, ti##SUFFIX, A(PARAMS), A(ARGS))) \
> static rtx \
> gen_##PREFIX##vNm##SUFFIX (PARAMS, rtx merge_src=NULL, rtx exec=NULL) \
> { \
> @@ -1340,6 +1341,8 @@ gen_##PREFIX##vNm##SUFFIX (PARAMS, rtx merge_src=NULL, rtx exec=NULL) \
> case E_SFmode: return gen_##PREFIX##vNsf##SUFFIX (ARGS, merge_src, exec); \
> case E_DImode: return gen_##PREFIX##vNdi##SUFFIX (ARGS, merge_src, exec); \
> case E_DFmode: return gen_##PREFIX##vNdf##SUFFIX (ARGS, merge_src, exec); \
> + case E_TImode: \
> + USE_TI (return gen_##PREFIX##vNti##SUFFIX (ARGS, merge_src, exec);) \
> default: \
> break; \
> } \
> @@ -1348,6 +1351,14 @@ gen_##PREFIX##vNm##SUFFIX (PARAMS, rtx merge_src=NULL, rtx exec=NULL) \
> return NULL_RTX; \
> }
>
> +/* These have TImode support. */
> +#define USE_TI(ARGS) ARGS
> +GEN_VNM (mov,, A(rtx dest, rtx src), A(dest, src))
> +GEN_VNM (vec_duplicate,, A(rtx dest, rtx src), A(dest, src))
> +
> +/* These do not have TImode support. */
> +#undef USE_TI
> +#define USE_TI(ARGS)
> GEN_VNM (add,3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
> GEN_VN (add,si3_dup, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
> GEN_VN (add,si3_vcc_dup, A(rtx dest, rtx src1, rtx src2, rtx vcc),
> @@ -1366,12 +1377,11 @@ GEN_VNM_NOEXEC (ds_bpermute,, A(rtx dest, rtx addr, rtx src, rtx exec),
> A(dest, addr, src, exec))
> GEN_VNM (gather,_expr, A(rtx dest, rtx addr, rtx as, rtx vol),
> A(dest, addr, as, vol))
> -GEN_VNM (mov,, A(rtx dest, rtx src), A(dest, src))
> GEN_VN (mul,si3_dup, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
> GEN_VN (sub,si3, A(rtx dest, rtx src1, rtx src2), A(dest, src1, src2))
> -GEN_VNM (vec_duplicate,, A(rtx dest, rtx src), A(dest, src))
> GEN_VN_NOEXEC (vec_series,si, A(rtx dest, rtx x, rtx c), A(dest, x, c))
>
> +#undef USE_TI
> #undef GEN_VNM
> #undef GEN_VN
> #undef GET_VN_FN
> @@ -1405,6 +1415,7 @@ get_code_for_##PREFIX##vN##SUFFIX (int nunits) \
> CODE_FOR (PREFIX, sf) \
> CODE_FOR (PREFIX, di) \
> CODE_FOR (PREFIX, df) \
> + CODE_FOR (PREFIX, ti) \
> static int \
> get_code_for_##PREFIX (machine_mode mode) \
> { \
> @@ -1420,6 +1431,7 @@ get_code_for_##PREFIX (machine_mode mode) \
> case E_SFmode: return get_code_for_##PREFIX##vNsf (vf); \
> case E_DImode: return get_code_for_##PREFIX##vNdi (vf); \
> case E_DFmode: return get_code_for_##PREFIX##vNdf (vf); \
> + case E_TImode: return get_code_for_##PREFIX##vNti (vf); \
> default: break; \
> } \
> \
> @@ -4895,7 +4907,13 @@ gcn_vector_mode_supported_p (machine_mode mode)
> || mode == V4SFmode || mode == V4DFmode
> || mode == V2QImode || mode == V2HImode
> || mode == V2SImode || mode == V2DImode
> - || mode == V2SFmode || mode == V2DFmode);
> + || mode == V2SFmode || mode == V2DFmode
> + /* TImode vectors are allowed to exist for divmod, but there
> + are almost no instructions defined for them, and the
> + autovectorizer does not use them. */
> + || mode == V64TImode || mode == V32TImode
> + || mode == V16TImode || mode == V8TImode
> + || mode == V4TImode || mode == V2TImode);
> }
>
> /* Implement TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
> @@ -6722,6 +6740,10 @@ print_operand_address (FILE *file, rtx mem)
> O - print offset:n for data share operations.
> ^ - print "_co" suffix for GCN5 mnemonics
> g - print "glc", if appropriate for given MEM
> + L - print low-part of a multi-reg value
> + H - print second part of a multi-reg value (high-part of 2-reg value)
> + J - print third part of a multi-reg value
> + K - print fourth part of a multi-reg value
> */
>
> void
> @@ -7261,6 +7283,12 @@ print_operand (FILE *file, rtx x, int code)
> case 'H':
> print_operand (file, gcn_operand_part (GET_MODE (x), x, 1), 0);
> return;
> + case 'J':
> + print_operand (file, gcn_operand_part (GET_MODE (x), x, 2), 0);
> + return;
> + case 'K':
> + print_operand (file, gcn_operand_part (GET_MODE (x), x, 3), 0);
> + return;
> case 'R':
> /* Print a scalar register number as an integer. Temporary hack. */
> gcc_assert (REG_P (x));
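
For anyone reading the gcn.cc hunk above later: the USE_TI trick can be
illustrated in isolation.  What follows is only a minimal sketch of the
preprocessor pattern, not the real GCN code: GEN_DISPATCH, 'enum mode', the
MODE_* names, and the returned register counts are hypothetical stand-ins
chosen for illustration.  The point is that USE_TI is looked up when the
generator macro is expanded, so redefining it between invocations switches
the TImode case on or off per generated function.

```c
/* Hypothetical stand-ins for the machine modes handled by the generator.  */
enum mode { MODE_SI, MODE_DI, MODE_TI };

/* Hypothetical stand-in for GEN_VNM: defines NAME##_regs_for (mode), which
   returns a register count, or -1 if the mode is not supported.  USE_TI is
   resolved at each point where GEN_DISPATCH is invoked.  */
#define GEN_DISPATCH(NAME) \
int \
NAME##_regs_for (enum mode m) \
{ \
  switch (m) \
    { \
    case MODE_SI: return 1; \
    case MODE_DI: return 2; \
    case MODE_TI: \
      USE_TI (return 4;) \
    default: \
      break; \
    } \
  return -1; \
}

/* These have TImode support: USE_TI passes its argument through.  */
#define USE_TI(ARGS) ARGS
GEN_DISPATCH (mov)

/* These do not: USE_TI expands to nothing, so the TImode case label
   falls through to the default case.  */
#undef USE_TI
#define USE_TI(ARGS)
GEN_DISPATCH (add)

#undef USE_TI
#undef GEN_DISPATCH
```

In this sketch, mov_regs_for (MODE_TI) returns 4, whereas
add_regs_for (MODE_TI) reaches the default case and returns -1; the
narrower modes behave identically in both generated functions.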
-----------------
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634 München; limited liability company; Managing Directors: Thomas Heurung, Frank Thürauf; Registered office: München; Registration court: München, HRB 106955