From: James Greenhalgh <james.greenhalgh@arm.com>
To: Richard Sandiford <richard.sandiford@linaro.org>
Cc: <gcc-patches@gcc.gnu.org>, <nd@arm.com>
Subject: Re: [AArch64] Remove use of wider vector modes
Date: Wed, 30 Aug 2017 17:17:00 -0000 [thread overview]
Message-ID: <20170830165426.GB30919@arm.com> (raw)
In-Reply-To: <87efs4c61d.fsf@linaro.org>
On Tue, Aug 22, 2017 at 10:20:46AM +0100, Richard Sandiford wrote:
> The AArch64 port defined x2, x3 and x4 vector modes that were only used
> in the rtl for the AdvSIMD LD{2,3,4} patterns. It seems unlikely that
> this rtl would have led to any valid simplifications, since the values
> involved were unspecs that had a different number of operands from the
> non-dreg versions. (The dreg UNSPEC_LD2 had a single operand, while
> the qreg one had two operands.)
>
> As it happened, the patterns led to invalid simplifications on big-
> endian targets due to a mix-up in the operand order, see Tamar's fix
> in r240271.
>
> This patch therefore replaces the rtl patterns with dedicated unspecs.
> This allows the x2, x3 and x4 modes to be removed, avoiding a clash
> with 256-bit and 512-bit SVE.
>
> Tested on aarch64-linux-gnu. Also tested by comparing the before and after
> assembly for the testsuite at -O2 -ftree-vectorize on aarch64-linux-gnu
> and aarch64_be-linux-gnu; there were no differences. OK to install?
This is OK. I think we were being a bit too optimistic with the design
of these RTL patterns, though clearly at least some part of this was getting
used in rare corner cases, as we had a wrong code-bug. I think what these
would give you is a simplification if you used a 64-bit ld2/3/4 and then
extracted a lane which GCC knew to be zero. This happened by mistake with
the backwards lane indexes for big-endian, causing us to think the DImode
subreg we took was constant zero.
In reality, there is not likely to be a common case that needs
optimisation here, so I don't mind losing this.
OK.
Thanks,
James
> 2017-08-22 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64-modes.def: Remove 32-, 48- and 64-byte
> vector modes.
> * config/aarch64/iterators.md (VRL2, VRL3, VRL4): Delete.
> * config/aarch64/aarch64.md (UNSPEC_LD2_DREG, UNSPEC_LD3_DREG)
> (UNSPEC_LD4_DREG): New unspecs.
> * config/aarch64/aarch64-simd.md (aarch64_ld2<mode>_dreg_le)
> (aarch64_ld2<mode>_dreg_be): Replace with...
> (aarch64_ld2<mode>_dreg): ...this pattern and use the new DREG
> unspec.
> (aarch64_ld3<mode>_dreg_le)
> (aarch64_ld3<mode>_dreg_be): Replace with...
> (aarch64_ld3<mode>_dreg): ...this pattern and use the new DREG
> unspec.
> (aarch64_ld4<mode>_dreg_le)
> (aarch64_ld4<mode>_dreg_be): Replace with...
> (aarch64_ld4<mode>_dreg): ...this pattern and use the new DREG
> unspec.
>
> Index: gcc/config/aarch64/aarch64-modes.def
> ===================================================================
> --- gcc/config/aarch64/aarch64-modes.def 2017-02-23 19:54:24.000000000 +0000
> +++ gcc/config/aarch64/aarch64-modes.def 2017-08-22 10:11:04.724056356 +0100
> @@ -44,15 +44,5 @@ INT_MODE (OI, 32);
> INT_MODE (CI, 48);
> INT_MODE (XI, 64);
>
> -/* Vector modes for register lists. */
> -VECTOR_MODES (INT, 32); /* V32QI V16HI V8SI V4DI. */
> -VECTOR_MODES (FLOAT, 32); /* V8SF V4DF. */
> -
> -VECTOR_MODES (INT, 48); /* V32QI V16HI V8SI V4DI. */
> -VECTOR_MODES (FLOAT, 48); /* V8SF V4DF. */
> -
> -VECTOR_MODES (INT, 64); /* V32QI V16HI V8SI V4DI. */
> -VECTOR_MODES (FLOAT, 64); /* V8SF V4DF. */
> -
> /* Quad float: 128-bit floating mode for long doubles. */
> FLOAT_MODE (TF, 16, ieee_quad_format);
> Index: gcc/config/aarch64/iterators.md
> ===================================================================
> --- gcc/config/aarch64/iterators.md 2017-08-03 10:40:55.896279339 +0100
> +++ gcc/config/aarch64/iterators.md 2017-08-22 10:11:04.727125997 +0100
> @@ -711,21 +711,6 @@ (define_mode_attr Vendreg [(OI "T") (CI
> ;; ld..._lane and st..._lane operations.
> (define_mode_attr nregs [(OI "2") (CI "3") (XI "4")])
>
> -(define_mode_attr VRL2 [(V8QI "V32QI") (V4HI "V16HI")
> - (V4HF "V16HF")
> - (V2SI "V8SI") (V2SF "V8SF")
> - (DI "V4DI") (DF "V4DF")])
> -
> -(define_mode_attr VRL3 [(V8QI "V48QI") (V4HI "V24HI")
> - (V4HF "V24HF")
> - (V2SI "V12SI") (V2SF "V12SF")
> - (DI "V6DI") (DF "V6DF")])
> -
> -(define_mode_attr VRL4 [(V8QI "V64QI") (V4HI "V32HI")
> - (V4HF "V32HF")
> - (V2SI "V16SI") (V2SF "V16SF")
> - (DI "V8DI") (DF "V8DF")])
> -
> ;; Mode for atomic operation suffixes
> (define_mode_attr atomic_sfx
> [(QI "b") (HI "h") (SI "") (DI "")])
> Index: gcc/config/aarch64/aarch64.md
> ===================================================================
> --- gcc/config/aarch64/aarch64.md 2017-08-16 08:50:34.060622654 +0100
> +++ gcc/config/aarch64/aarch64.md 2017-08-22 10:11:04.726102783 +0100
> @@ -98,10 +98,13 @@ (define_c_enum "unspec" [
> UNSPEC_GOTTINYTLS
> UNSPEC_LD1
> UNSPEC_LD2
> + UNSPEC_LD2_DREG
> UNSPEC_LD2_DUP
> UNSPEC_LD3
> + UNSPEC_LD3_DREG
> UNSPEC_LD3_DUP
> UNSPEC_LD4
> + UNSPEC_LD4_DREG
> UNSPEC_LD4_DUP
> UNSPEC_LD2_LANE
> UNSPEC_LD3_LANE
> Index: gcc/config/aarch64/aarch64-simd.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-simd.md 2017-08-17 17:29:27.227162205 +0100
> +++ gcc/config/aarch64/aarch64-simd.md 2017-08-22 10:11:04.726102783 +0100
> @@ -4963,278 +4963,62 @@ (define_expand "aarch64_ld<VSTRUCT:nregs
> DONE;
> })
>
> -(define_insn "aarch64_ld2<mode>_dreg_le"
> +(define_insn "aarch64_ld2<mode>_dreg"
> [(set (match_operand:OI 0 "register_operand" "=w")
> - (subreg:OI
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (unspec:VD
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD2)
> - (vec_duplicate:VD (const_int 0)))
> - (vec_concat:<VDBL>
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD2)
> - (vec_duplicate:VD (const_int 0)))) 0))]
> - "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> - "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> - [(set_attr "type" "neon_load2_2reg<q>")]
> -)
> -
> -(define_insn "aarch64_ld2<mode>_dreg_be"
> - [(set (match_operand:OI 0 "register_operand" "=w")
> - (subreg:OI
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD2))
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD2))) 0))]
> - "TARGET_SIMD && BYTES_BIG_ENDIAN"
> + (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> + UNSPEC_LD2_DREG))]
> + "TARGET_SIMD"
> "ld2\\t{%S0.<Vtype> - %T0.<Vtype>}, %1"
> [(set_attr "type" "neon_load2_2reg<q>")]
> )
>
> -(define_insn "aarch64_ld2<mode>_dreg_le"
> - [(set (match_operand:OI 0 "register_operand" "=w")
> - (subreg:OI
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (unspec:DX
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD2)
> - (const_int 0))
> - (vec_concat:<VDBL>
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD2)
> - (const_int 0))) 0))]
> - "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> - "ld1\\t{%S0.1d - %T0.1d}, %1"
> - [(set_attr "type" "neon_load1_2reg<q>")]
> -)
> -
> -(define_insn "aarch64_ld2<mode>_dreg_be"
> +(define_insn "aarch64_ld2<mode>_dreg"
> [(set (match_operand:OI 0 "register_operand" "=w")
> - (subreg:OI
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD2))
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD2))) 0))]
> - "TARGET_SIMD && BYTES_BIG_ENDIAN"
> + (unspec:OI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> + (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> + UNSPEC_LD2_DREG))]
> + "TARGET_SIMD"
> "ld1\\t{%S0.1d - %T0.1d}, %1"
> [(set_attr "type" "neon_load1_2reg<q>")]
> )
>
> -(define_insn "aarch64_ld3<mode>_dreg_le"
> +(define_insn "aarch64_ld3<mode>_dreg"
> [(set (match_operand:CI 0 "register_operand" "=w")
> - (subreg:CI
> - (vec_concat:<VRL3>
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (unspec:VD
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD3)
> - (vec_duplicate:VD (const_int 0)))
> - (vec_concat:<VDBL>
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD3)
> - (vec_duplicate:VD (const_int 0))))
> - (vec_concat:<VDBL>
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD3)
> - (vec_duplicate:VD (const_int 0)))) 0))]
> - "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> - "ld3\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> - [(set_attr "type" "neon_load3_3reg<q>")]
> -)
> -
> -(define_insn "aarch64_ld3<mode>_dreg_be"
> - [(set (match_operand:CI 0 "register_operand" "=w")
> - (subreg:CI
> - (vec_concat:<VRL3>
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD3))
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD3)))
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD3))) 0))]
> - "TARGET_SIMD && BYTES_BIG_ENDIAN"
> + (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> + UNSPEC_LD3_DREG))]
> + "TARGET_SIMD"
> "ld3\\t{%S0.<Vtype> - %U0.<Vtype>}, %1"
> [(set_attr "type" "neon_load3_3reg<q>")]
> )
>
> -(define_insn "aarch64_ld3<mode>_dreg_le"
> +(define_insn "aarch64_ld3<mode>_dreg"
> [(set (match_operand:CI 0 "register_operand" "=w")
> - (subreg:CI
> - (vec_concat:<VRL3>
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (unspec:DX
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD3)
> - (const_int 0))
> - (vec_concat:<VDBL>
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD3)
> - (const_int 0)))
> - (vec_concat:<VDBL>
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD3)
> - (const_int 0))) 0))]
> - "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> - "ld1\\t{%S0.1d - %U0.1d}, %1"
> - [(set_attr "type" "neon_load1_3reg<q>")]
> -)
> -
> -(define_insn "aarch64_ld3<mode>_dreg_be"
> - [(set (match_operand:CI 0 "register_operand" "=w")
> - (subreg:CI
> - (vec_concat:<VRL3>
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD3))
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD3)))
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD3))) 0))]
> - "TARGET_SIMD && BYTES_BIG_ENDIAN"
> + (unspec:CI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> + (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> + UNSPEC_LD3_DREG))]
> + "TARGET_SIMD"
> "ld1\\t{%S0.1d - %U0.1d}, %1"
> [(set_attr "type" "neon_load1_3reg<q>")]
> )
>
> -(define_insn "aarch64_ld4<mode>_dreg_le"
> - [(set (match_operand:XI 0 "register_operand" "=w")
> - (subreg:XI
> - (vec_concat:<VRL4>
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (unspec:VD
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD4)
> - (vec_duplicate:VD (const_int 0)))
> - (vec_concat:<VDBL>
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD4)
> - (vec_duplicate:VD (const_int 0))))
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD4)
> - (vec_duplicate:VD (const_int 0)))
> - (vec_concat:<VDBL>
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD4)
> - (vec_duplicate:VD (const_int 0))))) 0))]
> - "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> - "ld4\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> - [(set_attr "type" "neon_load4_4reg<q>")]
> -)
> -
> -(define_insn "aarch64_ld4<mode>_dreg_be"
> +(define_insn "aarch64_ld4<mode>_dreg"
> [(set (match_operand:XI 0 "register_operand" "=w")
> - (subreg:XI
> - (vec_concat:<VRL4>
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD4))
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD4)))
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD4))
> - (vec_concat:<VDBL>
> - (vec_duplicate:VD (const_int 0))
> - (unspec:VD [(match_dup 1)]
> - UNSPEC_LD4)))) 0))]
> - "TARGET_SIMD && BYTES_BIG_ENDIAN"
> + (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> + (unspec:VD [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> + UNSPEC_LD4_DREG))]
> + "TARGET_SIMD"
> "ld4\\t{%S0.<Vtype> - %V0.<Vtype>}, %1"
> [(set_attr "type" "neon_load4_4reg<q>")]
> )
>
> -(define_insn "aarch64_ld4<mode>_dreg_le"
> +(define_insn "aarch64_ld4<mode>_dreg"
> [(set (match_operand:XI 0 "register_operand" "=w")
> - (subreg:XI
> - (vec_concat:<VRL4>
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (unspec:DX
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD4)
> - (const_int 0))
> - (vec_concat:<VDBL>
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD4)
> - (const_int 0)))
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD4)
> - (const_int 0))
> - (vec_concat:<VDBL>
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD4)
> - (const_int 0)))) 0))]
> - "TARGET_SIMD && !BYTES_BIG_ENDIAN"
> - "ld1\\t{%S0.1d - %V0.1d}, %1"
> - [(set_attr "type" "neon_load1_4reg<q>")]
> -)
> -
> -(define_insn "aarch64_ld4<mode>_dreg_be"
> - [(set (match_operand:XI 0 "register_operand" "=w")
> - (subreg:XI
> - (vec_concat:<VRL4>
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX
> - [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")]
> - UNSPEC_LD4))
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD4)))
> - (vec_concat:<VRL2>
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD4))
> - (vec_concat:<VDBL>
> - (const_int 0)
> - (unspec:DX [(match_dup 1)]
> - UNSPEC_LD4)))) 0))]
> - "TARGET_SIMD && BYTES_BIG_ENDIAN"
> + (unspec:XI [(match_operand:BLK 1 "aarch64_simd_struct_operand" "Utv")
> + (unspec:DX [(const_int 0)] UNSPEC_VSTRUCTDUMMY)]
> + UNSPEC_LD4_DREG))]
> + "TARGET_SIMD"
> "ld1\\t{%S0.1d - %V0.1d}, %1"
> [(set_attr "type" "neon_load1_4reg<q>")]
> )
> @@ -5248,12 +5032,7 @@ (define_expand "aarch64_ld<VSTRUCT:nregs
> rtx mem = gen_rtx_MEM (BLKmode, operands[1]);
> set_mem_size (mem, <VSTRUCT:nregs> * 8);
>
> - if (BYTES_BIG_ENDIAN)
> - emit_insn (gen_aarch64_ld<VSTRUCT:nregs><VDC:mode>_dreg_be (operands[0],
> - mem));
> - else
> - emit_insn (gen_aarch64_ld<VSTRUCT:nregs><VDC:mode>_dreg_le (operands[0],
> - mem));
> + emit_insn (gen_aarch64_ld<VSTRUCT:nregs><VDC:mode>_dreg (operands[0], mem));
> DONE;
> })
>
prev parent reply other threads:[~2017-08-30 16:54 UTC|newest]
Thread overview: 2+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-08-22 9:54 Richard Sandiford
2017-08-30 17:17 ` James Greenhalgh [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170830165426.GB30919@arm.com \
--to=james.greenhalgh@arm.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=nd@arm.com \
--cc=richard.sandiford@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).