* [pushed 0/8] aarch64: Fix regression in vec_init code quality
@ 2022-02-09 17:00 Richard Sandiford
2022-02-09 17:00 ` [pushed 1/8] aarch64: Tighten general_operand predicates Richard Sandiford
` (7 more replies)
0 siblings, 8 replies; 9+ messages in thread
From: Richard Sandiford @ 2022-02-09 17:00 UTC (permalink / raw)
To: gcc-patches
The main purpose of this patch series is to fix a performance
regression from GCC 8. Before the patch:
int64x2_t s64q_1(int64_t a0, int64_t a1) {
if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
return (int64x2_t) { a1, a0 };
else
return (int64x2_t) { a0, a1 };
}
generated:
fmov d0, x0
fmov d1, x1
ins v0.d[1], v1.d[0]
ret
whereas GCC 8 generated the more respectable:
dup v0.2d, x0
ins v0.d[1], x1
ret
But there are some related knock-on changes that IMO are needed to keep
things in a consistent and maintainable state.
There is still more cleanup and optimisation that could be done in this
area, but that's definitely GCC 13 material.
Tested on aarch64-linux-gnu and aarch64_be-elf, pushed.
Sorry for the size of the series, but it really did seem like the
best fix in the circumstances.
Richard
^ permalink raw reply [flat|nested] 9+ messages in thread
* [pushed 1/8] aarch64: Tighten general_operand predicates
From: Richard Sandiford @ 2022-02-09 17:00 UTC (permalink / raw)
To: gcc-patches
This patch fixes some cases in which *general_operand was used instead
of *nonimmediate_operand by patterns that don't accept immediates.
This avoids some complications with later patches.
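As a rough sketch of the distinction (a toy model for illustration only; GCC's real predicates operate on RTL expressions, and the enum and function names here are invented):

```c
#include <stdbool.h>

/* Toy classification of operand kinds.  */
enum op_kind { OP_REG, OP_MEM, OP_IMM };

/* general_operand-style predicate: accepts registers, memory,
   and immediates.  */
static bool
toy_general_operand_p (enum op_kind kind)
{
  return kind == OP_REG || kind == OP_MEM || kind == OP_IMM;
}

/* nonimmediate_operand-style predicate: accepts registers and memory
   only, which matches what these insn patterns can actually handle.  */
static bool
toy_nonimmediate_operand_p (enum op_kind kind)
{
  return kind == OP_REG || kind == OP_MEM;
}
```

A pattern whose alternatives only cover register and memory forms gains nothing from the looser predicate, and the mismatch complicates later changes.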
gcc/
* config/aarch64/aarch64-simd.md (aarch64_simd_vec_set<mode>): Use
aarch64_simd_nonimmediate_operand instead of
aarch64_simd_general_operand.
(@aarch64_combinez<mode>): Use nonimmediate_operand instead of
general_operand.
(@aarch64_combinez_be<mode>): Likewise.
---
gcc/config/aarch64/aarch64-simd.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 6646e069ad2..9529bdb4997 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1039,7 +1039,7 @@ (define_insn "aarch64_simd_vec_set<mode>"
[(set (match_operand:VALL_F16 0 "register_operand" "=w,w,w")
(vec_merge:VALL_F16
(vec_duplicate:VALL_F16
- (match_operand:<VEL> 1 "aarch64_simd_general_operand" "w,?r,Utv"))
+ (match_operand:<VEL> 1 "aarch64_simd_nonimmediate_operand" "w,?r,Utv"))
(match_operand:VALL_F16 3 "register_operand" "0,0,0")
(match_operand:SI 2 "immediate_operand" "i,i,i")))]
"TARGET_SIMD"
@@ -4380,7 +4380,7 @@ (define_insn "store_pair_lanes<mode>"
(define_insn "@aarch64_combinez<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
(vec_concat:<VDBL>
- (match_operand:VDC 1 "general_operand" "w,?r,m")
+ (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")
(match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")))]
"TARGET_SIMD && !BYTES_BIG_ENDIAN"
"@
@@ -4395,7 +4395,7 @@ (define_insn "@aarch64_combinez_be<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
(vec_concat:<VDBL>
(match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")
- (match_operand:VDC 1 "general_operand" "w,?r,m")))]
+ (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")))]
"TARGET_SIMD && BYTES_BIG_ENDIAN"
"@
mov\\t%0.8b, %1.8b
--
2.25.1
* [pushed 2/8] aarch64: Generalise vec_set predicate
From: Richard Sandiford @ 2022-02-09 17:00 UTC (permalink / raw)
To: gcc-patches
The aarch64_simd_vec_set<mode> define_insn takes memory operands,
so this patch makes the vec_set<mode> optab expander do the same.
gcc/
* config/aarch64/aarch64-simd.md (vec_set<mode>): Allow the
element to be an aarch64_simd_nonimmediate_operand.
---
gcc/config/aarch64/aarch64-simd.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9529bdb4997..872a3d78269 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1378,7 +1378,7 @@ (define_insn "vec_shr_<mode>"
(define_expand "vec_set<mode>"
[(match_operand:VALL_F16 0 "register_operand")
- (match_operand:<VEL> 1 "register_operand")
+ (match_operand:<VEL> 1 "aarch64_simd_nonimmediate_operand")
(match_operand:SI 2 "immediate_operand")]
"TARGET_SIMD"
{
--
2.25.1
* [pushed 3/8] aarch64: Generalise adjacency check for load_pair_lanes
From: Richard Sandiford @ 2022-02-09 17:00 UTC (permalink / raw)
To: gcc-patches
This patch generalises the load_pair_lanes<mode> guard so that
it uses aarch64_check_consecutive_mems to check for consecutive
mems. It also allows the pattern to be used for STRICT_ALIGNMENT
targets if the alignment is high enough.
The main aim is to avoid an inline test, for the sake of a later patch
that needs to repeat it. Reusing aarch64_check_consecutive_mems seemed
simpler than writing an entirely new function.
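The core of that adjacency test can be sketched as follows (a simplified model under assumed semantics; the real aarch64_check_consecutive_mems works on RTL addresses, handles auto-increment and MEM_EXPR cases, and can also canonicalise the two references):

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of a memory reference: a base register name plus a
   constant byte offset and an access size.  */
struct mem_ref
{
  const char *base;   /* base register, e.g. "x0" */
  int64_t offset;     /* constant offset from the base */
  int64_t size;       /* access size in bytes */
};

/* Return true if M2 starts exactly where M1 ends, so that the pair
   could be loaded with a single access twice the size, using M1's
   address.  */
static bool
consecutive_mems_p (const struct mem_ref *m1, const struct mem_ref *m2)
{
  return (strcmp (m1->base, m2->base) == 0
          && m2->offset == m1->offset + m1->size);
}
```

The STRICT_ALIGNMENT wrapper then simply rejects pairs whose known alignment is below that of the doubled mode before running this check.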
gcc/
* config/aarch64/aarch64-protos.h (aarch64_mergeable_load_pair_p):
Declare.
* config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Use
aarch64_mergeable_load_pair_p instead of inline check.
* config/aarch64/aarch64.cc (aarch64_expand_vector_init): Likewise.
(aarch64_check_consecutive_mems): Allow the reversed parameter
to be null.
(aarch64_mergeable_load_pair_p): New function.
---
gcc/config/aarch64/aarch64-protos.h | 1 +
gcc/config/aarch64/aarch64-simd.md | 7 +--
gcc/config/aarch64/aarch64.cc | 54 ++++++++++++-------
gcc/testsuite/gcc.target/aarch64/vec-init-6.c | 12 +++++
gcc/testsuite/gcc.target/aarch64/vec-init-7.c | 12 +++++
5 files changed, 62 insertions(+), 24 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-6.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-7.c
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 26368538a55..b75ed35635b 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1000,6 +1000,7 @@ void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
int aarch64_ccmp_mode_to_code (machine_mode mode);
bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
+bool aarch64_mergeable_load_pair_p (machine_mode, rtx, rtx);
bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode);
bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, machine_mode);
void aarch64_swap_ldrstr_operands (rtx *, bool);
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 872a3d78269..c5bc2ea658b 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4353,11 +4353,8 @@ (define_insn "load_pair_lanes<mode>"
(vec_concat:<VDBL>
(match_operand:VDC 1 "memory_operand" "Utq")
(match_operand:VDC 2 "memory_operand" "m")))]
- "TARGET_SIMD && !STRICT_ALIGNMENT
- && rtx_equal_p (XEXP (operands[2], 0),
- plus_constant (Pmode,
- XEXP (operands[1], 0),
- GET_MODE_SIZE (<MODE>mode)))"
+ "TARGET_SIMD
+ && aarch64_mergeable_load_pair_p (<VDBL>mode, operands[1], operands[2])"
"ldr\\t%q0, %1"
[(set_attr "type" "neon_load1_1reg_q")]
)
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 296145e6008..c47543aebf3 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -21063,11 +21063,7 @@ aarch64_expand_vector_init (rtx target, rtx vals)
for store_pair_lanes<mode>. */
if (memory_operand (x0, inner_mode)
&& memory_operand (x1, inner_mode)
- && !STRICT_ALIGNMENT
- && rtx_equal_p (XEXP (x1, 0),
- plus_constant (Pmode,
- XEXP (x0, 0),
- GET_MODE_SIZE (inner_mode))))
+ && aarch64_mergeable_load_pair_p (mode, x0, x1))
{
rtx t;
if (inner_mode == DFmode)
@@ -24687,14 +24683,20 @@ aarch64_sched_adjust_priority (rtx_insn *insn, int priority)
return priority;
}
-/* Check if *MEM1 and *MEM2 are consecutive memory references and,
+/* If REVERSED is null, return true if memory reference *MEM2 comes
+ immediately after memory reference *MEM1. Do not change the references
+ in this case.
+
+ Otherwise, check if *MEM1 and *MEM2 are consecutive memory references and,
if they are, try to make them use constant offsets from the same base
register. Return true on success. When returning true, set *REVERSED
to true if *MEM1 comes after *MEM2, false if *MEM1 comes before *MEM2. */
static bool
aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
{
- *reversed = false;
+ if (reversed)
+ *reversed = false;
+
if (GET_RTX_CLASS (GET_CODE (XEXP (*mem1, 0))) == RTX_AUTOINC
|| GET_RTX_CLASS (GET_CODE (XEXP (*mem2, 0))) == RTX_AUTOINC)
return false;
@@ -24723,7 +24725,7 @@ aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
if (known_eq (UINTVAL (offset1) + size1, UINTVAL (offset2)))
return true;
- if (known_eq (UINTVAL (offset2) + size2, UINTVAL (offset1)))
+ if (known_eq (UINTVAL (offset2) + size2, UINTVAL (offset1)) && reversed)
{
*reversed = true;
return true;
@@ -24756,22 +24758,25 @@ aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
if (known_eq (expr_offset1 + size1, expr_offset2))
;
- else if (known_eq (expr_offset2 + size2, expr_offset1))
+ else if (known_eq (expr_offset2 + size2, expr_offset1) && reversed)
*reversed = true;
else
return false;
- if (base2)
+ if (reversed)
{
- rtx addr1 = plus_constant (Pmode, XEXP (*mem2, 0),
- expr_offset1 - expr_offset2);
- *mem1 = replace_equiv_address_nv (*mem1, addr1);
- }
- else
- {
- rtx addr2 = plus_constant (Pmode, XEXP (*mem1, 0),
- expr_offset2 - expr_offset1);
- *mem2 = replace_equiv_address_nv (*mem2, addr2);
+ if (base2)
+ {
+ rtx addr1 = plus_constant (Pmode, XEXP (*mem2, 0),
+ expr_offset1 - expr_offset2);
+ *mem1 = replace_equiv_address_nv (*mem1, addr1);
+ }
+ else
+ {
+ rtx addr2 = plus_constant (Pmode, XEXP (*mem1, 0),
+ expr_offset2 - expr_offset1);
+ *mem2 = replace_equiv_address_nv (*mem2, addr2);
+ }
}
return true;
}
@@ -24779,6 +24784,17 @@ aarch64_check_consecutive_mems (rtx *mem1, rtx *mem2, bool *reversed)
return false;
}
+/* Return true if MEM1 and MEM2 can be combined into a single access
+ of mode MODE, with the combined access having the same address as MEM1. */
+
+bool
+aarch64_mergeable_load_pair_p (machine_mode mode, rtx mem1, rtx mem2)
+{
+ if (STRICT_ALIGNMENT && MEM_ALIGN (mem1) < GET_MODE_ALIGNMENT (mode))
+ return false;
+ return aarch64_check_consecutive_mems (&mem1, &mem2, nullptr);
+}
+
/* Given OPERANDS of consecutive load/store, check if we can merge
them into ldp/stp. LOAD is true if they are load instructions.
MODE is the mode of memory operands. */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-6.c b/gcc/testsuite/gcc.target/aarch64/vec-init-6.c
new file mode 100644
index 00000000000..96450157498
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-6.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+int64_t s64[2];
+float64_t f64[2];
+
+int64x2_t test_s64() { return (int64x2_t) { s64[0], s64[1] }; }
+float64x2_t test_f64() { return (float64x2_t) { f64[0], f64[1] }; }
+
+/* { dg-final { scan-assembler-not {\tins\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-7.c b/gcc/testsuite/gcc.target/aarch64/vec-init-7.c
new file mode 100644
index 00000000000..795895286db
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-7.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O -mstrict-align" } */
+
+#include <arm_neon.h>
+
+int64_t s64[2] __attribute__((aligned(16)));
+float64_t f64[2] __attribute__((aligned(16)));
+
+int64x2_t test_s64() { return (int64x2_t) { s64[0], s64[1] }; }
+float64x2_t test_f64() { return (float64x2_t) { f64[0], f64[1] }; }
+
+/* { dg-final { scan-assembler-not {\tins\t} } } */
--
2.25.1
* [pushed 4/8] aarch64: Remove redundant vec_concat patterns
From: Richard Sandiford @ 2022-02-09 17:01 UTC (permalink / raw)
To: gcc-patches
move_lo_quad_internal_<mode> and move_lo_quad_internal_be_<mode>
partially duplicate the later aarch64_combinez{,_be}<mode> patterns.
The duplication itself is a regression.
The only substantive differences between the two are:
* combinez uses vector MOV (ORR) instead of element MOV (DUP).
The former seems more likely to be handled via renaming.
* combinez disparages the GPR->FPR alternative whereas move_lo_quad
gave it equal cost. The new test gives a token example of when
the combinez behaviour helps.
gcc/
* config/aarch64/aarch64-simd.md (move_lo_quad_internal_<mode>)
(move_lo_quad_internal_be_<mode>): Delete.
(move_lo_quad_<mode>): Use aarch64_combine<Vhalf> instead of the above.
gcc/testsuite/
* gcc.target/aarch64/vec-init-8.c: New test.
---
gcc/config/aarch64/aarch64-simd.md | 37 +------------------
gcc/testsuite/gcc.target/aarch64/vec-init-8.c | 15 ++++++++
2 files changed, 17 insertions(+), 35 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-8.c
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index c5bc2ea658b..d6cd4c70fe7 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1584,46 +1584,13 @@ (define_insn "aarch64_<optab>p<mode>"
;; On little-endian this is { operand, zeroes }
;; On big-endian this is { zeroes, operand }
-(define_insn "move_lo_quad_internal_<mode>"
- [(set (match_operand:VQMOV 0 "register_operand" "=w,w,w")
- (vec_concat:VQMOV
- (match_operand:<VHALF> 1 "register_operand" "w,r,r")
- (match_operand:<VHALF> 2 "aarch64_simd_or_scalar_imm_zero")))]
- "TARGET_SIMD && !BYTES_BIG_ENDIAN"
- "@
- dup\\t%d0, %1.d[0]
- fmov\\t%d0, %1
- dup\\t%d0, %1"
- [(set_attr "type" "neon_dup<q>,f_mcr,neon_dup<q>")
- (set_attr "length" "4")
- (set_attr "arch" "simd,fp,simd")]
-)
-
-(define_insn "move_lo_quad_internal_be_<mode>"
- [(set (match_operand:VQMOV 0 "register_operand" "=w,w,w")
- (vec_concat:VQMOV
- (match_operand:<VHALF> 2 "aarch64_simd_or_scalar_imm_zero")
- (match_operand:<VHALF> 1 "register_operand" "w,r,r")))]
- "TARGET_SIMD && BYTES_BIG_ENDIAN"
- "@
- dup\\t%d0, %1.d[0]
- fmov\\t%d0, %1
- dup\\t%d0, %1"
- [(set_attr "type" "neon_dup<q>,f_mcr,neon_dup<q>")
- (set_attr "length" "4")
- (set_attr "arch" "simd,fp,simd")]
-)
-
(define_expand "move_lo_quad_<mode>"
[(match_operand:VQMOV 0 "register_operand")
(match_operand:<VHALF> 1 "register_operand")]
"TARGET_SIMD"
{
- rtx zs = CONST0_RTX (<VHALF>mode);
- if (BYTES_BIG_ENDIAN)
- emit_insn (gen_move_lo_quad_internal_be_<mode> (operands[0], operands[1], zs));
- else
- emit_insn (gen_move_lo_quad_internal_<mode> (operands[0], operands[1], zs));
+ emit_insn (gen_aarch64_combine<Vhalf> (operands[0], operands[1],
+ CONST0_RTX (<VHALF>mode)));
DONE;
}
)
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-8.c b/gcc/testsuite/gcc.target/aarch64/vec-init-8.c
new file mode 100644
index 00000000000..18f8afe10f5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-8.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+int64x2_t f1(int64_t *ptr) {
+ int64_t x = *ptr;
+ asm volatile ("" ::: "memory");
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int64x2_t) { 0, x };
+ else
+ return (int64x2_t) { x, 0 };
+}
+
+/* { dg-final { scan-assembler {\tldr\td0, \[x0\]\n} } } */
--
2.25.1
* [pushed 5/8] aarch64: Add more vec_combine patterns
From: Richard Sandiford @ 2022-02-09 17:01 UTC (permalink / raw)
To: gcc-patches
vec_combine is really one instruction on aarch64, provided that
the lowpart element is in the same register as the destination
vector. This patch adds patterns for that.
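The effect of tying the lowpart to the destination can be sketched like this (an illustrative model only; the names are invented, and the comments map each store to the instruction it stands in for):

```c
#include <stdint.h>

typedef struct { int64_t lane[2]; } toy_v2di;

/* Model of the little-endian register alternative: the "0" constraint
   ties the low half to the destination register, so once the low half
   is in place the whole combine costs one lane insert.  */
static toy_v2di
toy_vec_combine (int64_t lo, int64_t hi)
{
  toy_v2di dest;
  dest.lane[0] = lo;  /* already in the destination via the tied operand */
  dest.lane[1] = hi;  /* the single remaining instruction: INS v0.d[1] */
  return dest;
}
```

Without the tied-operand patterns, both halves had to be moved into FPRs first, which is where the extra fmov in the regression below comes from.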
The patch fixes a regression from GCC 8. Before the patch:
int64x2_t s64q_1(int64_t a0, int64_t a1) {
if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
return (int64x2_t) { a1, a0 };
else
return (int64x2_t) { a0, a1 };
}
generated:
fmov d0, x0
fmov d1, x1
ins v0.d[1], v1.d[0]
ret
whereas GCC 8 generated the more respectable:
dup v0.2d, x0
ins v0.d[1], x1
ret
gcc/
* config/aarch64/predicates.md (aarch64_reg_or_mem_pair_operand):
New predicate.
* config/aarch64/aarch64-simd.md (*aarch64_combine_internal<mode>)
(*aarch64_combine_internal_be<mode>): New patterns.
gcc/testsuite/
* gcc.target/aarch64/vec-init-9.c: New test.
* gcc.target/aarch64/vec-init-10.c: Likewise.
* gcc.target/aarch64/vec-init-11.c: Likewise.
---
gcc/config/aarch64/aarch64-simd.md | 62 ++++
gcc/config/aarch64/predicates.md | 4 +
.../gcc.target/aarch64/vec-init-10.c | 15 +
.../gcc.target/aarch64/vec-init-11.c | 12 +
gcc/testsuite/gcc.target/aarch64/vec-init-9.c | 267 ++++++++++++++++++
5 files changed, 360 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-10.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-11.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-9.c
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index d6cd4c70fe7..ead80396e70 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4326,6 +4326,25 @@ (define_insn "load_pair_lanes<mode>"
[(set_attr "type" "neon_load1_1reg_q")]
)
+;; This STP pattern is a partial duplicate of the general vec_concat patterns
+;; below. The reason for having both of them is that the alternatives of
+;; the later patterns do not have consistent register preferences: the STP
+;; alternatives have no preference between GPRs and FPRs (and if anything,
+;; the GPR form is more natural for scalar integers) whereas the other
+;; alternatives *require* an FPR for operand 1 and prefer one for operand 2.
+;;
+;; Using "*" to hide the STP alternatives from the RA penalizes cases in
+;; which the destination was always memory. On the other hand, expressing
+;; the true preferences makes GPRs seem more palatable than they really are
+;; for register destinations.
+;;
+;; Despite that, we do still want the general form to have STP alternatives,
+;; in order to handle cases where a register destination is spilled.
+;;
+;; The best compromise therefore seemed to be to have a dedicated STP
+;; pattern to catch cases in which the destination was always memory.
+;; This dedicated pattern must come first.
+
(define_insn "store_pair_lanes<mode>"
[(set (match_operand:<VDBL> 0 "aarch64_mem_pair_lanes_operand" "=Umn, Umn")
(vec_concat:<VDBL>
@@ -4338,6 +4357,49 @@ (define_insn "store_pair_lanes<mode>"
[(set_attr "type" "neon_stp, store_16")]
)
+;; Form a vector whose least significant half comes from operand 1 and whose
+;; most significant half comes from operand 2. The register alternatives
+;; tie the least significant half to the same register as the destination,
+;; so that only the other half needs to be handled explicitly. For the
+;; reasons given above, the STP alternatives use ? for constraints that
+;; the register alternatives either don't accept or themselves disparage.
+
+(define_insn "*aarch64_combine_internal<mode>"
+ [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
+ (vec_concat:<VDBL>
+ (match_operand:VDC 1 "register_operand" "0, 0, 0, ?w, ?r")
+ (match_operand:VDC 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, w, ?r")))]
+ "TARGET_SIMD
+ && !BYTES_BIG_ENDIAN
+ && (register_operand (operands[0], <VDBL>mode)
+ || register_operand (operands[2], <MODE>mode))"
+ "@
+ ins\t%0.d[1], %2.d[0]
+ ins\t%0.d[1], %2
+ ld1\t{%0.d}[1], %2
+ stp\t%d1, %d2, %y0
+ stp\t%x1, %x2, %y0"
+ [(set_attr "type" "neon_ins_q, neon_from_gp_q, neon_load1_one_lane_q, neon_stp, store_16")]
+)
+
+(define_insn "*aarch64_combine_internal_be<mode>"
+ [(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
+ (vec_concat:<VDBL>
+ (match_operand:VDC 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, ?w, ?r")
+ (match_operand:VDC 1 "register_operand" "0, 0, 0, ?w, ?r")))]
+ "TARGET_SIMD
+ && BYTES_BIG_ENDIAN
+ && (register_operand (operands[0], <VDBL>mode)
+ || register_operand (operands[2], <MODE>mode))"
+ "@
+ ins\t%0.d[1], %2.d[0]
+ ins\t%0.d[1], %2
+ ld1\t{%0.d}[1], %2
+ stp\t%d2, %d1, %y0
+ stp\t%x2, %x1, %y0"
+ [(set_attr "type" "neon_ins_q, neon_from_gp_q, neon_load1_one_lane_q, neon_stp, store_16")]
+)
+
;; In this insn, operand 1 should be low, and operand 2 the high part of the
;; dest vector.
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 7dc4c155ea8..c308015ac2c 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -254,6 +254,10 @@ (define_predicate "aarch64_mem_pair_lanes_operand"
false,
ADDR_QUERY_LDP_STP_N)")))
+(define_predicate "aarch64_reg_or_mem_pair_operand"
+ (ior (match_operand 0 "register_operand")
+ (match_operand 0 "aarch64_mem_pair_lanes_operand")))
+
(define_predicate "aarch64_prefetch_operand"
(match_test "aarch64_address_valid_for_prefetch_p (op, false)"))
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-10.c b/gcc/testsuite/gcc.target/aarch64/vec-init-10.c
new file mode 100644
index 00000000000..f5dd83b94b5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-10.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+int64x2_t f1(int64_t *x, int c) {
+ return c ? (int64x2_t) { x[0], x[2] } : (int64x2_t) { 0, 0 };
+}
+
+int64x2_t f2(int64_t *x, int i0, int i1, int c) {
+ return c ? (int64x2_t) { x[i0], x[i1] } : (int64x2_t) { 0, 0 };
+}
+
+/* { dg-final { scan-assembler-times {\t(?:ldr\td[0-9]+|ld1\t)} 4 } } */
+/* { dg-final { scan-assembler-not {\tldr\tx} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-11.c b/gcc/testsuite/gcc.target/aarch64/vec-init-11.c
new file mode 100644
index 00000000000..df242702c0c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-11.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+void f1(int64x2_t *res, int64_t *x, int c0, int c1) {
+ res[0] = (int64x2_t) { c0 ? x[0] : 0, c1 ? x[2] : 0 };
+}
+
+/* { dg-final { scan-assembler-times {\tldr\tx[0-9]+} 2 } } */
+/* { dg-final { scan-assembler {\tstp\tx[0-9]+, x[0-9]+} } } */
+/* { dg-final { scan-assembler-not {\tldr\td} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-9.c b/gcc/testsuite/gcc.target/aarch64/vec-init-9.c
new file mode 100644
index 00000000000..8f68e06a559
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-9.c
@@ -0,0 +1,267 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include <arm_neon.h>
+
+void ext();
+
+/*
+** s64q_1:
+** fmov d0, x0
+** ins v0\.d\[1\], x1
+** ret
+*/
+int64x2_t s64q_1(int64_t a0, int64_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int64x2_t) { a1, a0 };
+ else
+ return (int64x2_t) { a0, a1 };
+}
+/*
+** s64q_2:
+** fmov d0, x0
+** ld1 {v0\.d}\[1\], \[x1\]
+** ret
+*/
+int64x2_t s64q_2(int64_t a0, int64_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int64x2_t) { ptr[0], a0 };
+ else
+ return (int64x2_t) { a0, ptr[0] };
+}
+/*
+** s64q_3:
+** ldr d0, \[x0\]
+** ins v0\.d\[1\], x1
+** ret
+*/
+int64x2_t s64q_3(int64_t *ptr, int64_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int64x2_t) { a1, ptr[0] };
+ else
+ return (int64x2_t) { ptr[0], a1 };
+}
+/*
+** s64q_4:
+** stp x1, x2, \[x0\]
+** ret
+*/
+void s64q_4(int64x2_t *res, int64_t a0, int64_t a1) {
+ res[0] = (int64x2_t) { a0, a1 };
+}
+/*
+** s64q_5:
+** stp x1, x2, \[x0, #?8\]
+** ret
+*/
+void s64q_5(uintptr_t res, int64_t a0, int64_t a1) {
+ *(int64x2_t *)(res + 8) = (int64x2_t) { a0, a1 };
+}
+/*
+** s64q_6:
+** ...
+** stp x0, x1, .*
+** ...
+** ldr q0, .*
+** ...
+** ret
+*/
+int64x2_t s64q_6(int64_t a0, int64_t a1) {
+ int64x2_t res = { a0, a1 };
+ ext ();
+ return res;
+}
+
+/*
+** f64q_1:
+** ins v0\.d\[1\], v1\.d\[0\]
+** ret
+*/
+float64x2_t f64q_1(float64_t a0, float64_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float64x2_t) { a1, a0 };
+ else
+ return (float64x2_t) { a0, a1 };
+}
+/*
+** f64q_2:
+** ld1 {v0\.d}\[1\], \[x0\]
+** ret
+*/
+float64x2_t f64q_2(float64_t a0, float64_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float64x2_t) { ptr[0], a0 };
+ else
+ return (float64x2_t) { a0, ptr[0] };
+}
+/*
+** f64q_3:
+** ldr d0, \[x0\]
+** ins v0\.d\[1\], v1\.d\[0\]
+** ret
+*/
+float64x2_t f64q_3(float64_t a0, float64_t a1, float64_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float64x2_t) { a1, ptr[0] };
+ else
+ return (float64x2_t) { ptr[0], a1 };
+}
+/*
+** f64q_4:
+** stp d0, d1, \[x0\]
+** ret
+*/
+void f64q_4(float64x2_t *res, float64_t a0, float64_t a1) {
+ res[0] = (float64x2_t) { a0, a1 };
+}
+/*
+** f64q_5:
+** stp d0, d1, \[x0, #?8\]
+** ret
+*/
+void f64q_5(uintptr_t res, float64_t a0, float64_t a1) {
+ *(float64x2_t *)(res + 8) = (float64x2_t) { a0, a1 };
+}
+/*
+** f64q_6:
+** ...
+** stp d0, d1, .*
+** ...
+** ldr q0, .*
+** ...
+** ret
+*/
+float64x2_t f64q_6(float64_t a0, float64_t a1) {
+ float64x2_t res = { a0, a1 };
+ ext ();
+ return res;
+}
+
+/*
+** s32q_1:
+** ins v0\.d\[1\], v1\.d\[0\]
+** ret
+*/
+int32x4_t s32q_1(int32x2_t a0, int32x2_t a1) {
+ return vcombine_s32 (a0, a1);
+}
+/*
+** s32q_2:
+** ld1 {v0\.d}\[1\], \[x0\]
+** ret
+*/
+int32x4_t s32q_2(int32x2_t a0, int32x2_t *ptr) {
+ return vcombine_s32 (a0, ptr[0]);
+}
+/*
+** s32q_3:
+** ldr d0, \[x0\]
+** ins v0\.d\[1\], v1\.d\[0\]
+** ret
+*/
+int32x4_t s32q_3(int32x2_t a0, int32x2_t a1, int32x2_t *ptr) {
+ return vcombine_s32 (ptr[0], a1);
+}
+/*
+** s32q_4:
+** stp d0, d1, \[x0\]
+** ret
+*/
+void s32q_4(int32x4_t *res, int32x2_t a0, int32x2_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ res[0] = vcombine_s32 (a1, a0);
+ else
+ res[0] = vcombine_s32 (a0, a1);
+}
+/*
+** s32q_5:
+** stp d0, d1, \[x0, #?8\]
+** ret
+*/
+void s32q_5(uintptr_t res, int32x2_t a0, int32x2_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ *(int32x4_t *)(res + 8) = vcombine_s32 (a1, a0);
+ else
+ *(int32x4_t *)(res + 8) = vcombine_s32 (a0, a1);
+}
+/*
+** s32q_6:
+** ...
+** stp d0, d1, .*
+** ...
+** ldr q0, .*
+** ...
+** ret
+*/
+int32x4_t s32q_6(int32x2_t a0, int32x2_t a1) {
+ int32x4_t res = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ ? vcombine_s32 (a1, a0)
+ : vcombine_s32 (a0, a1));
+ ext ();
+ return res;
+}
+
+/*
+** f32q_1:
+** ins v0\.d\[1\], v1\.d\[0\]
+** ret
+*/
+float32x4_t f32q_1(float32x2_t a0, float32x2_t a1) {
+ return vcombine_f32 (a0, a1);
+}
+/*
+** f32q_2:
+** ld1 {v0\.d}\[1\], \[x0\]
+** ret
+*/
+float32x4_t f32q_2(float32x2_t a0, float32x2_t *ptr) {
+ return vcombine_f32 (a0, ptr[0]);
+}
+/*
+** f32q_3:
+** ldr d0, \[x0\]
+** ins v0\.d\[1\], v1\.d\[0\]
+** ret
+*/
+float32x4_t f32q_3(float32x2_t a0, float32x2_t a1, float32x2_t *ptr) {
+ return vcombine_f32 (ptr[0], a1);
+}
+/*
+** f32q_4:
+** stp d0, d1, \[x0\]
+** ret
+*/
+void f32q_4(float32x4_t *res, float32x2_t a0, float32x2_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ res[0] = vcombine_f32 (a1, a0);
+ else
+ res[0] = vcombine_f32 (a0, a1);
+}
+/*
+** f32q_5:
+** stp d0, d1, \[x0, #?8\]
+** ret
+*/
+void f32q_5(uintptr_t res, float32x2_t a0, float32x2_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ *(float32x4_t *)(res + 8) = vcombine_f32 (a1, a0);
+ else
+ *(float32x4_t *)(res + 8) = vcombine_f32 (a0, a1);
+}
+/*
+** f32q_6:
+** ...
+** stp d0, d1, .*
+** ...
+** ldr q0, .*
+** ...
+** ret
+*/
+float32x4_t f32q_6(float32x2_t a0, float32x2_t a1) {
+ float32x4_t res = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+ ? vcombine_f32 (a1, a0)
+ : vcombine_f32 (a0, a1));
+ ext ();
+ return res;
+}
--
2.25.1
* [pushed 6/8] aarch64: Add a general vec_concat expander
From: Richard Sandiford @ 2022-02-09 17:01 UTC (permalink / raw)
To: gcc-patches
After previous patches, we have a (mostly new) group of vec_concat
patterns as well as vestiges of the old move_lo/hi_quad patterns.
(A previous patch removed the move_lo_quad insns, but we still
have the move_hi_quad insns and both sets of expanders.)
This patch is the first of two to remove the old move_lo/hi_quad
stuff. It isn't technically a regression fix, but it seemed
better to make the changes now rather than leave things in
a half-finished and inconsistent state.
This patch defines an aarch64_vec_concat expander that coerces the
element operands into a valid form, including the ones added by the
previous patch. This in turn lets us get rid of one move_lo/hi_quad
pair.
As a side-effect, it also means that vcombines of 2 vectors make
better use of the available forms, like vec_inits of 2 scalars
already do.
gcc/
* config/aarch64/aarch64-protos.h (aarch64_split_simd_combine):
Delete.
* config/aarch64/aarch64-simd.md (@aarch64_combinez<mode>): Rename
to...
(*aarch64_combinez<mode>): ...this.
(@aarch64_combinez_be<mode>): Rename to...
(*aarch64_combinez_be<mode>): ...this.
(@aarch64_vec_concat<mode>): New expander.
(aarch64_combine<mode>): Use it.
(@aarch64_simd_combine<mode>): Delete.
* config/aarch64/aarch64.cc (aarch64_split_simd_combine): Delete.
(aarch64_expand_vector_init): Use aarch64_vec_concat.
gcc/testsuite/
* gcc.target/aarch64/vec-init-12.c: New test.
---
gcc/config/aarch64/aarch64-protos.h | 2 -
gcc/config/aarch64/aarch64-simd.md | 76 ++++++++++++-------
gcc/config/aarch64/aarch64.cc | 55 ++------------
.../gcc.target/aarch64/vec-init-12.c | 65 ++++++++++++++++
4 files changed, 122 insertions(+), 76 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-12.c
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b75ed35635b..392efa0b74d 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -925,8 +925,6 @@ bool aarch64_split_128bit_move_p (rtx, rtx);
bool aarch64_mov128_immediate (rtx);
-void aarch64_split_simd_combine (rtx, rtx, rtx);
-
void aarch64_split_simd_move (rtx, rtx);
/* Check for a legitimate floating point constant for FMOV. */
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ead80396e70..7acde0dd099 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4403,7 +4403,7 @@ (define_insn "*aarch64_combine_internal_be<mode>"
;; In this insn, operand 1 should be low, and operand 2 the high part of the
;; dest vector.
-(define_insn "@aarch64_combinez<mode>"
+(define_insn "*aarch64_combinez<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
(vec_concat:<VDBL>
(match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")
@@ -4417,7 +4417,7 @@ (define_insn "@aarch64_combinez<mode>"
(set_attr "arch" "simd,fp,simd")]
)
-(define_insn "@aarch64_combinez_be<mode>"
+(define_insn "*aarch64_combinez_be<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
(vec_concat:<VDBL>
(match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")
@@ -4431,38 +4431,62 @@ (define_insn "@aarch64_combinez_be<mode>"
(set_attr "arch" "simd,fp,simd")]
)
-(define_expand "aarch64_combine<mode>"
- [(match_operand:<VDBL> 0 "register_operand")
- (match_operand:VDC 1 "register_operand")
- (match_operand:VDC 2 "aarch64_simd_reg_or_zero")]
+;; Form a vector whose first half (in array order) comes from operand 1
+;; and whose second half (in array order) comes from operand 2.
+;; This operand order follows the RTL vec_concat operation.
+(define_expand "@aarch64_vec_concat<mode>"
+ [(set (match_operand:<VDBL> 0 "register_operand")
+ (vec_concat:<VDBL>
+ (match_operand:VDC 1 "general_operand")
+ (match_operand:VDC 2 "general_operand")))]
"TARGET_SIMD"
{
- if (operands[2] == CONST0_RTX (<MODE>mode))
+ int lo = BYTES_BIG_ENDIAN ? 2 : 1;
+ int hi = BYTES_BIG_ENDIAN ? 1 : 2;
+
+ if (MEM_P (operands[1])
+ && MEM_P (operands[2])
+ && aarch64_mergeable_load_pair_p (<VDBL>mode, operands[1], operands[2]))
+ /* Use load_pair_lanes<mode>. */
+ ;
+ else if (operands[hi] == CONST0_RTX (<MODE>mode))
{
- if (BYTES_BIG_ENDIAN)
- emit_insn (gen_aarch64_combinez_be<mode> (operands[0], operands[1],
- operands[2]));
- else
- emit_insn (gen_aarch64_combinez<mode> (operands[0], operands[1],
- operands[2]));
+ /* Use *aarch64_combinez<mode>. */
+ if (!nonimmediate_operand (operands[lo], <MODE>mode))
+ operands[lo] = force_reg (<MODE>mode, operands[lo]);
}
else
- aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
- DONE;
-}
-)
+ {
+ /* Use *aarch64_combine_general<mode>. */
+ operands[lo] = force_reg (<MODE>mode, operands[lo]);
+ if (!aarch64_simd_nonimmediate_operand (operands[hi], <MODE>mode))
+ {
+ if (MEM_P (operands[hi]))
+ {
+ rtx addr = force_reg (Pmode, XEXP (operands[hi], 0));
+ operands[hi] = replace_equiv_address (operands[hi], addr);
+ }
+ else
+ operands[hi] = force_reg (<MODE>mode, operands[hi]);
+ }
+ }
+})
-(define_expand "@aarch64_simd_combine<mode>"
+;; Form a vector whose least significant half comes from operand 1 and whose
+;; most significant half comes from operand 2. This operand order follows
+;; arm_neon.h vcombine* intrinsics.
+(define_expand "aarch64_combine<mode>"
[(match_operand:<VDBL> 0 "register_operand")
- (match_operand:VDC 1 "register_operand")
- (match_operand:VDC 2 "register_operand")]
+ (match_operand:VDC 1 "general_operand")
+ (match_operand:VDC 2 "general_operand")]
"TARGET_SIMD"
- {
- emit_insn (gen_move_lo_quad_<Vdbl> (operands[0], operands[1]));
- emit_insn (gen_move_hi_quad_<Vdbl> (operands[0], operands[2]));
- DONE;
- }
-[(set_attr "type" "multiple")]
+{
+ if (BYTES_BIG_ENDIAN)
+ std::swap (operands[1], operands[2]);
+ emit_insn (gen_aarch64_vec_concat<mode> (operands[0], operands[1],
+ operands[2]));
+ DONE;
+}
)
;; <su><addsub>l<q>.
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index c47543aebf3..af42d1bedfe 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -4239,23 +4239,6 @@ aarch64_split_128bit_move_p (rtx dst, rtx src)
return true;
}
-/* Split a complex SIMD combine. */
-
-void
-aarch64_split_simd_combine (rtx dst, rtx src1, rtx src2)
-{
- machine_mode src_mode = GET_MODE (src1);
- machine_mode dst_mode = GET_MODE (dst);
-
- gcc_assert (VECTOR_MODE_P (dst_mode));
- gcc_assert (register_operand (dst, dst_mode)
- && register_operand (src1, src_mode)
- && register_operand (src2, src_mode));
-
- emit_insn (gen_aarch64_simd_combine (src_mode, dst, src1, src2));
- return;
-}
-
/* Split a complex SIMD move. */
void
@@ -20941,37 +20924,13 @@ aarch64_expand_vector_init (rtx target, rtx vals)
of mode N in VALS and we must put their concatentation into TARGET. */
if (XVECLEN (vals, 0) == 2 && VECTOR_MODE_P (GET_MODE (XVECEXP (vals, 0, 0))))
{
- gcc_assert (known_eq (GET_MODE_SIZE (mode),
- 2 * GET_MODE_SIZE (GET_MODE (XVECEXP (vals, 0, 0)))));
- rtx lo = XVECEXP (vals, 0, 0);
- rtx hi = XVECEXP (vals, 0, 1);
- machine_mode narrow_mode = GET_MODE (lo);
- gcc_assert (GET_MODE_INNER (narrow_mode) == inner_mode);
- gcc_assert (narrow_mode == GET_MODE (hi));
-
- /* When we want to concatenate a half-width vector with zeroes we can
- use the aarch64_combinez[_be] patterns. Just make sure that the
- zeroes are in the right half. */
- if (BYTES_BIG_ENDIAN
- && aarch64_simd_imm_zero (lo, narrow_mode)
- && general_operand (hi, narrow_mode))
- emit_insn (gen_aarch64_combinez_be (narrow_mode, target, hi, lo));
- else if (!BYTES_BIG_ENDIAN
- && aarch64_simd_imm_zero (hi, narrow_mode)
- && general_operand (lo, narrow_mode))
- emit_insn (gen_aarch64_combinez (narrow_mode, target, lo, hi));
- else
- {
- /* Else create the two half-width registers and combine them. */
- if (!REG_P (lo))
- lo = force_reg (GET_MODE (lo), lo);
- if (!REG_P (hi))
- hi = force_reg (GET_MODE (hi), hi);
-
- if (BYTES_BIG_ENDIAN)
- std::swap (lo, hi);
- emit_insn (gen_aarch64_simd_combine (narrow_mode, target, lo, hi));
- }
+ machine_mode narrow_mode = GET_MODE (XVECEXP (vals, 0, 0));
+ gcc_assert (GET_MODE_INNER (narrow_mode) == inner_mode
+ && known_eq (GET_MODE_SIZE (mode),
+ 2 * GET_MODE_SIZE (narrow_mode)));
+ emit_insn (gen_aarch64_vec_concat (narrow_mode, target,
+ XVECEXP (vals, 0, 0),
+ XVECEXP (vals, 0, 1)));
return;
}
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-12.c b/gcc/testsuite/gcc.target/aarch64/vec-init-12.c
new file mode 100644
index 00000000000..c287478e2d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-12.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include <arm_neon.h>
+
+/*
+** s32_1:
+** ldr q0, \[x0\]
+** ret
+*/
+int32x4_t s32_1(int32x2_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return vcombine_s32 (ptr[1], ptr[0]);
+ else
+ return vcombine_s32 (ptr[0], ptr[1]);
+}
+/*
+** s32_2:
+** add x([0-9])+, x0, #?8
+** ld1 {v0\.d}\[1\], \[x\1\]
+** ret
+*/
+int32x4_t s32_2(int32x2_t a0, int32x2_t *ptr) {
+ return vcombine_s32 (a0, ptr[1]);
+}
+/*
+** s32_3:
+** ldr d0, \[x0\], #?16
+** ld1 {v0\.d}\[1\], \[x0\]
+** ret
+*/
+int32x4_t s32_3(int32x2_t *ptr) {
+ return vcombine_s32 (ptr[0], ptr[2]);
+}
+
+/*
+** f32_1:
+** ldr q0, \[x0\]
+** ret
+*/
+float32x4_t f32_1(float32x2_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return vcombine_f32 (ptr[1], ptr[0]);
+ else
+ return vcombine_f32 (ptr[0], ptr[1]);
+}
+/*
+** f32_2:
+** add x([0-9])+, x0, #?8
+** ld1 {v0\.d}\[1\], \[x\1\]
+** ret
+*/
+float32x4_t f32_2(float32x2_t a0, float32x2_t *ptr) {
+ return vcombine_f32 (a0, ptr[1]);
+}
+/*
+** f32_3:
+** ldr d0, \[x0\], #?16
+** ld1 {v0\.d}\[1\], \[x0\]
+** ret
+*/
+float32x4_t f32_3(float32x2_t *ptr) {
+ return vcombine_f32 (ptr[0], ptr[2]);
+}
--
2.25.1
* [pushed 7/8] aarch64: Remove move_lo/hi_quad expanders
2022-02-09 17:00 [pushed 0/8] aarch64: Fix regression in vec_init code quality Richard Sandiford
` (5 preceding siblings ...)
2022-02-09 17:01 ` [pushed 6/8] aarch64: Add a general vec_concat expander Richard Sandiford
@ 2022-02-09 17:01 ` Richard Sandiford
2022-02-09 17:02 ` [pushed 8/8] aarch64: Extend vec_concat patterns to 8-byte vectors Richard Sandiford
7 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2022-02-09 17:01 UTC (permalink / raw)
To: gcc-patches
This patch is the second of two to remove the old
move_lo/hi_quad expanders and move_hi_quad insns.
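(For reference, vec_pack_trunc narrows each element of two input vectors and concatenates the results.  A portable C sketch of that semantics -- not the GCC expander itself, which now just forms the concatenation with aarch64_vec_concat and then truncates; names and modes here are illustrative only:)

```c
#include <stdint.h>

/* Sketch of vec_pack_trunc semantics for 32-bit -> 16-bit elements:
   truncate each element of the two inputs and concatenate, input 1
   first in array order.  In the real implementation the
   endianness-dependent operand placement is handled inside the
   aarch64_vec_concat expander.  */
static void
vec_pack_trunc_sketch (int16_t dst[8], const int32_t a[4], const int32_t b[4])
{
  for (int i = 0; i < 4; i++)
    dst[i] = (int16_t) a[i];      /* low half: truncated A  */
  for (int i = 0; i < 4; i++)
    dst[4 + i] = (int16_t) b[i];  /* high half: truncated B */
}
```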
gcc/
* config/aarch64/aarch64-simd.md (@aarch64_split_simd_mov<mode>):
Use aarch64_combine instead of move_lo/hi_quad. Tabify.
(move_lo_quad_<mode>, aarch64_simd_move_hi_quad_<mode>): Delete.
(aarch64_simd_move_hi_quad_be_<mode>, move_hi_quad_<mode>): Delete.
(vec_pack_trunc_<mode>): Take general_operand elements and use
aarch64_combine rather than move_lo/hi_quad to combine them.
(vec_pack_trunc_df): Likewise.
---
gcc/config/aarch64/aarch64-simd.md | 111 +++++------------------------
1 file changed, 18 insertions(+), 93 deletions(-)
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 7acde0dd099..ef6e772503d 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -272,7 +272,7 @@ (define_split
(define_expand "@aarch64_split_simd_mov<mode>"
[(set (match_operand:VQMOV 0)
- (match_operand:VQMOV 1))]
+ (match_operand:VQMOV 1))]
"TARGET_SIMD"
{
rtx dst = operands[0];
@@ -280,23 +280,22 @@ (define_expand "@aarch64_split_simd_mov<mode>"
if (GP_REGNUM_P (REGNO (src)))
{
- rtx src_low_part = gen_lowpart (<VHALF>mode, src);
- rtx src_high_part = gen_highpart (<VHALF>mode, src);
+ rtx src_low_part = gen_lowpart (<VHALF>mode, src);
+ rtx src_high_part = gen_highpart (<VHALF>mode, src);
+ rtx dst_low_part = gen_lowpart (<VHALF>mode, dst);
- emit_insn
- (gen_move_lo_quad_<mode> (dst, src_low_part));
- emit_insn
- (gen_move_hi_quad_<mode> (dst, src_high_part));
+ emit_move_insn (dst_low_part, src_low_part);
+ emit_insn (gen_aarch64_combine<Vhalf> (dst, dst_low_part,
+ src_high_part));
}
-
else
{
- rtx dst_low_part = gen_lowpart (<VHALF>mode, dst);
- rtx dst_high_part = gen_highpart (<VHALF>mode, dst);
+ rtx dst_low_part = gen_lowpart (<VHALF>mode, dst);
+ rtx dst_high_part = gen_highpart (<VHALF>mode, dst);
rtx lo = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
rtx hi = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
- emit_insn (gen_aarch64_get_half<mode> (dst_low_part, src, lo));
- emit_insn (gen_aarch64_get_half<mode> (dst_high_part, src, hi));
+ emit_insn (gen_aarch64_get_half<mode> (dst_low_part, src, lo));
+ emit_insn (gen_aarch64_get_half<mode> (dst_high_part, src, hi));
}
DONE;
}
@@ -1580,69 +1579,6 @@ (define_insn "aarch64_<optab>p<mode>"
;; What that means, is that the RTL descriptions of the below patterns
;; need to change depending on endianness.
-;; Move to the low architectural bits of the register.
-;; On little-endian this is { operand, zeroes }
-;; On big-endian this is { zeroes, operand }
-
-(define_expand "move_lo_quad_<mode>"
- [(match_operand:VQMOV 0 "register_operand")
- (match_operand:<VHALF> 1 "register_operand")]
- "TARGET_SIMD"
-{
- emit_insn (gen_aarch64_combine<Vhalf> (operands[0], operands[1],
- CONST0_RTX (<VHALF>mode)));
- DONE;
-}
-)
-
-;; Move operand1 to the high architectural bits of the register, keeping
-;; the low architectural bits of operand2.
-;; For little-endian this is { operand2, operand1 }
-;; For big-endian this is { operand1, operand2 }
-
-(define_insn "aarch64_simd_move_hi_quad_<mode>"
- [(set (match_operand:VQMOV 0 "register_operand" "+w,w")
- (vec_concat:VQMOV
- (vec_select:<VHALF>
- (match_dup 0)
- (match_operand:VQMOV 2 "vect_par_cnst_lo_half" ""))
- (match_operand:<VHALF> 1 "register_operand" "w,r")))]
- "TARGET_SIMD && !BYTES_BIG_ENDIAN"
- "@
- ins\\t%0.d[1], %1.d[0]
- ins\\t%0.d[1], %1"
- [(set_attr "type" "neon_ins")]
-)
-
-(define_insn "aarch64_simd_move_hi_quad_be_<mode>"
- [(set (match_operand:VQMOV 0 "register_operand" "+w,w")
- (vec_concat:VQMOV
- (match_operand:<VHALF> 1 "register_operand" "w,r")
- (vec_select:<VHALF>
- (match_dup 0)
- (match_operand:VQMOV 2 "vect_par_cnst_lo_half" ""))))]
- "TARGET_SIMD && BYTES_BIG_ENDIAN"
- "@
- ins\\t%0.d[1], %1.d[0]
- ins\\t%0.d[1], %1"
- [(set_attr "type" "neon_ins")]
-)
-
-(define_expand "move_hi_quad_<mode>"
- [(match_operand:VQMOV 0 "register_operand")
- (match_operand:<VHALF> 1 "register_operand")]
- "TARGET_SIMD"
-{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
- if (BYTES_BIG_ENDIAN)
- emit_insn (gen_aarch64_simd_move_hi_quad_be_<mode> (operands[0],
- operands[1], p));
- else
- emit_insn (gen_aarch64_simd_move_hi_quad_<mode> (operands[0],
- operands[1], p));
- DONE;
-})
-
;; Narrowing operations.
(define_insn "aarch64_xtn<mode>_insn_le"
@@ -1743,16 +1679,12 @@ (define_insn "*aarch64_narrow_trunc<mode>"
(define_expand "vec_pack_trunc_<mode>"
[(match_operand:<VNARROWD> 0 "register_operand")
- (match_operand:VDN 1 "register_operand")
- (match_operand:VDN 2 "register_operand")]
+ (match_operand:VDN 1 "general_operand")
+ (match_operand:VDN 2 "general_operand")]
"TARGET_SIMD"
{
rtx tempreg = gen_reg_rtx (<VDBL>mode);
- int lo = BYTES_BIG_ENDIAN ? 2 : 1;
- int hi = BYTES_BIG_ENDIAN ? 1 : 2;
-
- emit_insn (gen_move_lo_quad_<Vdbl> (tempreg, operands[lo]));
- emit_insn (gen_move_hi_quad_<Vdbl> (tempreg, operands[hi]));
+ emit_insn (gen_aarch64_vec_concat<mode> (tempreg, operands[1], operands[2]));
emit_insn (gen_trunc<Vdbl><Vnarrowd>2 (operands[0], tempreg));
DONE;
})
@@ -3402,20 +3334,13 @@ (define_expand "vec_pack_trunc_v2df"
(define_expand "vec_pack_trunc_df"
[(set (match_operand:V2SF 0 "register_operand")
- (vec_concat:V2SF
- (float_truncate:SF
- (match_operand:DF 1 "register_operand"))
- (float_truncate:SF
- (match_operand:DF 2 "register_operand"))
- ))]
+ (vec_concat:V2SF
+ (float_truncate:SF (match_operand:DF 1 "general_operand"))
+ (float_truncate:SF (match_operand:DF 2 "general_operand"))))]
"TARGET_SIMD"
{
rtx tmp = gen_reg_rtx (V2SFmode);
- int lo = BYTES_BIG_ENDIAN ? 2 : 1;
- int hi = BYTES_BIG_ENDIAN ? 1 : 2;
-
- emit_insn (gen_move_lo_quad_v2df (tmp, operands[lo]));
- emit_insn (gen_move_hi_quad_v2df (tmp, operands[hi]));
+ emit_insn (gen_aarch64_vec_concatdf (tmp, operands[1], operands[2]));
emit_insn (gen_aarch64_float_truncate_lo_v2sf (operands[0], tmp));
DONE;
}
--
2.25.1
* [pushed 8/8] aarch64: Extend vec_concat patterns to 8-byte vectors
2022-02-09 17:00 [pushed 0/8] aarch64: Fix regression in vec_init code quality Richard Sandiford
` (6 preceding siblings ...)
2022-02-09 17:01 ` [pushed 7/8] aarch64: Remove move_lo/hi_quad expanders Richard Sandiford
@ 2022-02-09 17:02 ` Richard Sandiford
7 siblings, 0 replies; 9+ messages in thread
From: Richard Sandiford @ 2022-02-09 17:02 UTC (permalink / raw)
To: gcc-patches
This patch extends the previous support for 16-byte vec_concat
so that it supports pairs of 4-byte elements. This too isn't
strictly a regression fix, since the 8-byte forms weren't affected
by the same problems as the 16-byte forms, but it leaves things in
a more consistent state.
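(The 8-byte case pairs two 32-bit values -- SI or SF -- into one 64-bit vector.  A portable sketch of the lane layout, with a hypothetical helper name; on aarch64 this is what e.g. an stp of two w registers or an fmov/ins pair materialises:)

```c
#include <stdint.h>
#include <string.h>

/* Sketch of an 8-byte vec_concat: two 32-bit lanes packed into one
   64-bit value, operand 1 in lane 0 and operand 2 in lane 1 (array
   order).  memcpy is used so the byte layout follows the host's
   lane/memory order rather than hard-coding an endianness.  */
static uint64_t
vec_concat_si (uint32_t op1, uint32_t op2)
{
  uint32_t lanes[2] = { op1, op2 };
  uint64_t out;
  memcpy (&out, lanes, sizeof out);
  return out;
}
```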
gcc/
* config/aarch64/iterators.md (VDCSIF): New mode iterator.
(VDBL): Handle SF.
(single_wx, single_type, single_dtype, dblq): New mode attributes.
* config/aarch64/aarch64-simd.md (load_pair_lanes<mode>): Extend
from VDC to VDCSIF.
(store_pair_lanes<mode>): Likewise.
(*aarch64_combine_internal<mode>): Likewise.
(*aarch64_combine_internal_be<mode>): Likewise.
(*aarch64_combinez<mode>): Likewise.
(*aarch64_combinez_be<mode>): Likewise.
* config/aarch64/aarch64.cc (aarch64_classify_address): Handle
8-byte modes for ADDR_QUERY_LDP_STP_N.
(aarch64_print_operand): Likewise for %y.
gcc/testsuite/
* gcc.target/aarch64/vec-init-13.c: New test.
* gcc.target/aarch64/vec-init-14.c: Likewise.
* gcc.target/aarch64/vec-init-15.c: Likewise.
* gcc.target/aarch64/vec-init-16.c: Likewise.
* gcc.target/aarch64/vec-init-17.c: Likewise.
---
gcc/config/aarch64/aarch64-simd.md | 72 +++++-----
gcc/config/aarch64/aarch64.cc | 16 ++-
gcc/config/aarch64/iterators.md | 38 +++++-
.../gcc.target/aarch64/vec-init-13.c | 123 ++++++++++++++++++
.../gcc.target/aarch64/vec-init-14.c | 123 ++++++++++++++++++
.../gcc.target/aarch64/vec-init-15.c | 15 +++
.../gcc.target/aarch64/vec-init-16.c | 12 ++
.../gcc.target/aarch64/vec-init-17.c | 73 +++++++++++
8 files changed, 430 insertions(+), 42 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-13.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-14.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-15.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-16.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/vec-init-17.c
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index ef6e772503d..18733428f3f 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -4243,12 +4243,12 @@ (define_insn_and_split "aarch64_get_lane<mode>"
(define_insn "load_pair_lanes<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=w")
(vec_concat:<VDBL>
- (match_operand:VDC 1 "memory_operand" "Utq")
- (match_operand:VDC 2 "memory_operand" "m")))]
+ (match_operand:VDCSIF 1 "memory_operand" "Utq")
+ (match_operand:VDCSIF 2 "memory_operand" "m")))]
"TARGET_SIMD
&& aarch64_mergeable_load_pair_p (<VDBL>mode, operands[1], operands[2])"
- "ldr\\t%q0, %1"
- [(set_attr "type" "neon_load1_1reg_q")]
+ "ldr\\t%<single_dtype>0, %1"
+ [(set_attr "type" "neon_load1_1reg<dblq>")]
)
;; This STP pattern is a partial duplicate of the general vec_concat patterns
@@ -4273,12 +4273,12 @@ (define_insn "load_pair_lanes<mode>"
(define_insn "store_pair_lanes<mode>"
[(set (match_operand:<VDBL> 0 "aarch64_mem_pair_lanes_operand" "=Umn, Umn")
(vec_concat:<VDBL>
- (match_operand:VDC 1 "register_operand" "w, r")
- (match_operand:VDC 2 "register_operand" "w, r")))]
+ (match_operand:VDCSIF 1 "register_operand" "w, r")
+ (match_operand:VDCSIF 2 "register_operand" "w, r")))]
"TARGET_SIMD"
"@
- stp\\t%d1, %d2, %y0
- stp\\t%x1, %x2, %y0"
+ stp\t%<single_type>1, %<single_type>2, %y0
+ stp\t%<single_wx>1, %<single_wx>2, %y0"
[(set_attr "type" "neon_stp, store_16")]
)
@@ -4292,37 +4292,37 @@ (define_insn "store_pair_lanes<mode>"
(define_insn "*aarch64_combine_internal<mode>"
[(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
(vec_concat:<VDBL>
- (match_operand:VDC 1 "register_operand" "0, 0, 0, ?w, ?r")
- (match_operand:VDC 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, w, ?r")))]
+ (match_operand:VDCSIF 1 "register_operand" "0, 0, 0, ?w, ?r")
+ (match_operand:VDCSIF 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, w, ?r")))]
"TARGET_SIMD
&& !BYTES_BIG_ENDIAN
&& (register_operand (operands[0], <VDBL>mode)
|| register_operand (operands[2], <MODE>mode))"
"@
- ins\t%0.d[1], %2.d[0]
- ins\t%0.d[1], %2
- ld1\t{%0.d}[1], %2
- stp\t%d1, %d2, %y0
- stp\t%x1, %x2, %y0"
- [(set_attr "type" "neon_ins_q, neon_from_gp_q, neon_load1_one_lane_q, neon_stp, store_16")]
+ ins\t%0.<single_type>[1], %2.<single_type>[0]
+ ins\t%0.<single_type>[1], %<single_wx>2
+ ld1\t{%0.<single_type>}[1], %2
+ stp\t%<single_type>1, %<single_type>2, %y0
+ stp\t%<single_wx>1, %<single_wx>2, %y0"
+ [(set_attr "type" "neon_ins<dblq>, neon_from_gp<dblq>, neon_load1_one_lane<dblq>, neon_stp, store_16")]
)
(define_insn "*aarch64_combine_internal_be<mode>"
[(set (match_operand:<VDBL> 0 "aarch64_reg_or_mem_pair_operand" "=w, w, w, Umn, Umn")
(vec_concat:<VDBL>
- (match_operand:VDC 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, ?w, ?r")
- (match_operand:VDC 1 "register_operand" "0, 0, 0, ?w, ?r")))]
+ (match_operand:VDCSIF 2 "aarch64_simd_nonimmediate_operand" "w, ?r, Utv, ?w, ?r")
+ (match_operand:VDCSIF 1 "register_operand" "0, 0, 0, ?w, ?r")))]
"TARGET_SIMD
&& BYTES_BIG_ENDIAN
&& (register_operand (operands[0], <VDBL>mode)
|| register_operand (operands[2], <MODE>mode))"
"@
- ins\t%0.d[1], %2.d[0]
- ins\t%0.d[1], %2
- ld1\t{%0.d}[1], %2
- stp\t%d2, %d1, %y0
- stp\t%x2, %x1, %y0"
- [(set_attr "type" "neon_ins_q, neon_from_gp_q, neon_load1_one_lane_q, neon_stp, store_16")]
+ ins\t%0.<single_type>[1], %2.<single_type>[0]
+ ins\t%0.<single_type>[1], %<single_wx>2
+ ld1\t{%0.<single_type>}[1], %2
+ stp\t%<single_type>2, %<single_type>1, %y0
+ stp\t%<single_wx>2, %<single_wx>1, %y0"
+ [(set_attr "type" "neon_ins<dblq>, neon_from_gp<dblq>, neon_load1_one_lane<dblq>, neon_stp, store_16")]
)
;; In this insn, operand 1 should be low, and operand 2 the high part of the
@@ -4331,13 +4331,13 @@ (define_insn "*aarch64_combine_internal_be<mode>"
(define_insn "*aarch64_combinez<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
(vec_concat:<VDBL>
- (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")
- (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")))]
+ (match_operand:VDCSIF 1 "nonimmediate_operand" "w,?r,m")
+ (match_operand:VDCSIF 2 "aarch64_simd_or_scalar_imm_zero")))]
"TARGET_SIMD && !BYTES_BIG_ENDIAN"
"@
- mov\\t%0.8b, %1.8b
- fmov\t%d0, %1
- ldr\\t%d0, %1"
+ fmov\\t%<single_type>0, %<single_type>1
+ fmov\t%<single_type>0, %<single_wx>1
+ ldr\\t%<single_type>0, %1"
[(set_attr "type" "neon_move<q>, neon_from_gp, neon_load1_1reg")
(set_attr "arch" "simd,fp,simd")]
)
@@ -4345,13 +4345,13 @@ (define_insn "*aarch64_combinez<mode>"
(define_insn "*aarch64_combinez_be<mode>"
[(set (match_operand:<VDBL> 0 "register_operand" "=w,w,w")
(vec_concat:<VDBL>
- (match_operand:VDC 2 "aarch64_simd_or_scalar_imm_zero")
- (match_operand:VDC 1 "nonimmediate_operand" "w,?r,m")))]
+ (match_operand:VDCSIF 2 "aarch64_simd_or_scalar_imm_zero")
+ (match_operand:VDCSIF 1 "nonimmediate_operand" "w,?r,m")))]
"TARGET_SIMD && BYTES_BIG_ENDIAN"
"@
- mov\\t%0.8b, %1.8b
- fmov\t%d0, %1
- ldr\\t%d0, %1"
+ fmov\\t%<single_type>0, %<single_type>1
+ fmov\t%<single_type>0, %<single_wx>1
+ ldr\\t%<single_type>0, %1"
[(set_attr "type" "neon_move<q>, neon_from_gp, neon_load1_1reg")
(set_attr "arch" "simd,fp,simd")]
)
@@ -4362,8 +4362,8 @@ (define_insn "*aarch64_combinez_be<mode>"
(define_expand "@aarch64_vec_concat<mode>"
[(set (match_operand:<VDBL> 0 "register_operand")
(vec_concat:<VDBL>
- (match_operand:VDC 1 "general_operand")
- (match_operand:VDC 2 "general_operand")))]
+ (match_operand:VDCSIF 1 "general_operand")
+ (match_operand:VDCSIF 2 "general_operand")))]
"TARGET_SIMD"
{
int lo = BYTES_BIG_ENDIAN ? 2 : 1;
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index af42d1bedfe..7bb97bd48e4 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -9922,9 +9922,15 @@ aarch64_classify_address (struct aarch64_address_info *info,
/* If we are dealing with ADDR_QUERY_LDP_STP_N that means the incoming mode
corresponds to the actual size of the memory being loaded/stored and the
mode of the corresponding addressing mode is half of that. */
- if (type == ADDR_QUERY_LDP_STP_N
- && known_eq (GET_MODE_SIZE (mode), 16))
- mode = DFmode;
+ if (type == ADDR_QUERY_LDP_STP_N)
+ {
+ if (known_eq (GET_MODE_SIZE (mode), 16))
+ mode = DFmode;
+ else if (known_eq (GET_MODE_SIZE (mode), 8))
+ mode = SFmode;
+ else
+ return false;
+ }
bool allow_reg_index_p = (!load_store_pair_p
&& ((vec_flags == 0
@@ -11404,7 +11410,9 @@ aarch64_print_operand (FILE *f, rtx x, int code)
machine_mode mode = GET_MODE (x);
if (!MEM_P (x)
- || (code == 'y' && maybe_ne (GET_MODE_SIZE (mode), 16)))
+ || (code == 'y'
+ && maybe_ne (GET_MODE_SIZE (mode), 8)
+ && maybe_ne (GET_MODE_SIZE (mode), 16)))
{
output_operand_lossage ("invalid operand for '%%%c'", code);
return;
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a0c02e4ac15..88067a3536a 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -236,6 +236,9 @@ (define_mode_iterator VQW [V16QI V8HI V4SI])
;; Double vector modes for combines.
(define_mode_iterator VDC [V8QI V4HI V4BF V4HF V2SI V2SF DI DF])
+;; VDC plus SI and SF.
+(define_mode_iterator VDCSIF [V8QI V4HI V4BF V4HF V2SI V2SF SI SF DI DF])
+
;; Polynomial modes for vector combines.
(define_mode_iterator VDC_P [V8QI V4HI DI])
@@ -1436,8 +1439,8 @@ (define_mode_attr Vhalf [(V8QI "v4qi") (V16QI "v8qi")
(define_mode_attr VDBL [(V8QI "V16QI") (V4HI "V8HI")
(V4HF "V8HF") (V4BF "V8BF")
(V2SI "V4SI") (V2SF "V4SF")
- (SI "V2SI") (DI "V2DI")
- (DF "V2DF")])
+ (SI "V2SI") (SF "V2SF")
+ (DI "V2DI") (DF "V2DF")])
;; Register suffix for double-length mode.
(define_mode_attr Vdtype [(V4HF "8h") (V2SF "4s")])
@@ -1557,6 +1560,30 @@ (define_mode_attr Vhalftype [(V16QI "8b") (V8HI "4h")
(V4SI "2s") (V8HF "4h")
(V4SF "2s")])
+;; Whether a mode fits in W or X registers (i.e. "w" for 32-bit modes
+;; and "x" for 64-bit modes).
+(define_mode_attr single_wx [(SI "w") (SF "w")
+ (V8QI "x") (V4HI "x")
+ (V4HF "x") (V4BF "x")
+ (V2SI "x") (V2SF "x")
+ (DI "x") (DF "x")])
+
+;; Whether a mode fits in S or D registers (i.e. "s" for 32-bit modes
+;; and "d" for 64-bit modes).
+(define_mode_attr single_type [(SI "s") (SF "s")
+ (V8QI "d") (V4HI "d")
+ (V4HF "d") (V4BF "d")
+ (V2SI "d") (V2SF "d")
+ (DI "d") (DF "d")])
+
+;; Whether a double-width mode fits in D or Q registers (i.e. "d" for
+;; 32-bit modes and "q" for 64-bit modes).
+(define_mode_attr single_dtype [(SI "d") (SF "d")
+ (V8QI "q") (V4HI "q")
+ (V4HF "q") (V4BF "q")
+ (V2SI "q") (V2SF "q")
+ (DI "q") (DF "q")])
+
;; Define corresponding core/FP element mode for each vector mode.
(define_mode_attr vw [(V8QI "w") (V16QI "w")
(V4HI "w") (V8HI "w")
@@ -1849,6 +1876,13 @@ (define_mode_attr q [(V8QI "") (V16QI "_q")
(V4x1DF "") (V4x2DF "_q")
(V4x4BF "") (V4x8BF "_q")])
+;; Equivalent of the "q" attribute for the <VDBL> mode.
+(define_mode_attr dblq [(SI "") (SF "")
+ (V8QI "_q") (V4HI "_q")
+ (V4HF "_q") (V4BF "_q")
+ (V2SI "_q") (V2SF "_q")
+ (DI "_q") (DF "_q")])
+
(define_mode_attr vp [(V8QI "v") (V16QI "v")
(V4HI "v") (V8HI "v")
(V2SI "p") (V4SI "v")
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-13.c b/gcc/testsuite/gcc.target/aarch64/vec-init-13.c
new file mode 100644
index 00000000000..d0f88cbe71a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-13.c
@@ -0,0 +1,123 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include <arm_neon.h>
+
+/*
+** s64q_1:
+** fmov d0, x0
+** ret
+*/
+int64x2_t s64q_1(int64_t a0) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int64x2_t) { 0, a0 };
+ else
+ return (int64x2_t) { a0, 0 };
+}
+/*
+** s64q_2:
+** ldr d0, \[x0\]
+** ret
+*/
+int64x2_t s64q_2(int64_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int64x2_t) { 0, ptr[0] };
+ else
+ return (int64x2_t) { ptr[0], 0 };
+}
+/*
+** s64q_3:
+** ldr d0, \[x0, #?8\]
+** ret
+*/
+int64x2_t s64q_3(int64_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int64x2_t) { 0, ptr[1] };
+ else
+ return (int64x2_t) { ptr[1], 0 };
+}
+
+/*
+** f64q_1:
+** fmov d0, d0
+** ret
+*/
+float64x2_t f64q_1(float64_t a0) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float64x2_t) { 0, a0 };
+ else
+ return (float64x2_t) { a0, 0 };
+}
+/*
+** f64q_2:
+** ldr d0, \[x0\]
+** ret
+*/
+float64x2_t f64q_2(float64_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float64x2_t) { 0, ptr[0] };
+ else
+ return (float64x2_t) { ptr[0], 0 };
+}
+/*
+** f64q_3:
+** ldr d0, \[x0, #?8\]
+** ret
+*/
+float64x2_t f64q_3(float64_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float64x2_t) { 0, ptr[1] };
+ else
+ return (float64x2_t) { ptr[1], 0 };
+}
+
+/*
+** s32q_1:
+** fmov d0, d0
+** ret
+*/
+int32x4_t s32q_1(int32x2_t a0, int32x2_t a1) {
+ return vcombine_s32 (a0, (int32x2_t) { 0, 0 });
+}
+/*
+** s32q_2:
+** ldr d0, \[x0\]
+** ret
+*/
+int32x4_t s32q_2(int32x2_t *ptr) {
+ return vcombine_s32 (ptr[0], (int32x2_t) { 0, 0 });
+}
+/*
+** s32q_3:
+** ldr d0, \[x0, #?8\]
+** ret
+*/
+int32x4_t s32q_3(int32x2_t *ptr) {
+ return vcombine_s32 (ptr[1], (int32x2_t) { 0, 0 });
+}
+
+/*
+** f32q_1:
+** fmov d0, d0
+** ret
+*/
+float32x4_t f32q_1(float32x2_t a0, float32x2_t a1) {
+ return vcombine_f32 (a0, (float32x2_t) { 0, 0 });
+}
+/*
+** f32q_2:
+** ldr d0, \[x0\]
+** ret
+*/
+float32x4_t f32q_2(float32x2_t *ptr) {
+ return vcombine_f32 (ptr[0], (float32x2_t) { 0, 0 });
+}
+/*
+** f32q_3:
+** ldr d0, \[x0, #?8\]
+** ret
+*/
+float32x4_t f32q_3(float32x2_t *ptr) {
+ return vcombine_f32 (ptr[1], (float32x2_t) { 0, 0 });
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-14.c b/gcc/testsuite/gcc.target/aarch64/vec-init-14.c
new file mode 100644
index 00000000000..02875088cd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-14.c
@@ -0,0 +1,123 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include <arm_neon.h>
+
+void ext();
+
+/*
+** s32_1:
+** fmov s0, w0
+** ins v0\.s\[1\], w1
+** ret
+*/
+int32x2_t s32_1(int32_t a0, int32_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int32x2_t) { a1, a0 };
+ else
+ return (int32x2_t) { a0, a1 };
+}
+/*
+** s32_2:
+** fmov s0, w0
+** ld1 {v0\.s}\[1\], \[x1\]
+** ret
+*/
+int32x2_t s32_2(int32_t a0, int32_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int32x2_t) { ptr[0], a0 };
+ else
+ return (int32x2_t) { a0, ptr[0] };
+}
+/*
+** s32_3:
+** ldr s0, \[x0\]
+** ins v0\.s\[1\], w1
+** ret
+*/
+int32x2_t s32_3(int32_t *ptr, int32_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int32x2_t) { a1, ptr[0] };
+ else
+ return (int32x2_t) { ptr[0], a1 };
+}
+/*
+** s32_4:
+** stp w1, w2, \[x0\]
+** ret
+*/
+void s32_4(int32x2_t *res, int32_t a0, int32_t a1) {
+ res[0] = (int32x2_t) { a0, a1 };
+}
+/*
+** s32_5:
+** stp w1, w2, \[x0, #?4\]
+** ret
+*/
+void s32_5(uintptr_t res, int32_t a0, int32_t a1) {
+ *(int32x2_t *)(res + 4) = (int32x2_t) { a0, a1 };
+}
+/* Currently uses d8 to hold res across the call. */
+int32x2_t s32_6(int32_t a0, int32_t a1) {
+ int32x2_t res = { a0, a1 };
+ ext ();
+ return res;
+}
+
+/*
+** f32_1:
+** ins v0\.s\[1\], v1\.s\[0\]
+** ret
+*/
+float32x2_t f32_1(float32_t a0, float32_t a1) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float32x2_t) { a1, a0 };
+ else
+ return (float32x2_t) { a0, a1 };
+}
+/*
+** f32_2:
+** ld1 {v0\.s}\[1\], \[x0\]
+** ret
+*/
+float32x2_t f32_2(float32_t a0, float32_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float32x2_t) { ptr[0], a0 };
+ else
+ return (float32x2_t) { a0, ptr[0] };
+}
+/*
+** f32_3:
+** ldr s0, \[x0\]
+** ins v0\.s\[1\], v1\.s\[0\]
+** ret
+*/
+float32x2_t f32_3(float32_t a0, float32_t a1, float32_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float32x2_t) { a1, ptr[0] };
+ else
+ return (float32x2_t) { ptr[0], a1 };
+}
+/*
+** f32_4:
+** stp s0, s1, \[x0\]
+** ret
+*/
+void f32_4(float32x2_t *res, float32_t a0, float32_t a1) {
+ res[0] = (float32x2_t) { a0, a1 };
+}
+/*
+** f32_5:
+** stp s0, s1, \[x0, #?4\]
+** ret
+*/
+void f32_5(uintptr_t res, float32_t a0, float32_t a1) {
+ *(float32x2_t *)(res + 4) = (float32x2_t) { a0, a1 };
+}
+/* Currently uses d8 to hold res across the call. */
+float32x2_t f32_6(float32_t a0, float32_t a1) {
+ float32x2_t res = { a0, a1 };
+ ext ();
+ return res;
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-15.c b/gcc/testsuite/gcc.target/aarch64/vec-init-15.c
new file mode 100644
index 00000000000..82f0a8f55ee
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-15.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+int32x2_t f1(int32_t *x, int c) {
+ return c ? (int32x2_t) { x[0], x[2] } : (int32x2_t) { 0, 0 };
+}
+
+int32x2_t f2(int32_t *x, int i0, int i1, int c) {
+ return c ? (int32x2_t) { x[i0], x[i1] } : (int32x2_t) { 0, 0 };
+}
+
+/* { dg-final { scan-assembler-times {\t(?:ldr\ts[0-9]+|ld1\t)} 4 } } */
+/* { dg-final { scan-assembler-not {\tldr\tw} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-16.c b/gcc/testsuite/gcc.target/aarch64/vec-init-16.c
new file mode 100644
index 00000000000..e00aec7a32c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-16.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+
+#include <arm_neon.h>
+
+void f1(int32x2_t *res, int32_t *x, int c0, int c1) {
+ res[0] = (int32x2_t) { c0 ? x[0] : 0, c1 ? x[2] : 0 };
+}
+
+/* { dg-final { scan-assembler-times {\tldr\tw[0-9]+} 2 } } */
+/* { dg-final { scan-assembler {\tstp\tw[0-9]+, w[0-9]+} } } */
+/* { dg-final { scan-assembler-not {\tldr\ts} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/vec-init-17.c b/gcc/testsuite/gcc.target/aarch64/vec-init-17.c
new file mode 100644
index 00000000000..86191b3ca1d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vec-init-17.c
@@ -0,0 +1,73 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-final { check-function-bodies "**" "" "" { target lp64 } } } */
+
+#include <arm_neon.h>
+
+/*
+** s32_1:
+** fmov s0, w0
+** ret
+*/
+int32x2_t s32_1(int32_t a0) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int32x2_t) { 0, a0 };
+ else
+ return (int32x2_t) { a0, 0 };
+}
+/*
+** s32_2:
+** ldr s0, \[x0\]
+** ret
+*/
+int32x2_t s32_2(int32_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int32x2_t) { 0, ptr[0] };
+ else
+ return (int32x2_t) { ptr[0], 0 };
+}
+/*
+** s32_3:
+** ldr s0, \[x0, #?4\]
+** ret
+*/
+int32x2_t s32_3(int32_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (int32x2_t) { 0, ptr[1] };
+ else
+ return (int32x2_t) { ptr[1], 0 };
+}
+
+/*
+** f32_1:
+** fmov s0, s0
+** ret
+*/
+float32x2_t f32_1(float32_t a0) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float32x2_t) { 0, a0 };
+ else
+ return (float32x2_t) { a0, 0 };
+}
+/*
+** f32_2:
+** ldr s0, \[x0\]
+** ret
+*/
+float32x2_t f32_2(float32_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float32x2_t) { 0, ptr[0] };
+ else
+ return (float32x2_t) { ptr[0], 0 };
+}
+/*
+** f32_3:
+** ldr s0, \[x0, #?4\]
+** ret
+*/
+float32x2_t f32_3(float32_t *ptr) {
+ if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
+ return (float32x2_t) { 0, ptr[1] };
+ else
+ return (float32x2_t) { ptr[1], 0 };
+}
--
2.25.1
end of thread, other threads:[~2022-02-09 17:02 UTC | newest]
Thread overview: 9+ messages
-- links below jump to the message on this page --
2022-02-09 17:00 [pushed 0/8] aarch64: Fix regression in vec_init code quality Richard Sandiford
2022-02-09 17:00 ` [pushed 1/8] aarch64: Tighten general_operand predicates Richard Sandiford
2022-02-09 17:00 ` [pushed 2/8] aarch64: Generalise vec_set predicate Richard Sandiford
2022-02-09 17:00 ` [pushed 3/8] aarch64: Generalise adjacency check for load_pair_lanes Richard Sandiford
2022-02-09 17:01 ` [pushed 4/8] aarch64: Remove redundant vec_concat patterns Richard Sandiford
2022-02-09 17:01 ` [pushed 5/8] aarch64: Add more vec_combine patterns Richard Sandiford
2022-02-09 17:01 ` [pushed 6/8] aarch64: Add a general vec_concat expander Richard Sandiford
2022-02-09 17:01 ` [pushed 7/8] aarch64: Remove move_lo/hi_quad expanders Richard Sandiford
2022-02-09 17:02 ` [pushed 8/8] aarch64: Extend vec_concat patterns to 8-byte vectors Richard Sandiford