* [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
@ 2020-07-07 13:17 Przemyslaw Wirkus
2020-07-13 16:12 ` Richard Sandiford
From: Przemyslaw Wirkus @ 2020-07-07 13:17 UTC
To: gcc-patches
Cc: Richard Earnshaw, Richard Sandiford, Marcus Shawcroft, Kyrylo Tkachov
Hi,
Introduce a simple peephole2 optimization that replaces a sequence of
four consecutive load or store (LDR, STR) instructions with two load-pair
or store-pair (LDP, STP) instructions for the supported 2-element vector
modes (V2SI, V2SF, V2DI, and V2DF).
The offset of each generated load/store pair instruction is adjusted
accordingly.
Bootstrapped and tested on aarch64-none-linux-gnu.
Example:
$ cat stp_vec_v2sf.c
typedef float __attribute__((vector_size(8))) vec;
void
store_adjusted(vec *out, vec x, vec y)
{
out[400] = x;
out[401] = y;
out[402] = y;
out[403] = x;
}
Example compiled with:
$ ./aarch64-none-linux-gnu-gcc -S -O2 stp_vec_v2sf.c -dp
Before the patch:
store_adjusted:
str d0, [x0, 3200] // 9 [c=4 l=4] *aarch64_simd_movv2si/2
str d1, [x0, 3208] // 11 [c=4 l=4] *aarch64_simd_movv2si/2
str d1, [x0, 3216] // 13 [c=4 l=4] *aarch64_simd_movv2si/2
str d0, [x0, 3224] // 15 [c=4 l=4] *aarch64_simd_movv2si/2
ret // 26 [c=0 l=4] *do_return
After the patch:
store_adjusted:
add x1, x0, 3200 // 27 [c=4 l=4] *adddi3_aarch64/0
stp d0, d1, [x1] // 28 [c=0 l=4] vec_store_pairv2siv2si
stp d1, d0, [x1, 16] // 29 [c=0 l=4] vec_store_pairv2siv2si
ret // 22 [c=0 l=4] *do_return
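To illustrate why the extra add is needed: LDP/STP encode a signed 7-bit
immediate scaled by the access size, so for 8-byte D registers the
reachable offsets are -512 to 504 and 3200 does not fit. Below is a
minimal sketch of that range check. The name pair_offset_in_range is
hypothetical and used only here; the limits mirror stp_off_lower_limit
and stp_off_upper_limit in aarch64_gen_adjusted_ldpstp.

```c
#include <stdbool.h>

/* Hypothetical helper for illustration only; it is not part of the patch.
   MSIZE is the size in bytes of one memory access (8 for a D register,
   16 for a Q register).  */
static bool
pair_offset_in_range (long offset, long msize)
{
  long lower = -msize * 0x40;       /* Corresponds to stp_off_lower_limit.  */
  long upper = msize * (0x40 - 1);  /* Corresponds to stp_off_upper_limit.  */

  /* The immediate is scaled, so the offset must be a multiple of MSIZE.  */
  return offset % msize == 0 && offset >= lower && offset <= upper;
}
```

With msize equal to 8, pair_offset_in_range (3200, 8) is false, while the
adjusted offsets 0 and 16 used after the add are both in range.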
OK for master?
kind regards,
Przemyslaw
gcc/ChangeLog:
* config/aarch64/aarch64-ldpstp.md: Add two peepholes for adjusted vector
V2SI, V2SF, V2DI, V2DF load and store modes.
* config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp): Add new
parameter nunits.
(aarch64_operands_adjust_ok_for_ldpstp): Add new parameter nunits.
* config/aarch64/aarch64.c (aarch64_operands_adjust_ok_for_ldpstp): Add
new parameter nunits and support for vector types.
(aarch64_gen_adjusted_ldpstp): Add new parameter nunits and support for
vector types.
* config/aarch64/iterators.md (VP_2E): New iterator for 2 element vectors.
(nunits): Add SI and SF to mode attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/ldp_vec_v2sf.c: New test.
* gcc.target/aarch64/ldp_vec_v2si.c: New test.
* gcc.target/aarch64/stp_vec_v2df.c: New test.
* gcc.target/aarch64/stp_vec_v2di.c: New test.
* gcc.target/aarch64/stp_vec_v2sf.c: New test.
* gcc.target/aarch64/stp_vec_v2si.c: New test.
[-- Attachment #2: patch.txt --]
[-- Type: text/plain, Size: 14980 bytes --]
diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md
index dd6f39615c51105a45b7b3dcde7b86e900ae7119..94c312f8f4f6472ebbeca0c2f3e760e0e316f7b7 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -186,10 +186,10 @@ (define_peephole2
(set (match_operand:GPI 6 "register_operand" "")
(match_operand:GPI 7 "memory_operand" ""))
(match_dup 8)]
- "aarch64_operands_adjust_ok_for_ldpstp (operands, true, <MODE>mode)"
+ "aarch64_operands_adjust_ok_for_ldpstp (operands, true, <MODE>mode, <nunits>)"
[(const_int 0)]
{
- if (aarch64_gen_adjusted_ldpstp (operands, true, <MODE>mode, UNKNOWN))
+ if (aarch64_gen_adjusted_ldpstp (operands, true, <MODE>mode, <nunits>, UNKNOWN))
DONE;
else
FAIL;
@@ -206,10 +206,10 @@ (define_peephole2
(set (match_operand:GPF 6 "register_operand" "")
(match_operand:GPF 7 "memory_operand" ""))
(match_dup 8)]
- "aarch64_operands_adjust_ok_for_ldpstp (operands, true, <MODE>mode)"
+ "aarch64_operands_adjust_ok_for_ldpstp (operands, true, <MODE>mode, <nunits>)"
[(const_int 0)]
{
- if (aarch64_gen_adjusted_ldpstp (operands, true, <MODE>mode, UNKNOWN))
+ if (aarch64_gen_adjusted_ldpstp (operands, true, <MODE>mode, <nunits>, UNKNOWN))
DONE;
else
FAIL;
@@ -226,10 +226,10 @@ (define_peephole2
(set (match_operand:DI 6 "register_operand" "")
(sign_extend:DI (match_operand:SI 7 "memory_operand" "")))
(match_dup 8)]
- "aarch64_operands_adjust_ok_for_ldpstp (operands, true, SImode)"
+ "aarch64_operands_adjust_ok_for_ldpstp (operands, true, SImode, 1)"
[(const_int 0)]
{
- if (aarch64_gen_adjusted_ldpstp (operands, true, SImode, SIGN_EXTEND))
+ if (aarch64_gen_adjusted_ldpstp (operands, true, SImode, 1, SIGN_EXTEND))
DONE;
else
FAIL;
@@ -246,10 +246,10 @@ (define_peephole2
(set (match_operand:DI 6 "register_operand" "")
(zero_extend:DI (match_operand:SI 7 "memory_operand" "")))
(match_dup 8)]
- "aarch64_operands_adjust_ok_for_ldpstp (operands, true, SImode)"
+ "aarch64_operands_adjust_ok_for_ldpstp (operands, true, SImode, 1)"
[(const_int 0)]
{
- if (aarch64_gen_adjusted_ldpstp (operands, true, SImode, ZERO_EXTEND))
+ if (aarch64_gen_adjusted_ldpstp (operands, true, SImode, 1, ZERO_EXTEND))
DONE;
else
FAIL;
@@ -266,10 +266,10 @@ (define_peephole2
(set (match_operand:GPI 6 "memory_operand" "")
(match_operand:GPI 7 "aarch64_reg_or_zero" ""))
(match_dup 8)]
- "aarch64_operands_adjust_ok_for_ldpstp (operands, false, <MODE>mode)"
+ "aarch64_operands_adjust_ok_for_ldpstp (operands, false, <MODE>mode, <nunits>)"
[(const_int 0)]
{
- if (aarch64_gen_adjusted_ldpstp (operands, false, <MODE>mode, UNKNOWN))
+ if (aarch64_gen_adjusted_ldpstp (operands, false, <MODE>mode, <nunits>, UNKNOWN))
DONE;
else
FAIL;
@@ -286,10 +286,52 @@ (define_peephole2
(set (match_operand:GPF 6 "memory_operand" "")
(match_operand:GPF 7 "aarch64_reg_or_fp_zero" ""))
(match_dup 8)]
- "aarch64_operands_adjust_ok_for_ldpstp (operands, false, <MODE>mode)"
+ "aarch64_operands_adjust_ok_for_ldpstp (operands, false, <MODE>mode, <nunits>)"
[(const_int 0)]
{
- if (aarch64_gen_adjusted_ldpstp (operands, false, <MODE>mode, UNKNOWN))
+ if (aarch64_gen_adjusted_ldpstp (operands, false, <MODE>mode, <nunits>, UNKNOWN))
+ DONE;
+ else
+ FAIL;
+})
+
+(define_peephole2
+ [(match_scratch:DI 8 "r")
+ (set (match_operand:VP_2E 0 "memory_operand" "")
+ (match_operand:VP_2E 1 "aarch64_reg_or_zero" ""))
+ (set (match_operand:VP_2E 2 "memory_operand" "")
+ (match_operand:VP_2E 3 "aarch64_reg_or_zero" ""))
+ (set (match_operand:VP_2E 4 "memory_operand" "")
+ (match_operand:VP_2E 5 "aarch64_reg_or_zero" ""))
+ (set (match_operand:VP_2E 6 "memory_operand" "")
+ (match_operand:VP_2E 7 "aarch64_reg_or_zero" ""))
+ (match_dup 8)]
+ "TARGET_SIMD
+ && aarch64_operands_adjust_ok_for_ldpstp (operands, false, <VEL>mode, <nunits>)"
+ [(const_int 0)]
+{
+ if (aarch64_gen_adjusted_ldpstp (operands, false, <VEL>mode, <nunits>, UNKNOWN))
+ DONE;
+ else
+ FAIL;
+})
+
+(define_peephole2
+ [(match_scratch:DI 8 "r")
+ (set (match_operand:VP_2E 0 "register_operand" "")
+ (match_operand:VP_2E 1 "memory_operand" ""))
+ (set (match_operand:VP_2E 2 "register_operand" "")
+ (match_operand:VP_2E 3 "memory_operand" ""))
+ (set (match_operand:VP_2E 4 "register_operand" "")
+ (match_operand:VP_2E 5 "memory_operand" ""))
+ (set (match_operand:VP_2E 6 "register_operand" "")
+ (match_operand:VP_2E 7 "memory_operand" ""))
+ (match_dup 8)]
+ "TARGET_SIMD
+ && aarch64_operands_adjust_ok_for_ldpstp (operands, true, <VEL>mode, <nunits>)"
+ [(const_int 0)]
+{
+ if (aarch64_gen_adjusted_ldpstp (operands, true, <VEL>mode, <nunits>, UNKNOWN))
DONE;
else
FAIL;
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9e43adb7db0373df6cc5ef1d2b22f217aca2aad2..8855fcbedbca8784e30511c017d95b58d03ee452 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -681,7 +681,7 @@ void aarch64_split_compare_and_swap (rtx op[]);
void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
-bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE);
+bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, int nunits, RTX_CODE);
void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx);
bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
@@ -732,7 +732,7 @@ int aarch64_ccmp_mode_to_code (machine_mode mode);
bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode);
-bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, scalar_mode);
+bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, scalar_mode, int nunits);
void aarch64_swap_ldrstr_operands (rtx *, bool);
extern void aarch64_asm_output_pool_epilogue (FILE *, const char *,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 973c65aa4fb348450872036617362aa17310fb20..15bfbc29f68eadd6c7e5458228cd74bc734ab627 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21873,6 +21873,9 @@ aarch64_ldrstr_offset_compare (const void *x, const void *y)
/* Given OPERANDS of consecutive load/store, check if we can merge
them into ldp/stp by adjusting the offset. LOAD is true if they
are load instructions. MODE is the mode of memory operands.
+ NUNITS is the number of units for MODE of memory operands. This
+ allows us to, in addition to scalar modes (NUNITS == 1), adjust
+ vector modes (NUNITS > 1) of memory operands.
Given below consecutive stores:
@@ -21893,7 +21896,7 @@ aarch64_ldrstr_offset_compare (const void *x, const void *y)
bool
aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
- scalar_mode mode)
+ scalar_mode mode, int nunits)
{
const int num_insns = 4;
enum reg_class rclass;
@@ -21970,7 +21973,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
for (int i = 0; i < num_insns; i++)
offvals[i] = INTVAL (offset[i]);
- msize = GET_MODE_SIZE (mode);
+ msize = GET_MODE_SIZE (mode) * nunits;
/* Check if the offsets can be put in the right order to do a ldp/stp. */
qsort (offvals, num_insns, sizeof (HOST_WIDE_INT),
@@ -22010,13 +22013,14 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
bool
aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
- scalar_mode mode, RTX_CODE code)
+ scalar_mode mode, int nunits, RTX_CODE code)
{
rtx base, offset_1, offset_3, t1, t2;
rtx mem_1, mem_2, mem_3, mem_4;
rtx temp_operands[8];
HOST_WIDE_INT off_val_1, off_val_3, base_off, new_off_1, new_off_3,
stp_off_upper_limit, stp_off_lower_limit, msize;
+ machine_mode mem_mode;
/* We make changes on a copy as we may still bail out. */
for (int i = 0; i < 8; i ++)
@@ -22049,7 +22053,7 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
&& offset_3 != NULL_RTX);
/* Adjust offset so it can fit in LDP/STP instruction. */
- msize = GET_MODE_SIZE (mode);
+ msize = GET_MODE_SIZE (mode) * nunits;
stp_off_upper_limit = msize * (0x40 - 1);
stp_off_lower_limit = - msize * 0x40;
@@ -22114,8 +22118,11 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
replace_equiv_address_nv (mem_4, plus_constant (Pmode, operands[8],
new_off_3 + msize), true);
- if (!aarch64_mem_pair_operand (mem_1, mode)
- || !aarch64_mem_pair_operand (mem_3, mode))
+ /* If nunits > 1 we are adjusting for vector mode. In this case we should
+ generate mode for vector built from nunits and scalar_mode provided. */
+ mem_mode = (nunits == 1) ? mode : mode_for_vector(mode, nunits).else_void();
+ if (!aarch64_mem_pair_operand (mem_1, mem_mode)
+ || !aarch64_mem_pair_operand (mem_3, mem_mode))
return false;
if (code == ZERO_EXTEND)
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a568cf21b99d4b169d7e367c5f00d65c544ef790..8c5765476a9db2b93775f7da770bb2ba03677763 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -98,6 +98,9 @@ (define_mode_iterator DREG [V8QI V4HI V4HF V2SI V2SF DF])
;; Copy of the above.
(define_mode_iterator DREG2 [V8QI V4HI V4HF V2SI V2SF DF])
+;; All modes suitable to store/load pair (2 elements) using STP/LDP.
+(define_mode_iterator VP_2E [V2SI V2SF V2DI V2DF])
+
;; Advanced SIMD, 64-bit container, all integer modes.
(define_mode_iterator VD_BHSI [V8QI V4HI V2SI])
@@ -935,6 +938,7 @@ (define_mode_attr nunits [(V8QI "8") (V16QI "16")
(V4BF "4") (V8BF "8")
(V2SF "2") (V4SF "4")
(V1DF "1") (V2DF "2")
+ (SI "1") (SF "1")
(DI "1") (DF "1")])
;; Map a mode to the number of bits in it, if the size of the mode
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
new file mode 100644
index 0000000000000000000000000000000000000000..f46dea1f748a094509ecfa0292a7c54e94164c9a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef float __attribute__((vector_size(8))) vec;
+
+vec
+load_long(vec *v) {
+ return v[110] + v[111] + v[112] + v[113];
+}
+
+/* { dg-final { scan-assembler "add\tx\[0-9\]+, x\[0-9\]+, 880" } } */
+/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+, 16\\\]" } } */
+/* { dg-final { scan-assembler-not "ldr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2si.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2si.c
new file mode 100644
index 0000000000000000000000000000000000000000..0abd94f942ae7ec49afda590989773f52556404c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2si.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int __attribute__((vector_size(8))) vec;
+
+vec
+load_long(vec *v) {
+ return v[110] + v[111] + v[112] + v[113];
+}
+
+/* { dg-final { scan-assembler "add\tx\[0-9\]+, x\[0-9\]+, 880" } } */
+/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+, 16\\\]" } } */
+/* { dg-final { scan-assembler-not "ldr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_v2df.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2df.c
new file mode 100644
index 0000000000000000000000000000000000000000..cb7a65c006af451b873f8adc0546af6f8efa3c43
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2df.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef double __attribute__((vector_size(16))) vec;
+
+void
+store_adjusted(vec *out, vec x, vec y)
+{
+ out[100] = x;
+ out[101] = y;
+ out[102] = y;
+ out[103] = x;
+}
+
+/* { dg-final { scan-assembler "add\tx\[0-9\]+, x\[0-9\]+, 1600" } } */
+/* { dg-final { scan-assembler "stp\tq\[0-9\]+, q\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "stp\tq\[0-9\]+, q\[0-9\]+, \\\[x\[0-9\]+, 32\\\]" } } */
+/* { dg-final { scan-assembler-not "str\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_v2di.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2di.c
new file mode 100644
index 0000000000000000000000000000000000000000..a5b298d5c43beb2df4c21d1cb81a961cca908192
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2di.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long long __attribute__((vector_size(16))) vec;
+
+void
+store_adjusted(vec *out, vec x, vec y)
+{
+ out[100] = x;
+ out[101] = y;
+ out[102] = y;
+ out[103] = x;
+}
+
+/* { dg-final { scan-assembler "add\tx\[0-9\]+, x\[0-9\]+, 1600" } } */
+/* { dg-final { scan-assembler "stp\tq\[0-9\]+, q\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "stp\tq\[0-9\]+, q\[0-9\]+, \\\[x\[0-9\]+, 32\\\]" } } */
+/* { dg-final { scan-assembler-not "str\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_v2sf.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2sf.c
new file mode 100644
index 0000000000000000000000000000000000000000..3bf8c58faa3b687040b5a5bccec54f771914b474
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2sf.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef float __attribute__((vector_size(8))) vec;
+
+void
+store_adjusted(vec *out, vec x, vec y)
+{
+ out[400] = x;
+ out[401] = y;
+ out[402] = y;
+ out[403] = x;
+}
+
+/* { dg-final { scan-assembler "add\tx\[0-9\]+, x\[0-9\]+, 3200" } } */
+/* { dg-final { scan-assembler "stp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "stp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+, 16\\\]" } } */
+/* { dg-final { scan-assembler-not "str\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_v2si.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2si.c
new file mode 100644
index 0000000000000000000000000000000000000000..f9d1cf4ac6bad7d44604a71037dd15cff55ced51
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2si.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int __attribute__((vector_size(8))) vec;
+
+void
+store_adjusted(vec *out, vec x, vec y)
+{
+ out[400] = x;
+ out[401] = y;
+ out[402] = y;
+ out[403] = x;
+}
+
+/* { dg-final { scan-assembler "add\tx\[0-9\]+, x\[0-9\]+, 3200" } } */
+/* { dg-final { scan-assembler "stp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "stp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+, 16\\\]" } } */
+/* { dg-final { scan-assembler-not "str\t" } } */
* Re: [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
2020-07-07 13:17 [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types Przemyslaw Wirkus
@ 2020-07-13 16:12 ` Richard Sandiford
2020-07-21 8:20 ` Przemyslaw Wirkus
From: Richard Sandiford @ 2020-07-13 16:12 UTC
To: Przemyslaw Wirkus
Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov
Hi,
Sorry for the slow review.
Przemyslaw Wirkus <Przemyslaw.Wirkus@arm.com> writes:
> Hi,
>
> Introduce simple peephole2 optimization which substitutes a sequence of
> four consecutive load or store (LDR, STR) instructions with two load or
> store pair (LDP, STP) instructions for 2 element supported vector modes
> (V2SI, V2SF, V2DI, and V2DF).
> Generated load / store pair instruction offset is adjusted accordingly.
>
> Bootstrapped and tested on aarch64-none-linux-gnu.
>
> Example:
> $ cat stp_vec_v2sf.c
> typedef float __attribute__((vector_size(8))) vec;
>
> void
> store_adjusted(vec *out, vec x, vec y)
> {
> out[400] = x;
> out[401] = y;
> out[402] = y;
> out[403] = x;
> }
>
> Example compiled with:
> $ ./aarch64-none-linux-gnu-gcc -S -O2 stp_vec_v2sf.c -dp
>
> Before the patch:
>
> store_adjusted:
> str d0, [x0, 3200] // 9 [c=4 l=4] *aarch64_simd_movv2si/2
> str d1, [x0, 3208] // 11 [c=4 l=4] *aarch64_simd_movv2si/2
> str d1, [x0, 3216] // 13 [c=4 l=4] *aarch64_simd_movv2si/2
> str d0, [x0, 3224] // 15 [c=4 l=4] *aarch64_simd_movv2si/2
> ret // 26 [c=0 l=4] *do_return
>
> After the patch:
>
> store_adjusted:
> add x1, x0, 3200 // 27 [c=4 l=4] *adddi3_aarch64/0
> stp d0, d1, [x1] // 28 [c=0 l=4] vec_store_pairv2siv2si
> stp d1, d0, [x1, 16] // 29 [c=0 l=4] vec_store_pairv2siv2si
> ret // 22 [c=0 l=4] *do_return
>
>
> OK for master ?
>
> kind regards,
> Przemyslaw
Thanks for doing this, looks good.
My only real comment is that I wonder if we really need the nunits
parameter, or if we should instead change the scalar_mode to a general
machine_mode and just pass the vector mode to that.
Passing nunits means that we don't need a to_constant here:
> @@ -21970,7 +21973,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
> for (int i = 0; i < num_insns; i++)
> offvals[i] = INTVAL (offset[i]);
>
> - msize = GET_MODE_SIZE (mode);
> + msize = GET_MODE_SIZE (mode) * nunits;
>
> /* Check if the offsets can be put in the right order to do a ldp/stp. */
> qsort (offvals, num_insns, sizeof (HOST_WIDE_INT),
but I think adding to_constant is fine in this context, since it's
inherently non-SVE code.
That would also avoid having to recalculate the mode:
> @@ -22114,8 +22118,11 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
> replace_equiv_address_nv (mem_4, plus_constant (Pmode, operands[8],
> new_off_3 + msize), true);
>
> - if (!aarch64_mem_pair_operand (mem_1, mode)
> - || !aarch64_mem_pair_operand (mem_3, mode))
> + /* If nunits > 1 we are adjusting for vector mode. In this case we should
> + generate mode for vector built from nunits and scalar_mode provided. */
> + mem_mode = (nunits == 1) ? mode : mode_for_vector(mode, nunits).else_void();
> + if (!aarch64_mem_pair_operand (mem_1, mem_mode)
> + || !aarch64_mem_pair_operand (mem_3, mem_mode))
> return false;
>
> if (code == ZERO_EXTEND)
…here.
One other (very) minor thing is that some of the lines were over the
80 character limit, but removing the nunits parameter might fix that :-)
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..f46dea1f748a094509ecfa0292a7c54e94164c9a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef float __attribute__((vector_size(8))) vec;
> +
> +vec
> +load_long(vec *v) {
> + return v[110] + v[111] + v[112] + v[113];
> +}
> +
> +/* { dg-final { scan-assembler "add\tx\[0-9\]+, x\[0-9\]+, 880" } } */
> +/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+\\\]" } } */
> +/* { dg-final { scan-assembler "ldp\td\[0-9\]+, d\[0-9\]+, \\\[x\[0-9\]+, 16\\\]" } } */
FWIW, it's possible to avoid many of these backslashes by quoting the
regexp with {…} rather than "…". E.g.:
/* { dg-final { scan-assembler {ldp\td[0-9]+, d[0-9]+, \[x[0-9]+, 16\]} } } */
The above is fine too though.
Thanks,
Richard
* RE: [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
2020-07-13 16:12 ` Richard Sandiford
@ 2020-07-21 8:20 ` Przemyslaw Wirkus
2020-07-21 8:31 ` Andrea Corallo
From: Przemyslaw Wirkus @ 2020-07-21 8:20 UTC
To: Richard Sandiford
Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, Kyrylo Tkachov
Richard,
Please find the reworked patch attached.
> -----Original Message-----
> From: Richard Sandiford <richard.sandiford@arm.com>
> Sent: 13 July 2020 17:13
> To: Przemyslaw Wirkus <Przemyslaw.Wirkus@arm.com>
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> <Richard.Earnshaw@arm.com>; Marcus Shawcroft
> <Marcus.Shawcroft@arm.com>; Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>
> Subject: Re: [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector
> types
>
> Hi,
>
> Sorry for the slow review.
Thank you for all your comments. They were insightful. I've simplified
my patch to match them.
> Przemyslaw Wirkus <Przemyslaw.Wirkus@arm.com> writes:
> > Hi,
> >
> > Introduce simple peephole2 optimization which substitutes a sequence
> > of four consecutive load or store (LDR, STR) instructions with two
> > load or store pair (LDP, STP) instructions for 2 element supported
> > vector modes (V2SI, V2SF, V2DI, and V2DF).
> > Generated load / store pair instruction offset is adjusted accordingly.
[snip...]
Kind regards,
Przemyslaw Wirkus
[-- Attachment #2: rb13293.patch --]
[-- Type: application/octet-stream, Size: 9963 bytes --]
diff --git a/gcc/config/aarch64/aarch64-ldpstp.md b/gcc/config/aarch64/aarch64-ldpstp.md
index dd6f39615c51105a45b7b3dcde7b86e900ae7119..02d7a5bd171b2e2e1a92448c021ff4822cd7a1a7 100644
--- a/gcc/config/aarch64/aarch64-ldpstp.md
+++ b/gcc/config/aarch64/aarch64-ldpstp.md
@@ -294,3 +294,45 @@ (define_peephole2
else
FAIL;
})
+
+(define_peephole2
+ [(match_scratch:DI 8 "r")
+ (set (match_operand:VP_2E 0 "memory_operand" "")
+ (match_operand:VP_2E 1 "aarch64_reg_or_zero" ""))
+ (set (match_operand:VP_2E 2 "memory_operand" "")
+ (match_operand:VP_2E 3 "aarch64_reg_or_zero" ""))
+ (set (match_operand:VP_2E 4 "memory_operand" "")
+ (match_operand:VP_2E 5 "aarch64_reg_or_zero" ""))
+ (set (match_operand:VP_2E 6 "memory_operand" "")
+ (match_operand:VP_2E 7 "aarch64_reg_or_zero" ""))
+ (match_dup 8)]
+ "TARGET_SIMD
+ && aarch64_operands_adjust_ok_for_ldpstp (operands, false, <MODE>mode)"
+ [(const_int 0)]
+{
+ if (aarch64_gen_adjusted_ldpstp (operands, false, <MODE>mode, UNKNOWN))
+ DONE;
+ else
+ FAIL;
+})
+
+(define_peephole2
+ [(match_scratch:DI 8 "r")
+ (set (match_operand:VP_2E 0 "register_operand" "")
+ (match_operand:VP_2E 1 "memory_operand" ""))
+ (set (match_operand:VP_2E 2 "register_operand" "")
+ (match_operand:VP_2E 3 "memory_operand" ""))
+ (set (match_operand:VP_2E 4 "register_operand" "")
+ (match_operand:VP_2E 5 "memory_operand" ""))
+ (set (match_operand:VP_2E 6 "register_operand" "")
+ (match_operand:VP_2E 7 "memory_operand" ""))
+ (match_dup 8)]
+ "TARGET_SIMD
+ && aarch64_operands_adjust_ok_for_ldpstp (operands, true, <MODE>mode)"
+ [(const_int 0)]
+{
+ if (aarch64_gen_adjusted_ldpstp (operands, true, <MODE>mode, UNKNOWN))
+ DONE;
+ else
+ FAIL;
+})
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9e43adb7db0373df6cc5ef1d2b22f217aca2aad2..2a0e1683538406fa255bb5d8173893876d8f3e93 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -681,7 +681,7 @@ void aarch64_split_compare_and_swap (rtx op[]);
void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx);
-bool aarch64_gen_adjusted_ldpstp (rtx *, bool, scalar_mode, RTX_CODE);
+bool aarch64_gen_adjusted_ldpstp (rtx *, bool, machine_mode, RTX_CODE);
void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx);
bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
@@ -732,7 +732,7 @@ int aarch64_ccmp_mode_to_code (machine_mode mode);
bool extract_base_offset_in_addr (rtx mem, rtx *base, rtx *offset);
bool aarch64_operands_ok_for_ldpstp (rtx *, bool, machine_mode);
-bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, scalar_mode);
+bool aarch64_operands_adjust_ok_for_ldpstp (rtx *, bool, machine_mode);
void aarch64_swap_ldrstr_operands (rtx *, bool);
extern void aarch64_asm_output_pool_epilogue (FILE *, const char *,
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 973c65aa4fb348450872036617362aa17310fb20..99d050e2a233c56f2f1ddc5b5034a0948686814b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -21893,7 +21893,7 @@ aarch64_ldrstr_offset_compare (const void *x, const void *y)
bool
aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
- scalar_mode mode)
+ machine_mode mode)
{
const int num_insns = 4;
enum reg_class rclass;
@@ -21970,7 +21970,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
for (int i = 0; i < num_insns; i++)
offvals[i] = INTVAL (offset[i]);
- msize = GET_MODE_SIZE (mode);
+ msize = GET_MODE_SIZE (mode).to_constant ();
/* Check if the offsets can be put in the right order to do a ldp/stp. */
qsort (offvals, num_insns, sizeof (HOST_WIDE_INT),
@@ -22010,7 +22010,7 @@ aarch64_operands_adjust_ok_for_ldpstp (rtx *operands, bool load,
bool
aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
- scalar_mode mode, RTX_CODE code)
+ machine_mode mode, RTX_CODE code)
{
rtx base, offset_1, offset_3, t1, t2;
rtx mem_1, mem_2, mem_3, mem_4;
@@ -22049,7 +22049,7 @@ aarch64_gen_adjusted_ldpstp (rtx *operands, bool load,
&& offset_3 != NULL_RTX);
/* Adjust offset so it can fit in LDP/STP instruction. */
- msize = GET_MODE_SIZE (mode);
+ msize = GET_MODE_SIZE (mode).to_constant ();
stp_off_upper_limit = msize * (0x40 - 1);
stp_off_lower_limit = - msize * 0x40;
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index a568cf21b99d4b169d7e367c5f00d65c544ef790..8a1795eb211e78ab4f9c5d719b39d2e5b7c8f2de 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -98,6 +98,9 @@ (define_mode_iterator DREG [V8QI V4HI V4HF V2SI V2SF DF])
;; Copy of the above.
(define_mode_iterator DREG2 [V8QI V4HI V4HF V2SI V2SF DF])
+;; All modes suitable to store/load pair (2 elements) using STP/LDP.
+(define_mode_iterator VP_2E [V2SI V2SF V2DI V2DF])
+
;; Advanced SIMD, 64-bit container, all integer modes.
(define_mode_iterator VD_BHSI [V8QI V4HI V2SI])
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
new file mode 100644
index 0000000000000000000000000000000000000000..fbdae1c6cff1aef40db644361381ce511f0be64a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef float __attribute__((vector_size(8))) vec;
+
+vec
+load_long(vec *v) {
+ return v[110] + v[111] + v[112] + v[113];
+}
+
+/* { dg-final { scan-assembler {add\tx[0-9]+, x[0-9]+, 880} } } */
+/* { dg-final { scan-assembler {ldp\td[0-9]+, d[0-9]+, \[x[0-9]+\]} } } */
+/* { dg-final { scan-assembler {ldp\td[0-9]+, d[0-9]+, \[x[0-9]+, 16\]} } } */
+/* { dg-final { scan-assembler-not "ldr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2si.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2si.c
new file mode 100644
index 0000000000000000000000000000000000000000..7714cd6cd9e8fa7dc1febf484d6726d44c246408
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2si.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int __attribute__((vector_size(8))) vec;
+
+vec
+load_long(vec *v) {
+ return v[110] + v[111] + v[112] + v[113];
+}
+
+/* { dg-final { scan-assembler {add\tx[0-9]+, x[0-9]+, 880} } } */
+/* { dg-final { scan-assembler {ldp\td[0-9]+, d[0-9]+, \[x[0-9]+\]} } } */
+/* { dg-final { scan-assembler {ldp\td[0-9]+, d[0-9]+, \[x[0-9]+, 16\]} } } */
+/* { dg-final { scan-assembler-not "ldr\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_v2df.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2df.c
new file mode 100644
index 0000000000000000000000000000000000000000..3ee039d641531258be2453520c4e1bce6412d809
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2df.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef double __attribute__((vector_size(16))) vec;
+
+void
+store_adjusted(vec *out, vec x, vec y)
+{
+ out[100] = x;
+ out[101] = y;
+ out[102] = y;
+ out[103] = x;
+}
+
+/* { dg-final { scan-assembler {add\tx[0-9]+, x[0-9]+, 1600} } } */
+/* { dg-final { scan-assembler {stp\tq[0-9]+, q[0-9]+, \[x[0-9]+\]} } } */
+/* { dg-final { scan-assembler {stp\tq[0-9]+, q[0-9]+, \[x[0-9]+, 32\]} } } */
+/* { dg-final { scan-assembler-not "str\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_v2di.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2di.c
new file mode 100644
index 0000000000000000000000000000000000000000..85d7b5b955efaa617235519281a6c187d970b0da
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2di.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef long long __attribute__((vector_size(16))) vec;
+
+void
+store_adjusted(vec *out, vec x, vec y)
+{
+ out[100] = x;
+ out[101] = y;
+ out[102] = y;
+ out[103] = x;
+}
+
+/* { dg-final { scan-assembler {add\tx[0-9]+, x[0-9]+, 1600} } } */
+/* { dg-final { scan-assembler {stp\tq[0-9]+, q[0-9]+, \[x[0-9]+\]} } } */
+/* { dg-final { scan-assembler {stp\tq[0-9]+, q[0-9]+, \[x[0-9]+, 32\]} } } */
+/* { dg-final { scan-assembler-not "str\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_v2sf.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2sf.c
new file mode 100644
index 0000000000000000000000000000000000000000..e7c2e6e62ce741e45beada811af6c32bae27f698
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2sf.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef float __attribute__((vector_size(8))) vec;
+
+void
+store_adjusted(vec *out, vec x, vec y)
+{
+ out[400] = x;
+ out[401] = y;
+ out[402] = y;
+ out[403] = x;
+}
+
+/* { dg-final { scan-assembler {add\tx[0-9]+, x[0-9]+, 3200} } } */
+/* { dg-final { scan-assembler {stp\td[0-9]+, d[0-9]+, \[x[0-9]+\]} } } */
+/* { dg-final { scan-assembler {stp\td[0-9]+, d[0-9]+, \[x[0-9]+, 16\]} } } */
+/* { dg-final { scan-assembler-not "str\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/stp_vec_v2si.c b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2si.c
new file mode 100644
index 0000000000000000000000000000000000000000..17b44a2aed5df90230c4583a46a5bfa97a957b26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/stp_vec_v2si.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+typedef int __attribute__((vector_size(8))) vec;
+
+void
+store_adjusted(vec *out, vec x, vec y)
+{
+  out[400] = x;
+  out[401] = y;
+  out[402] = y;
+  out[403] = x;
+}
+
+/* { dg-final { scan-assembler {add\tx[0-9]+, x[0-9]+, 3200} } } */
+/* { dg-final { scan-assembler {stp\td[0-9]+, d[0-9]+, \[x[0-9]+\]} } } */
+/* { dg-final { scan-assembler {stp\td[0-9]+, d[0-9]+, \[x[0-9]+, 16\]} } } */
+/* { dg-final { scan-assembler-not "str\t" } } */
* Re: [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
2020-07-21 8:20 ` Przemyslaw Wirkus
@ 2020-07-21 8:31 ` Andrea Corallo
2020-07-21 9:00 ` Richard Sandiford
0 siblings, 1 reply; 8+ messages in thread
From: Andrea Corallo @ 2020-07-21 8:31 UTC (permalink / raw)
To: Przemyslaw Wirkus
Cc: Richard Sandiford, Richard Earnshaw, gcc-patches, Marcus Shawcroft
Przemyslaw Wirkus <Przemyslaw.Wirkus@arm.com> writes:
> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..fbdae1c6cff1aef40db644361381ce511f0be64a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +typedef float __attribute__((vector_size(8))) vec;
> +
> +vec
> +load_long(vec *v) {
Hi Przemyslaw,
I think here we should have a space before '(' and a new line before
'{'.
Same applies for the other testcases.
Sorry for the nitpicking.
Regards
Andrea
* Re: [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
2020-07-21 8:31 ` Andrea Corallo
@ 2020-07-21 9:00 ` Richard Sandiford
2020-07-21 9:04 ` Andrea Corallo
2020-07-22 8:49 ` Przemyslaw Wirkus
0 siblings, 2 replies; 8+ messages in thread
From: Richard Sandiford @ 2020-07-21 9:00 UTC (permalink / raw)
To: Andrea Corallo
Cc: Przemyslaw Wirkus, Richard Earnshaw, gcc-patches, Marcus Shawcroft
Andrea Corallo <andrea.corallo@arm.com> writes:
> Przemyslaw Wirkus <Przemyslaw.Wirkus@arm.com> writes:
>
>> diff --git a/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..fbdae1c6cff1aef40db644361381ce511f0be64a
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/aarch64/ldp_vec_v2sf.c
>> @@ -0,0 +1,14 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2" } */
>> +
>> +typedef float __attribute__((vector_size(8))) vec;
>> +
>> +vec
>> +load_long(vec *v) {
>
> Hi Przemyslaw,
>
> I think here we should have a space before '(' and a new line before
> '{'.
>
> Same applies for the other testcases.
Yeah, that's certainly true for code in the compiler itself. Tests
kind-of get a pass stylewise though. It would be bad if everything in
the testsuite used GNU style, since then we'd never test anything else. ;-)
So the patch is OK as-is or with the above change.
Przemek, if you don't have commit access already, please follow
the steps on https://gcc.gnu.org/gitwrite.html (happy to sponsor).
Thanks,
Richard
* Re: [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
2020-07-21 9:00 ` Richard Sandiford
@ 2020-07-21 9:04 ` Andrea Corallo
2020-07-22 8:49 ` Przemyslaw Wirkus
1 sibling, 0 replies; 8+ messages in thread
From: Andrea Corallo @ 2020-07-21 9:04 UTC (permalink / raw)
To: Przemyslaw Wirkus
Cc: Richard Earnshaw, gcc-patches, Marcus Shawcroft, richard.sandiford
Richard Sandiford <richard.sandiford@arm.com> writes:
> Yeah, that's certainly true for code in the compiler itself. Tests
> kind-of get a pass stylewise though. It would be bad if everything in
> the testsuite used GNU style, since then we'd never test anything else. ;-)
LOL, good to know, thanks.
Andrea
* RE: [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
2020-07-21 9:00 ` Richard Sandiford
2020-07-21 9:04 ` Andrea Corallo
@ 2020-07-22 8:49 ` Przemyslaw Wirkus
2020-08-03 8:42 ` Przemyslaw Wirkus
1 sibling, 1 reply; 8+ messages in thread
From: Przemyslaw Wirkus @ 2020-07-22 8:49 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches
[snip...]
> Przemek, if you don't have commit access already, please follow the steps on
> https://gcc.gnu.org/gitwrite.html (happy to sponsor).
Done.
Thank you, Richard, for sponsoring this and all the support!
Kind regards,
Przemek
* RE: [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types
2020-07-22 8:49 ` Przemyslaw Wirkus
@ 2020-08-03 8:42 ` Przemyslaw Wirkus
0 siblings, 0 replies; 8+ messages in thread
From: Przemyslaw Wirkus @ 2020-08-03 8:42 UTC (permalink / raw)
To: gcc-patches
Committed as cd91a084877dabcc53aec57ab70ca4fc32f3d985.
Thread overview: 8+ messages
2020-07-07 13:17 [PATCH][GCC][aarch64] Generation of adjusted ldp/stp for vector types Przemyslaw Wirkus
2020-07-13 16:12 ` Richard Sandiford
2020-07-21 8:20 ` Przemyslaw Wirkus
2020-07-21 8:31 ` Andrea Corallo
2020-07-21 9:00 ` Richard Sandiford
2020-07-21 9:04 ` Andrea Corallo
2020-07-22 8:49 ` Przemyslaw Wirkus
2020-08-03 8:42 ` Przemyslaw Wirkus