public inbox for gcc-patches@gcc.gnu.org
[PATCH] ira: Don't create copies for earlyclobbered pairs
From: Richard Sandiford @ 2023-05-05 16:59 UTC (permalink / raw)
  To: gcc-patches; +Cc: vmakarov, jlaw

This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
("the previous patch").  To start by quoting that:

If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn.  Allocating the
same register to the input and the output then saves the cost of a move.
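
As a concrete illustration (a hypothetical sketch using GCC extended
asm on an aarch64 target, not something taken from either patch), a
matching constraint is what creates such a tie:

  /* The "0" constraint ties IN to the output operand, so if IN dies
     here, giving OUT the same hard register as IN makes the implied
     copy free.  */
  long
  add_tied (long in, long n)
  {
    long out;
    asm ("add %0, %0, %2" : "=r" (out) : "0" (in), "r" (n));
    return out;
  }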

If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency.  This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.
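
For example (an illustrative C sketch, not from the message), in a
chain like the one below each intermediate value dies in the next
statement, and the reduced-frequency copies nudge IRA towards keeping
the whole chain in a single register:

  long
  chain (long a, long b, long c, long d)
  {
    long t1 = a + b;	/* A dies here; T1 can reuse its register.  */
    long t2 = t1 + c;	/* T1 dies here; T2 can reuse its register.  */
    return t2 + d;	/* T2 dies here.  */
  }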

This heuristic seems to work well in the vast majority of cases.
However, the problem fixed in the previous patch was that we
could create a copy for an operand pair even if, for all relevant
alternatives, the output and input register classes did not have
any registers in common.  It is then impossible for the output
operand to reuse the dying input register.
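
For instance (again a hypothetical aarch64 extended-asm sketch), with
a "=w" (FP/SIMD) output and an "r" (general-register) input, the two
classes share no hard registers, so such a copy could never pay off:

  double
  to_fp (long in)
  {
    double out;
    /* OUT must be an FP/SIMD register and IN a GPR, so the dying IN
       can never be reused for OUT.  */
    asm ("fmov %d0, %1" : "=w" (out) : "r" (in));
    return out;
  }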

This left unfixed a further case where copies don't make sense:
there is no point trying to reuse the dying input register if,
for all relevant alternatives, the output is earlyclobbered and
the input doesn't match the output.  (Matched earlyclobbers are fine.)
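
A hypothetical sketch of the case handled here ("&" marks an
earlyclobber in GCC extended asm; aarch64 assumed as before):

  long
  ec_add (long in, long n)
  {
    long out;
    /* OUT is written before IN and N are consumed, so the "&" forbids
       OUT from sharing a register with either input, even though IN
       dies here.  A matched earlyclobber ("=&r" with a "0" input)
       would still allow the reuse.  */
    asm ("mov %0, %2\n\tadd %0, %0, %1"
	 : "=&r" (out) : "r" (in), "r" (n));
    return out;
  }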

Handling that case fixes several existing XFAILs and helps with
a follow-on aarch64 patch.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  A SPEC2017 run
on aarch64 showed no differences outside the noise.  Also, I tried
compiling gcc.c-torture, gcc.dg, and g++.dg for at least one target
per cpu directory, using the options -Os -fno-schedule-insns{,2}.
The results below summarise the tests that showed a difference in
lines of code (LOC).  Good and Bad count the tests that shrank and
grew respectively, Delta is the net LOC change across all tests, and
Best, Worst and Median are per-test deltas (negative is better):

Target               Tests   Good    Bad   Delta    Best   Worst  Median
======               =====   ====    ===   =====    ====   =====  ======
amdgcn-amdhsa           14      7      7       3     -18      10      -1
arm-linux-gnueabihf     16     15      1     -22      -4       2      -1
csky-elf                 6      6      0     -21      -6      -2      -4
hppa64-hp-hpux11.23      5      5      0      -7      -2      -1      -1
ia64-linux-gnu          16     16      0     -70     -15      -1      -3
m32r-elf                53      1     52      64      -2       8       1
mcore-elf                2      2      0      -8      -6      -2      -6
microblaze-elf         285    283      2    -909     -68       4      -1
mmix                     7      7      0   -2101   -2091      -1      -1
msp430-elf               1      1      0      -4      -4      -4      -4
pru-elf                  8      6      2     -12      -6       2      -2
rx-elf                  22     18      4     -40      -5       6      -2
sparc-linux-gnu         15     14      1     -40      -8       1      -2
sparc-wrs-vxworks       15     14      1     -40      -8       1      -2
visium-elf               2      1      1       0      -2       2      -2
xstormy16-elf            1      1      0      -2      -2      -2      -2

with other targets showing no sensitivity to the patch.  The only
target that seems to be negatively affected is m32r-elf; otherwise
the patch seems like an extremely minor but still clear improvement.

OK to install?

Richard


gcc/
	* ira-conflicts.cc (can_use_same_reg_p): Skip over non-matching
	earlyclobbers.

gcc/testsuite/
	* gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c: Remove XFAILs.
	* gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_s64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/bic_u64.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/scale_f32.c: Likewise.
	* gcc.target/aarch64/sve/acle/asm/scale_f64.c: Likewise.
---
 gcc/ira-conflicts.cc                                         | 3 +++
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c      | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c      | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c      | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c      | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c  | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c    | 2 +-
 gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c    | 2 +-
 19 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/gcc/ira-conflicts.cc b/gcc/ira-conflicts.cc
index 5aa080af421..a4d93c8d734 100644
--- a/gcc/ira-conflicts.cc
+++ b/gcc/ira-conflicts.cc
@@ -398,6 +398,9 @@ can_use_same_reg_p (rtx_insn *insn, int output, int input)
       if (op_alt[input].matches == output)
 	return true;
 
+      if (op_alt[output].earlyclobber)
+	continue;
+
       if (ira_reg_class_intersect[op_alt[input].cl][op_alt[output].cl]
 	  != NO_REGS)
 	return true;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
index b74ae33e100..e40865fcbc4 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s16.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s16_z_tied1, svint16_t, uint64_t,
 		 z0 = svasr_wide_z (p0, z0, x0))
 
 /*
-** asr_wide_x0_s16_z_untied: { xfail *-*-* }
+** asr_wide_x0_s16_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.h, p0/z, z1\.h
 **	asr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
index 8698aef26c6..06e4ca2a030 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s32.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s32_z_tied1, svint32_t, uint64_t,
 		 z0 = svasr_wide_z (p0, z0, x0))
 
 /*
-** asr_wide_x0_s32_z_untied: { xfail *-*-* }
+** asr_wide_x0_s32_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.s, p0/z, z1\.s
 **	asr	z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
index 77b1669392d..1f840ca8e57 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/asr_wide_s8.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (asr_wide_x0_s8_z_tied1, svint8_t, uint64_t,
 		 z0 = svasr_wide_z (p0, z0, x0))
 
 /*
-** asr_wide_x0_s8_z_untied: { xfail *-*-* }
+** asr_wide_x0_s8_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.b, p0/z, z1\.b
 **	asr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
index 9e388e499b8..e02c66947d6 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_w0_s32_z_tied1, svint32_t, int32_t,
 		 z0 = svbic_z (p0, z0, x0))
 
 /*
-** bic_w0_s32_z_untied: { xfail *-*-* }
+** bic_w0_s32_z_untied:
 **	mov	(z[0-9]+\.s), w0
 **	movprfx	z0\.s, p0/z, z1\.s
 **	bic	z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
index bf953681547..57c1e535fea 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_s64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_x0_s64_z_tied1, svint64_t, int64_t,
 		 z0 = svbic_z (p0, z0, x0))
 
 /*
-** bic_x0_s64_z_untied: { xfail *-*-* }
+** bic_x0_s64_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.d, p0/z, z1\.d
 **	bic	z0\.d, p0/m, z0\.d, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
index b308b599b43..9f08ab40a8c 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_w0_u32_z_tied1, svuint32_t, uint32_t,
 		 z0 = svbic_z (p0, z0, x0))
 
 /*
-** bic_w0_u32_z_untied: { xfail *-*-* }
+** bic_w0_u32_z_untied:
 **	mov	(z[0-9]+\.s), w0
 **	movprfx	z0\.s, p0/z, z1\.s
 **	bic	z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
index e82db1e94fd..de84f3af6ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/bic_u64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (bic_x0_u64_z_tied1, svuint64_t, uint64_t,
 		 z0 = svbic_z (p0, z0, x0))
 
 /*
-** bic_x0_u64_z_untied: { xfail *-*-* }
+** bic_x0_u64_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.d, p0/z, z1\.d
 **	bic	z0\.d, p0/m, z0\.d, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
index 8d63d390984..a0207726144 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s16.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s16_z_tied1, svint16_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_s16_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s16_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.h, p0/z, z1\.h
 **	lsl	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
index acd813df34f..bd67b7006b5 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s32.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s32_z_tied1, svint32_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_s32_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s32_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.s, p0/z, z1\.s
 **	lsl	z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
index 17e8e8685e3..7eb8627041d 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_s8.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_s8_z_tied1, svint8_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_s8_z_untied: { xfail *-*-* }
+** lsl_wide_x0_s8_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.b, p0/z, z1\.b
 **	lsl	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
index cff24a85090..482f8d0557b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u16.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u16_z_tied1, svuint16_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_u16_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u16_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.h, p0/z, z1\.h
 **	lsl	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
index 7b1afab4918..612897d24df 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u32.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u32_z_tied1, svuint32_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_u32_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u32_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.s, p0/z, z1\.s
 **	lsl	z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
index df8b1ec86b4..6ca2f9e7da2 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsl_wide_u8.c
@@ -155,7 +155,7 @@ TEST_UNIFORM_ZX (lsl_wide_x0_u8_z_tied1, svuint8_t, uint64_t,
 		 z0 = svlsl_wide_z (p0, z0, x0))
 
 /*
-** lsl_wide_x0_u8_z_untied: { xfail *-*-* }
+** lsl_wide_x0_u8_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.b, p0/z, z1\.b
 **	lsl	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
index 863b51a2fc5..9110c5aad44 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u16.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u16_z_tied1, svuint16_t, uint64_t,
 		 z0 = svlsr_wide_z (p0, z0, x0))
 
 /*
-** lsr_wide_x0_u16_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u16_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.h, p0/z, z1\.h
 **	lsr	z0\.h, p0/m, z0\.h, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
index 73c2cf86e33..93af4fa4925 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u32.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u32_z_tied1, svuint32_t, uint64_t,
 		 z0 = svlsr_wide_z (p0, z0, x0))
 
 /*
-** lsr_wide_x0_u32_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u32_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.s, p0/z, z1\.s
 **	lsr	z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
index fe44eabda11..2f38139d40b 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/lsr_wide_u8.c
@@ -153,7 +153,7 @@ TEST_UNIFORM_ZX (lsr_wide_x0_u8_z_tied1, svuint8_t, uint64_t,
 		 z0 = svlsr_wide_z (p0, z0, x0))
 
 /*
-** lsr_wide_x0_u8_z_untied: { xfail *-*-* }
+** lsr_wide_x0_u8_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.b, p0/z, z1\.b
 **	lsr	z0\.b, p0/m, z0\.b, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
index 747f8a6397b..12a1b1d8686 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f32.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (scale_w0_f32_z_tied1, svfloat32_t, int32_t,
 		 z0 = svscale_z (p0, z0, x0))
 
 /*
-** scale_w0_f32_z_untied: { xfail *-*-* }
+** scale_w0_f32_z_untied:
 **	mov	(z[0-9]+\.s), w0
 **	movprfx	z0\.s, p0/z, z1\.s
 **	fscale	z0\.s, p0/m, z0\.s, \1
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
index 004cbfa3eff..f6b11718584 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/asm/scale_f64.c
@@ -127,7 +127,7 @@ TEST_UNIFORM_ZX (scale_x0_f64_z_tied1, svfloat64_t, int64_t,
 		 z0 = svscale_z (p0, z0, x0))
 
 /*
-** scale_x0_f64_z_untied: { xfail *-*-* }
+** scale_x0_f64_z_untied:
 **	mov	(z[0-9]+\.d), x0
 **	movprfx	z0\.d, p0/z, z1\.d
 **	fscale	z0\.d, p0/m, z0\.d, \1
-- 
2.25.1



Re: [PATCH] ira: Don't create copies for earlyclobbered pairs
From: Vladimir Makarov @ 2023-05-08 12:52 UTC (permalink / raw)
  To: gcc-patches, jlaw, richard.sandiford


On 5/5/23 12:59, Richard Sandiford wrote:
> This patch follows on from g:9f635bd13fe9e85872e441b6f3618947f989909a
> ("the previous patch").  To recap, quoting from that patch:
>
> [...]
>
> OK to install?
>
Yes, Richard.

Thank you for measuring the patch effect.  I wish other people would do 
the same for patches affecting generated code performance.

