* LRA patch for PR87718
@ 2018-11-22 17:29 Vladimir Makarov
2018-11-24 11:29 ` Christophe Lyon
From: Vladimir Makarov @ 2018-11-22 17:29 UTC
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 1006 bytes --]
The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

The patch adds special treatment for moves involving a hard register to
the register cost and class calculation.

The patch was bootstrapped and tested on x86-64 and ppc64.

I found two testsuite regressions caused by the patch. The expected
generated code in the PR82361 test is too specific: GCC with the patch
generates code of the same quality but uses a different hard register
on x86-64. So I just changed the test for PR82361.

The other test is for ppc64. I think the expected generated code for
this test is wrong. I'll submit a changed test for discussion later.

Although I spent a lot of time on the solution and I think it is the
right one, the patch touches a very sensitive area of the RA and may
affect the expected code generation for many targets. I am ready to
work on any new regressions as soon as they are found.

The patch was committed as rev. 260385.
[-- Attachment #2: pr87718.patch --]
[-- Type: text/x-patch, Size: 11540 bytes --]
Index: ChangeLog
===================================================================
--- ChangeLog (revision 266384)
+++ ChangeLog (working copy)
@@ -1,3 +1,10 @@
+2018-11-22 Vladimir Makarov <vmakarov@redhat.com>
+
+ PR rtl-optimization/87718
+ * ira-costs.c: Remove trailing white-spaces.
+ (record_operand_costs): Add a special treatment for moves
+ involving a hard register.
+
2018-11-22 Uros Bizjak <ubizjak@gmail.com>
* config/i386/i386.c (ix86_avx_emit_vzeroupper): Remove.
Index: ira-costs.c
===================================================================
--- ira-costs.c (revision 266155)
+++ ira-costs.c (working copy)
@@ -1257,7 +1257,7 @@ record_address_regs (machine_mode mode,
add_cost = (move_in_cost[i][rclass] * scale) / 2;
if (INT_MAX - add_cost < pp_costs[k])
pp_costs[k] = INT_MAX;
- else
+ else
pp_costs[k] += add_cost;
}
}
@@ -1283,10 +1283,100 @@ record_operand_costs (rtx_insn *insn, en
{
const char *constraints[MAX_RECOG_OPERANDS];
machine_mode modes[MAX_RECOG_OPERANDS];
- rtx ops[MAX_RECOG_OPERANDS];
rtx set;
int i;
+ if ((set = single_set (insn)) != NULL_RTX
+ /* In rare cases the single set insn might have less 2 operands
+ as the source can be a fixed special reg. */
+ && recog_data.n_operands > 1
+ && recog_data.operand[0] == SET_DEST (set)
+ && recog_data.operand[1] == SET_SRC (set))
+ {
+ int regno, other_regno;
+ rtx dest = SET_DEST (set);
+ rtx src = SET_SRC (set);
+
+ if (GET_CODE (dest) == SUBREG
+ && known_eq (GET_MODE_SIZE (GET_MODE (dest)),
+ GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest)))))
+ dest = SUBREG_REG (dest);
+ if (GET_CODE (src) == SUBREG
+ && known_eq (GET_MODE_SIZE (GET_MODE (src)),
+ GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
+ src = SUBREG_REG (src);
+ if (REG_P (src) && REG_P (dest)
+ && (((regno = REGNO (src)) >= FIRST_PSEUDO_REGISTER
+ && (other_regno = REGNO (dest)) < FIRST_PSEUDO_REGISTER)
+ || ((regno = REGNO (dest)) >= FIRST_PSEUDO_REGISTER
+ && (other_regno = REGNO (src)) < FIRST_PSEUDO_REGISTER)))
+ {
+ machine_mode mode = GET_MODE (SET_SRC (set));
+ cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
+ enum reg_class *cost_classes = cost_classes_ptr->classes;
+ reg_class_t rclass, hard_reg_class, pref_class;
+ int cost, k;
+ bool dead_p = find_regno_note (insn, REG_DEAD, REGNO (src));
+
+ hard_reg_class = REGNO_REG_CLASS (other_regno);
+ i = regno == (int) REGNO (src) ? 1 : 0;
+ for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+ {
+ rclass = cost_classes[k];
+ cost = ((i == 0
+ ? ira_register_move_cost[mode][hard_reg_class][rclass]
+ : ira_register_move_cost[mode][rclass][hard_reg_class])
+ * frequency);
+ op_costs[i]->cost[k] = cost;
+ /* If we have assigned a class to this allocno in our
+ first pass, add a cost to this alternative
+ corresponding to what we would add if this allocno
+ were not in the appropriate class. */
+ if (pref)
+ {
+ if ((pref_class = pref[COST_INDEX (regno)]) == NO_REGS)
+ op_costs[i]->cost[k]
+ += ((i == 0 ? ira_memory_move_cost[mode][rclass][0] : 0)
+ + (i == 1 ? ira_memory_move_cost[mode][rclass][1] : 0)
+ * frequency);
+ else if (ira_reg_class_intersect[pref_class][rclass]
+ == NO_REGS)
+ op_costs[i]->cost[k]
+ += (ira_register_move_cost[mode][pref_class][rclass]
+ * frequency);
+ }
+ /* If this insn is a single set copying operand 1 to
+ operand 0 and one operand is an allocno with the
+ other a hard reg or an allocno that prefers a hard
+ register that is in its own register class then we
+ may want to adjust the cost of that register class to
+ -1.
+
+ Avoid the adjustment if the source does not die to
+ avoid stressing of register allocator by preferencing
+ two colliding registers into single class. */
+ if (dead_p
+ && TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno)
+ && (reg_class_size[(int) rclass]
+ == (ira_reg_class_max_nregs
+ [(int) rclass][(int) GET_MODE(src)])))
+ {
+ if (reg_class_size[rclass] == 1)
+ op_costs[i]->cost[k] = -frequency;
+ else if (in_hard_reg_set_p (reg_class_contents[rclass],
+ GET_MODE(src), other_regno))
+ op_costs[i]->cost[k] = -frequency;
+ }
+ }
+ op_costs[i]->mem_cost
+ = ira_memory_move_cost[mode][hard_reg_class][i] * frequency;
+ if (pref && (pref_class = pref[COST_INDEX (regno)]) != NO_REGS)
+ op_costs[i]->mem_cost
+ += ira_memory_move_cost[mode][pref_class][i] * frequency;
+ return;
+ }
+ }
+
for (i = 0; i < recog_data.n_operands; i++)
{
constraints[i] = recog_data.constraints[i];
@@ -1302,7 +1392,6 @@ record_operand_costs (rtx_insn *insn, en
{
memcpy (op_costs[i], init_cost, struct_costs_size);
- ops[i] = recog_data.operand[i];
if (GET_CODE (recog_data.operand[i]) == SUBREG)
recog_data.operand[i] = SUBREG_REG (recog_data.operand[i]);
@@ -1318,7 +1407,7 @@ record_operand_costs (rtx_insn *insn, en
recog_data.operand[i], 0, ADDRESS, SCRATCH,
frequency * 2);
}
-
+
/* Check for commutative in a separate loop so everything will have
been initialized. We must do this even if one operand is a
constant--see addsi3 in m68k.md. */
@@ -1328,8 +1417,8 @@ record_operand_costs (rtx_insn *insn, en
const char *xconstraints[MAX_RECOG_OPERANDS];
int j;
- /* Handle commutative operands by swapping the constraints.
- We assume the modes are the same. */
+ /* Handle commutative operands by swapping the
+ constraints. We assume the modes are the same. */
for (j = 0; j < recog_data.n_operands; j++)
xconstraints[j] = constraints[j];
@@ -1342,69 +1431,6 @@ record_operand_costs (rtx_insn *insn, en
record_reg_classes (recog_data.n_alternatives, recog_data.n_operands,
recog_data.operand, modes,
constraints, insn, pref);
-
- /* If this insn is a single set copying operand 1 to operand 0 and
- one operand is an allocno with the other a hard reg or an allocno
- that prefers a hard register that is in its own register class
- then we may want to adjust the cost of that register class to -1.
-
- Avoid the adjustment if the source does not die to avoid
- stressing of register allocator by preferencing two colliding
- registers into single class.
-
- Also avoid the adjustment if a copy between hard registers of the
- class is expensive (ten times the cost of a default copy is
- considered arbitrarily expensive). This avoids losing when the
- preferred class is very expensive as the source of a copy
- instruction. */
- if ((set = single_set (insn)) != NULL_RTX
- /* In rare cases the single set insn might have less 2 operands
- as the source can be a fixed special reg. */
- && recog_data.n_operands > 1
- && ops[0] == SET_DEST (set) && ops[1] == SET_SRC (set))
- {
- int regno, other_regno;
- rtx dest = SET_DEST (set);
- rtx src = SET_SRC (set);
-
- if (GET_CODE (dest) == SUBREG
- && known_eq (GET_MODE_SIZE (GET_MODE (dest)),
- GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest)))))
- dest = SUBREG_REG (dest);
- if (GET_CODE (src) == SUBREG
- && known_eq (GET_MODE_SIZE (GET_MODE (src)),
- GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
- src = SUBREG_REG (src);
- if (REG_P (src) && REG_P (dest)
- && find_regno_note (insn, REG_DEAD, REGNO (src))
- && (((regno = REGNO (src)) >= FIRST_PSEUDO_REGISTER
- && (other_regno = REGNO (dest)) < FIRST_PSEUDO_REGISTER)
- || ((regno = REGNO (dest)) >= FIRST_PSEUDO_REGISTER
- && (other_regno = REGNO (src)) < FIRST_PSEUDO_REGISTER)))
- {
- machine_mode mode = GET_MODE (src);
- cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
- enum reg_class *cost_classes = cost_classes_ptr->classes;
- reg_class_t rclass;
- int k;
-
- i = regno == (int) REGNO (src) ? 1 : 0;
- for (k = cost_classes_ptr->num - 1; k >= 0; k--)
- {
- rclass = cost_classes[k];
- if (TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno)
- && (reg_class_size[(int) rclass]
- == ira_reg_class_max_nregs [(int) rclass][(int) mode]))
- {
- if (reg_class_size[rclass] == 1)
- op_costs[i]->cost[k] = -frequency;
- else if (in_hard_reg_set_p (reg_class_contents[rclass],
- mode, other_regno))
- op_costs[i]->cost[k] = -frequency;
- }
- }
- }
- }
}
\f
@@ -1457,7 +1483,7 @@ scan_one_insn (rtx_insn *insn)
/* If this insn loads a parameter from its stack slot, then it
represents a savings, rather than a cost, if the parameter is
- stored in memory. Record this fact.
+ stored in memory. Record this fact.
Similarly if we're loading other constants from memory (constant
pool, TOC references, small data areas, etc) and this is the only
@@ -1468,7 +1494,7 @@ scan_one_insn (rtx_insn *insn)
mem_cost might result in it being loaded using the specialized
instruction into a register, then stored into stack and loaded
again from the stack. See PR52208.
-
+
Don't do this if SET_SRC (set) has side effect. See PR56124. */
if (set != 0 && REG_P (SET_DEST (set)) && MEM_P (SET_SRC (set))
&& (note = find_reg_note (insn, REG_EQUIV, NULL_RTX)) != NULL_RTX
@@ -1766,7 +1792,7 @@ find_costs_and_classes (FILE *dump_file)
a = ALLOCNO_NEXT_REGNO_ALLOCNO (a))
{
int *a_costs, *p_costs;
-
+
a_num = ALLOCNO_NUM (a);
if ((flag_ira_region == IRA_REGION_ALL
|| flag_ira_region == IRA_REGION_MIXED)
@@ -1936,7 +1962,7 @@ find_costs_and_classes (FILE *dump_file)
int a_num = ALLOCNO_NUM (a);
int *total_a_costs = COSTS (total_allocno_costs, a_num)->cost;
int *a_costs = COSTS (costs, a_num)->cost;
-
+
if (aclass == NO_REGS)
best = NO_REGS;
else
@@ -1998,7 +2024,7 @@ find_costs_and_classes (FILE *dump_file)
}
}
}
-
+
if (internal_flag_ira_verbose > 4 && dump_file)
{
if (allocno_p)
@@ -2081,7 +2107,7 @@ process_bb_node_for_hard_reg_moves (ira_
int cost;
enum reg_class hard_reg_class;
machine_mode mode;
-
+
mode = ALLOCNO_MODE (a);
hard_reg_class = REGNO_REG_CLASS (hard_regno);
ira_init_register_move_cost_if_necessary (mode);
Index: testsuite/ChangeLog
===================================================================
--- testsuite/ChangeLog (revision 266384)
+++ testsuite/ChangeLog (working copy)
@@ -1,3 +1,9 @@
+2018-11-22 Vladimir Makarov <vmakarov@redhat.com>
+
+ PR rtl-optimization/87718
+ * gcc.target/i386/pr82361-1.c: Check only the first operand of
+ moves.
+
2018-11-22 Thomas Preud'homme <thomas.preudhomme@linaro.org>
* gcc.target/arm/pr85434.c: New test.
Index: testsuite/gcc.target/i386/pr82361-1.c
===================================================================
--- testsuite/gcc.target/i386/pr82361-1.c (revision 266155)
+++ testsuite/gcc.target/i386/pr82361-1.c (working copy)
@@ -6,7 +6,7 @@
/* { dg-final { scan-assembler-not "movl\t%eax, %eax" } } */
/* FIXME: We are still not able to optimize the modulo in f1/f2, only manage
one. */
-/* { dg-final { scan-assembler-times "movl\t%edx, %edx" 2 } } */
+/* { dg-final { scan-assembler-times "movl\t%edx" 2 } } */
void
f1 (unsigned int a, unsigned int b)
* Re: LRA patch for PR87718
From: Christophe Lyon @ 2018-11-24 11:29 UTC
To: Vladimir Makarov; +Cc: gcc Patches
On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>
> The patch adds a special treatment for moves with a hard register in
> register cost and class calculation.
>
> The patch was bootstrapped and tested on x86-64 and ppc64.
>
> I found two testsuite regressions because of the patch. The expected
> generated code for PR82361 test is too specific. GCC with the patch
> generates the same quality code but with a different hard register on
> x86-64. So I just changed the test for PR82361.
>
> Another test is for ppc64. I think the expected generated code for
> this test is wrong. I'll submit a changed test for a discussion later.
>
> Although I spent much time on the solution and I think it is the
> right one, the patch is in very sensitive area of RA and may affect
> expected code generation for many targets. I am ready to work on the
> new regressions as soon as they are found.
>
> The patch was committed as rev. 260385.
>
Hi,
This patch introduced at least several ICEs on arm targets:
on arm-none-linux-gnueabi --with-cpu=cortex-a9:
Executed from: gcc.target/arm/arm.exp
gcc.target/arm/attr-neon-fp16.c (internal compiler error)
gcc.target/arm/pr51968.c (internal compiler error)
gcc.target/arm/pr68620.c (internal compiler error)
Executed from: gcc.target/arm/simd/simd.exp
gcc.target/arm/simd/vextp16_1.c (internal compiler error)
gcc.target/arm/simd/vextp8_1.c (internal compiler error)
gcc.target/arm/simd/vexts16_1.c (internal compiler error)
gcc.target/arm/simd/vexts8_1.c (internal compiler error)
gcc.target/arm/simd/vextu16_1.c (internal compiler error)
gcc.target/arm/simd/vextu8_1.c (internal compiler error)
gcc.target/arm/simd/vrev16p8_1.c (internal compiler error)
gcc.target/arm/simd/vrev16s8_1.c (internal compiler error)
gcc.target/arm/simd/vrev16u8_1.c (internal compiler error)
gcc.target/arm/simd/vrev32p16_1.c (internal compiler error)
gcc.target/arm/simd/vrev32p8_1.c (internal compiler error)
gcc.target/arm/simd/vrev32s16_1.c (internal compiler error)
gcc.target/arm/simd/vrev32s8_1.c (internal compiler error)
gcc.target/arm/simd/vrev32u16_1.c (internal compiler error)
gcc.target/arm/simd/vrev32u8_1.c (internal compiler error)
gcc.target/arm/simd/vrev64f32_1.c (internal compiler error)
gcc.target/arm/simd/vrev64p16_1.c (internal compiler error)
gcc.target/arm/simd/vrev64p8_1.c (internal compiler error)
gcc.target/arm/simd/vrev64s16_1.c (internal compiler error)
gcc.target/arm/simd/vrev64s32_1.c (internal compiler error)
gcc.target/arm/simd/vrev64s8_1.c (internal compiler error)
gcc.target/arm/simd/vrev64u16_1.c (internal compiler error)
gcc.target/arm/simd/vrev64u32_1.c (internal compiler error)
gcc.target/arm/simd/vrev64u8_1.c (internal compiler error)
arm-none-linux-gnueabihf shows only 1 ICE:
gcc.target/arm/pr51968.c (internal compiler error)
There are other regressions on the same targets, but not ICEs.
I can report them later.
Thanks,
Christophe
* Re: LRA patch for PR87718
From: Sam Tebbs @ 2018-12-20 10:38 UTC
To: Christophe Lyon, Vladimir Makarov; +Cc: gcc Patches, nd
[-- Attachment #1: Type: text/plain, Size: 2787 bytes --]
On 11/24/18 11:29 AM, Christophe Lyon wrote:
> On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com>
> wrote:
>> The following patch fixes
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>>
>> The patch adds a special treatment for moves with a hard register in
>> register cost and class calculation.
>>
>> The patch was bootstrapped and tested on x86-64 and ppc64.
>>
>> I found two testsuite regressions because of the patch. The expected
>> generated code for PR82361 test is too specific. GCC with the patch
>> generates the same quality code but with a different hard register on
>> x86-64. So I just changed the test for PR82361.
>>
>> Another test is for ppc64. I think the expected generated code for
>> this test is wrong. I'll submit a changed test for a discussion later.
>>
>> Although I spent much time on the solution and I think it is the
>> right one, the patch is in very sensitive area of RA and may affect
>> expected code generation for many targets. I am ready to work on the
>> new regressions as soon as they are found.
>>
>> The patch was committed as rev. 260385.
>>
> Hi,
>
> This patch introduced at least several ICEs on arm targets:
> on arm-none-linux-gnueabi --with-cpu=cortex-a9:
<snip>
> There are other regressions on the same targets, but not ICEs.
> I can report them later.
>
> Thanks,
>
> Christophe
Hi Christophe and Vladimir,
Here are the regressions seen on arm-none-linux-gnueabihf and arm-none-eabi.
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times ldrh\\tr[0-9]+ 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vmov\\.f16\\ts[0-9]+, r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\tr[0-9]+, s[0-9]+
FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\ts0, r[0-9]+
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\tr[0-9]+, s[0-2] 2
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\ts0, r[0-9]+ 2
I didn't see a bug report for these, so I will open one.
It is not clear whether the test cases should be adjusted because of your
patch or whether they are failing because of incorrect codegen. Attached
is the code generated for armv8_2-fp16-move-1.c (one of the failing test
files) with and without your patch.
Full command line used to compile and test armv8_2-fp16-move-1.c:
bin/gcc armv8_2-fp16-move-1.c -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -mfloat-abi=hard -ffat-lto-objects -fno-ident -S -o armv8_2-fp16-move-1.s
[-- Attachment #2: armv8_2-fp16-move-1-with.s --]
[-- Type: text/plain, Size: 8451 bytes --]
.arch armv8.2-a
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 38, 1
.eabi_attribute 18, 4
.file "armv8_2-fp16-move-1.c"
.text
.align 1
.p2align 2,,3
.global test_load_1
.arch armv8.2-a
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_load_1, %function
test_load_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vld1.16 {d0[0]}, [r0]
bx lr
.size test_load_1, .-test_load_1
.align 1
.p2align 2,,3
.global test_load_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_load_2, %function
test_load_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
add r3, r0, r1, lsl #1
vld1.16 {d0[0]}, [r3]
bx lr
.size test_load_2, .-test_load_2
.align 1
.p2align 2,,3
.global test_store_1
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_store_1, %function
test_store_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vst1.16 {d0[0]}, [r0]
bx lr
.size test_store_1, .-test_store_1
.align 1
.p2align 2,,3
.global test_store_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_store_2, %function
test_store_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vmov.f16 r3, s0 @ __fp16
strh r3, [r0, r1, lsl #1] @ __fp16
bx lr
.size test_store_2, .-test_store_2
.align 1
.p2align 2,,3
.global test_load_store_1
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_load_store_1, %function
test_load_store_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldrh r3, [r2, r1, lsl #1] @ __fp16
strh r3, [r0, r1, lsl #1] @ __fp16
bx lr
.size test_load_store_1, .-test_load_store_1
.align 1
.p2align 2,,3
.global test_load_store_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_load_store_2, %function
test_load_store_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r1, r1, #2
add r3, r2, r1, lsl #1
add r0, r0, r1, lsl #1
vld1.16 {d0[0]}, [r3]
vmov.f16 r3, s0 @ __fp16
strh r3, [r0, #-4] @ __fp16
bx lr
.size test_load_store_2, .-test_load_store_2
.align 1
.p2align 2,,3
.global test_select_1
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_1, %function
test_select_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r0, #0
vseleq.f16 s0, s1, s0
bx lr
.size test_select_1, .-test_select_1
.align 1
.p2align 2,,3
.global test_select_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_2, %function
test_select_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r0, #0
vseleq.f16 s0, s1, s0
bx lr
.size test_select_2, .-test_select_2
.align 1
.p2align 2,,3
.global test_select_3
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_3, %function
test_select_3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s15, s0
vcvtb.f32.f16 s14, s1
vcmp.f32 s15, s14
vmrs APSR_nzcv, FPSCR
vseleq.f16 s0, s1, s2
bx lr
.size test_select_3, .-test_select_3
.align 1
.p2align 2,,3
.global test_select_4
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_4, %function
test_select_4:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s15, s0
vcvtb.f32.f16 s14, s1
vcmp.f32 s15, s14
vmrs APSR_nzcv, FPSCR
vseleq.f16 s0, s2, s1
bx lr
.size test_select_4, .-test_select_4
.align 1
.p2align 2,,3
.global test_select_5
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_5, %function
test_select_5:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s15, s1
vcmpe.f32 s0, s15
vmrs APSR_nzcv, FPSCR
bmi .L17
vmov s1, s2 @ __fp16
.L17:
vmov s0, s1 @ __fp16
bx lr
.size test_select_5, .-test_select_5
.align 1
.p2align 2,,3
.global test_select_6
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_6, %function
test_select_6:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s15, s1
vcmpe.f32 s0, s15
vmrs APSR_nzcv, FPSCR
bls .L19
vmov s1, s2 @ __fp16
.L19:
vmov s0, s1 @ __fp16
bx lr
.size test_select_6, .-test_select_6
.align 1
.p2align 2,,3
.global test_select_7
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_7, %function
test_select_7:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s15, s0
vcvtb.f32.f16 s14, s1
vcmpe.f32 s15, s14
vmrs APSR_nzcv, FPSCR
vselgt.f16 s0, s1, s2
bx lr
.size test_select_7, .-test_select_7
.align 1
.p2align 2,,3
.global test_select_8
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_8, %function
test_select_8:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s15, s0
vcvtb.f32.f16 s14, s1
vcmpe.f32 s15, s14
vmrs APSR_nzcv, FPSCR
vselge.f16 s0, s1, s2
bx lr
.size test_select_8, .-test_select_8
.align 1
.p2align 2,,3
.global test_compare_1
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_1, %function
test_compare_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmp.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it ne
movne r0, #0
bx lr
.size test_compare_1, .-test_compare_1
.align 1
.p2align 2,,3
.global test_compare_
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_, %function
test_compare_:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmp.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it eq
moveq r0, #0
bx lr
.size test_compare_, .-test_compare_
.align 1
.p2align 2,,3
.global test_compare_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_2, %function
test_compare_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmpe.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it le
movle r0, #0
bx lr
.size test_compare_2, .-test_compare_2
.align 1
.p2align 2,,3
.global test_compare_3
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_3, %function
test_compare_3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmpe.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it lt
movlt r0, #0
bx lr
.size test_compare_3, .-test_compare_3
.align 1
.p2align 2,,3
.global test_compare_4
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_4, %function
test_compare_4:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmpe.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it pl
movpl r0, #0
bx lr
.size test_compare_4, .-test_compare_4
.align 1
.p2align 2,,3
.global test_compare_5
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_5, %function
test_compare_5:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmpe.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it hi
movhi r0, #0
bx lr
.size test_compare_5, .-test_compare_5
.section .note.GNU-stack,"",%progbits
[-- Attachment #3: armv8_2-fp16-move-1-without.s --]
[-- Type: text/plain, Size: 8516 bytes --]
.arch armv8.2-a
.eabi_attribute 28, 1
.eabi_attribute 20, 1
.eabi_attribute 21, 1
.eabi_attribute 23, 3
.eabi_attribute 24, 1
.eabi_attribute 25, 1
.eabi_attribute 26, 2
.eabi_attribute 30, 2
.eabi_attribute 34, 1
.eabi_attribute 38, 1
.eabi_attribute 18, 4
.file "armv8_2-fp16-move-1.c"
.text
.align 1
.p2align 2,,3
.global test_load_1
.arch armv8.2-a
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_load_1, %function
test_load_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vld1.16 {d0[0]}, [r0]
bx lr
.size test_load_1, .-test_load_1
.align 1
.p2align 2,,3
.global test_load_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_load_2, %function
test_load_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
add r3, r0, r1, lsl #1
vld1.16 {d0[0]}, [r3]
bx lr
.size test_load_2, .-test_load_2
.align 1
.p2align 2,,3
.global test_store_1
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_store_1, %function
test_store_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vst1.16 {d0[0]}, [r0]
bx lr
.size test_store_1, .-test_store_1
.align 1
.p2align 2,,3
.global test_store_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_store_2, %function
test_store_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vmov.f16 r3, s0 @ __fp16
strh r3, [r0, r1, lsl #1] @ __fp16
bx lr
.size test_store_2, .-test_store_2
.align 1
.p2align 2,,3
.global test_load_store_1
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_load_store_1, %function
test_load_store_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldrh r3, [r2, r1, lsl #1] @ __fp16
vmov.f16 s0, r3 @ __fp16
strh r3, [r0, r1, lsl #1] @ __fp16
bx lr
.size test_load_store_1, .-test_load_store_1
.align 1
.p2align 2,,3
.global test_load_store_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_load_store_2, %function
test_load_store_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
adds r1, r1, #2
add r0, r0, r1, lsl #1
ldrh r3, [r2, r1, lsl #1] @ __fp16
vmov.f16 s0, r3 @ __fp16
strh r3, [r0, #-4] @ __fp16
bx lr
.size test_load_store_2, .-test_load_store_2
.align 1
.p2align 2,,3
.global test_select_1
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_1, %function
test_select_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r0, #0
vseleq.f16 s0, s1, s0
bx lr
.size test_select_1, .-test_select_1
.align 1
.p2align 2,,3
.global test_select_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_2, %function
test_select_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
cmp r0, #0
vseleq.f16 s0, s1, s0
bx lr
.size test_select_2, .-test_select_2
.align 1
.p2align 2,,3
.global test_select_3
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_3, %function
test_select_3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s15, s0
vcvtb.f32.f16 s14, s1
vcmp.f32 s15, s14
vmrs APSR_nzcv, FPSCR
vseleq.f16 s0, s1, s2
bx lr
.size test_select_3, .-test_select_3
.align 1
.p2align 2,,3
.global test_select_4
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_4, %function
test_select_4:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s15, s0
vcvtb.f32.f16 s14, s1
vcmp.f32 s15, s14
vmrs APSR_nzcv, FPSCR
vseleq.f16 s0, s2, s1
bx lr
.size test_select_4, .-test_select_4
.align 1
.p2align 2,,3
.global test_select_5
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_5, %function
test_select_5:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s14, s1
vmov s15, s1 @ __fp16
vcmpe.f32 s0, s14
vmrs APSR_nzcv, FPSCR
bmi .L17
vmov s15, s2 @ __fp16
.L17:
vmov s0, s15 @ __fp16
bx lr
.size test_select_5, .-test_select_5
.align 1
.p2align 2,,3
.global test_select_6
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_6, %function
test_select_6:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s14, s1
vmov s15, s1 @ __fp16
vcmpe.f32 s0, s14
vmrs APSR_nzcv, FPSCR
bls .L19
vmov s15, s2 @ __fp16
.L19:
vmov s0, s15 @ __fp16
bx lr
.size test_select_6, .-test_select_6
.align 1
.p2align 2,,3
.global test_select_7
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_7, %function
test_select_7:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s15, s0
vcvtb.f32.f16 s14, s1
vcmpe.f32 s15, s14
vmrs APSR_nzcv, FPSCR
vselgt.f16 s0, s1, s2
bx lr
.size test_select_7, .-test_select_7
.align 1
.p2align 2,,3
.global test_select_8
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_select_8, %function
test_select_8:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s15, s0
vcvtb.f32.f16 s14, s1
vcmpe.f32 s15, s14
vmrs APSR_nzcv, FPSCR
vselge.f16 s0, s1, s2
bx lr
.size test_select_8, .-test_select_8
.align 1
.p2align 2,,3
.global test_compare_1
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_1, %function
test_compare_1:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmp.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it ne
movne r0, #0
bx lr
.size test_compare_1, .-test_compare_1
.align 1
.p2align 2,,3
.global test_compare_
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_, %function
test_compare_:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmp.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it eq
moveq r0, #0
bx lr
.size test_compare_, .-test_compare_
.align 1
.p2align 2,,3
.global test_compare_2
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_2, %function
test_compare_2:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmpe.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it le
movle r0, #0
bx lr
.size test_compare_2, .-test_compare_2
.align 1
.p2align 2,,3
.global test_compare_3
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_3, %function
test_compare_3:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmpe.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it lt
movlt r0, #0
bx lr
.size test_compare_3, .-test_compare_3
.align 1
.p2align 2,,3
.global test_compare_4
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_4, %function
test_compare_4:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmpe.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it pl
movpl r0, #0
bx lr
.size test_compare_4, .-test_compare_4
.align 1
.p2align 2,,3
.global test_compare_5
.syntax unified
.thumb
.thumb_func
.fpu fp-armv8
.type test_compare_5, %function
test_compare_5:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16 s0, s0
vcvtb.f32.f16 s1, s1
mov r0, #-1
vcmpe.f32 s0, s1
vmrs APSR_nzcv, FPSCR
it hi
movhi r0, #0
bx lr
.size test_compare_5, .-test_compare_5
.section .note.GNU-stack,"",%progbits
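For readers decoding the listing above: every test_compare_* function follows the same pattern. It sets r0 to -1, widens both __fp16 arguments to single precision with vcvtb.f32.f16, compares them with vcmp/vcmpe, copies the flags with vmrs, and then clears r0 under an IT-predicated condition. The following is a minimal model of that sequence. It assumes the standard Arm NZCV results for a floating-point compare (less 1000, equal 0110, greater 0010, unordered 0011). The names vcmp_flags, cond_holds and compare_seq are hypothetical, and the C comparison each function implements is inferred from the flags rather than taken from the test source.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* APSR flags after "vcmp(e).f32 Sd, Sm" followed by "vmrs APSR_nzcv, FPSCR".  */
struct nzcv
{
  bool n, z, c, v;
};

/* Flag settings for a floating-point compare, as defined by the Arm
   architecture.  VCMP and VCMPE produce the same flags; they differ only in
   whether a quiet NaN raises Invalid Operation.  Modelling with float is
   adequate because vcvtb.f32.f16 widens both operands before the compare.  */
struct nzcv
vcmp_flags (float a, float b)
{
  struct nzcv f = { false, false, true, true };        /* unordered: 0011 */
  if (isnan (a) || isnan (b))
    return f;
  if (a < b)
    f = (struct nzcv) { true, false, false, false };   /* less:      1000 */
  else if (a == b)
    f = (struct nzcv) { false, true, true, false };    /* equal:     0110 */
  else
    f = (struct nzcv) { false, false, true, false };   /* greater:   0010 */
  return f;
}

/* Condition codes used by the IT blocks (plus a few related ones).  */
enum cond { EQ, NE, MI, PL, HI, LS, GE, LT, GT, LE };

bool
cond_holds (enum cond cc, struct nzcv f)
{
  switch (cc)
    {
    case EQ: return f.z;
    case NE: return !f.z;
    case MI: return f.n;
    case PL: return !f.n;
    case HI: return f.c && !f.z;
    case LS: return !f.c || f.z;
    case GE: return f.n == f.v;
    case LT: return f.n != f.v;
    case GT: return !f.z && f.n == f.v;
    case LE: return f.z || f.n != f.v;
    }
  assert (0 && "unhandled condition code");
  return false;
}

/* Model of:  mov r0, #-1 ; vcmp(e) a, b ; vmrs ; it CC ; movCC r0, #0  */
int
compare_seq (enum cond cc, float a, float b)
{
  int r0 = -1;
  if (cond_holds (cc, vcmp_flags (a, b)))
    r0 = 0;
  return r0;
}
```

Working through the flags this way suggests the following mapping:

- test_compare_1 (it ne) returns -1 for a == b.
- test_compare_ (it eq) returns -1 for a != b.
- test_compare_2 (it le) returns -1 for a > b.
- test_compare_3 (it lt) returns -1 for a >= b.
- test_compare_4 (it pl) returns -1 for a < b.
- test_compare_5 (it hi) returns -1 for a <= b.

With a NaN operand only test_compare_ returns -1, which matches C semantics.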
* Re: LRA patch for PR87718
2018-12-20 10:38 ` Sam Tebbs
@ 2018-12-20 11:39 ` Sam Tebbs
From: Sam Tebbs @ 2018-12-20 11:39 UTC
To: Christophe Lyon, Vladimir Makarov; +Cc: gcc Patches, nd
On 12/20/18 10:38 AM, Sam Tebbs wrote:
> On 11/24/18 11:29 AM, Christophe Lyon wrote:
>
>> On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com>
>> wrote:
>>> The following patch fixes
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>>>
>>> The patch adds a special treatment for moves with a hard register in
>>> register cost and class calculation.
>>>
>>> The patch was bootstrapped and tested on x86-64 and ppc64.
>>>
>>> I found two testsuite regressions because of the patch. The expected
>>> generated code for PR82361 test is too specific. GCC with the patch
>>> generates the same quality code but with a different hard register on
>>> x86-64. So I just changed the test for PR82361.
>>>
>>> Another test is for ppc64. I think the expected generated code for
>>> this test is wrong. I'll submit a changed test for a discussion later.
>>>
>>> Although I spent much time on the solution and I think it is the
>>> right one, the patch is in very sensitive area of RA and may affect
>>> expected code generation for many targets. I am ready to work on the
>>> new regressions as soon as they are found.
>>>
>>> The patch was committed as rev. 260385.
>>>
>> Hi,
>>
>> This patch introduced at least several ICEs on arm targets:
>> on arm-none-linux-gnueabi --with-cpu=cortex-a9:
> <snip>
>> There are other regressions on the same targets, but not ICEs.
>> I can report them later.
>>
>> Thanks,
>>
>> Christophe
> Hi Christophe and Vladimir,
>
> Here are the regressions seen on arm-none-linux-gnueabihf and arm-none-eabi.
>
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times ldrh\\tr[0-9]+ 2
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vmov\\.f16\\ts[0-9]+, r[0-9]+ 2
> FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\tr[0-9]+, s[0-9]+
> FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\ts0, r[0-9]+
> FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\tr[0-9]+, s[0-2] 2
> FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\ts0, r[0-9]+ 2
>
> I didn't see a bug report for these, so I will open one.
>
> It is not clear if the test cases should be adjusted because of your
> patch or if they are failing because of incorrect codegen. Attached is
> the code generated for armv8_2-fp16-move-1.c (one of the test files
> failing) with and without your patch.
>
> Full command line used to compile and test armv8_2-fp16-move-1.c:
>
> bin/gcc armv8_2-fp16-move-1.c -fno-diagnostics-show-caret
> -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O2
> -mfpu=fp-armv8 -march=armv8.2-a+fp16 -mfloat-abi=hard -ffat-lto-objects
> -fno-ident -S -o armv8_2-fp16-move-1.s
Reported as PR 88560 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88560).
* Re: LRA patch for PR87718
@ 2018-11-23 8:04 Uros Bizjak
From: Uros Bizjak @ 2018-11-23 8:04 UTC
To: gcc-patches; +Cc: Vladimir Makarov
Hello!
> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>
> The patch adds a special treatment for moves with a hard register in register cost and class
> calculation.
>
> The patch was bootstrapped and tested on x86-64 and ppc64.
>
> I found two testsuite regressions because of the patch. The expected generated code for
> PR82361 test is too specific. GCC with the patch generates the same quality code but with a
> different hard register on x86-64. So I just changed the test for PR82361.
>
> Another test is for ppc64. I think the expected generated code for this test is wrong. I'll
> submit a changed test for a discussion later.
>
> Although I spent much time on the solution and I think it is the right one, the patch is in
> very sensitive area of RA and may affect expected code generation for many targets. I am
> ready to work on the new regressions as soon as they are found.
The patch regressed the following tests on the 32-bit x86 target:
FAIL: gcc.target/i386/pr22076.c scan-assembler-not movl
FAIL: gcc.target/i386/pr81563.c scan-assembler-times movl[\\t ]*-4\\(%ebp\\),[\\t ]*%edi 1
FAIL: gcc.target/i386/pr81563.c scan-assembler-times movl[\\t ]*-8\\(%ebp\\),[\\t ]*%esi 1
In PR22076, the compiler now moves a value from %mm0 through a stack slot and a pair of integer registers to a volatile location:
movq .LC1, %mm0
paddb .LC0, %mm0
movq %mm0, 16(%esp)
movl 16(%esp), %eax
movl 20(%esp), %edx
movl %eax, (%esp)
movl %edx, 4(%esp)
movq (%esp), %mm0
addl $28, %esp
whereas before the patch we had:
movq .LC1, %mm0
paddb .LC0, %mm0
movq %mm0, 8(%esp)
movq 8(%esp), %mm0
addl $20, %esp
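The scan-assembler-not movl check fails as soon as a single movl appears in the output. Counting plain substrings in the two PR22076 listings quoted above makes the regression concrete. This is a minimal sketch: count_substr and the two string constants are hypothetical names. The listings are copied as they appear in this message, with spaces rather than the tabs a real .s file would contain.

```c
#include <stddef.h>
#include <string.h>

/* Count non-overlapping occurrences of the literal string NEEDLE in HAYSTACK.  */
int
count_substr (const char *haystack, const char *needle)
{
  size_t len = strlen (needle);
  int count = 0;

  if (len == 0)
    return 0;
  for (const char *p = strstr (haystack, needle); p != NULL;
       p = strstr (p + len, needle))
    count++;
  return count;
}

/* PR22076 code after the patch: the value round-trips through %eax/%edx.  */
const char *pr22076_after =
  "movq .LC1, %mm0\n"
  "paddb .LC0, %mm0\n"
  "movq %mm0, 16(%esp)\n"
  "movl 16(%esp), %eax\n"
  "movl 20(%esp), %edx\n"
  "movl %eax, (%esp)\n"
  "movl %edx, 4(%esp)\n"
  "movq (%esp), %mm0\n"
  "addl $28, %esp\n";

/* PR22076 code before the patch: a single store/reload of %mm0.  */
const char *pr22076_before =
  "movq .LC1, %mm0\n"
  "paddb .LC0, %mm0\n"
  "movq %mm0, 8(%esp)\n"
  "movq 8(%esp), %mm0\n"
  "addl $20, %esp\n";
```

With these listings, count_substr (pr22076_after, "movl") gives 4 and count_substr (pr22076_before, "movl") gives 0. Over the same change, the number of (%esp) stack-slot accesses grows from 2 to 6.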
The PR81563 failure looks like a testsuite issue: the compiler now
allocates a call-clobbered register for a value that lives across the
call.
Uros.
Thread overview: 5+ messages
2018-11-22 17:29 LRA patch for PR87718 Vladimir Makarov
2018-11-24 11:29 ` Christophe Lyon
2018-12-20 10:38 ` Sam Tebbs
2018-12-20 11:39 ` Sam Tebbs
2018-11-23 8:04 Uros Bizjak