* LRA patch for PR87718
@ 2018-11-22 17:29 Vladimir Makarov
  2018-11-24 11:29 ` Christophe Lyon
  0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Makarov @ 2018-11-22 17:29 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1006 bytes --]

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

The patch adds a special treatment for moves with a hard register in
register cost and class calculation.

The patch was bootstrapped and tested on x86-64 and ppc64.

I found two testsuite regressions because of the patch.  The expected
generated code for PR82361 test is too specific.  GCC with the patch
generates the same quality code but with a different hard register on
x86-64.  So I just changed the test for PR82361.

Another test is for ppc64.  I think the expected generated code for
this test is wrong.  I'll submit a changed test for a discussion later.

Although I spent much time on the solution and I think it is the
right one, the patch is in very sensitive area of RA and may affect
expected code generation for many targets.  I am ready to work on the
new regressions as soon as they are found.

The patch was committed as rev. 260385.

[-- Attachment #2: pr87718.patch --]
[-- Type: text/x-patch, Size: 11540 bytes --]

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 266384)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2018-11-22  Vladimir Makarov  <vmakarov@redhat.com>
+
+	PR rtl-optimization/87718
+	* ira-costs.c: Remove trailing white-spaces.
+	(record_operand_costs): Add a special treatment for moves
+	involving a hard register.
+
 2018-11-22  Uros Bizjak  <ubizjak@gmail.com>
 
 	* config/i386/i386.c (ix86_avx_emit_vzeroupper): Remove.
Index: ira-costs.c =================================================================== --- ira-costs.c (revision 266155) +++ ira-costs.c (working copy) @@ -1257,7 +1257,7 @@ record_address_regs (machine_mode mode, add_cost = (move_in_cost[i][rclass] * scale) / 2; if (INT_MAX - add_cost < pp_costs[k]) pp_costs[k] = INT_MAX; - else + else pp_costs[k] += add_cost; } } @@ -1283,10 +1283,100 @@ record_operand_costs (rtx_insn *insn, en { const char *constraints[MAX_RECOG_OPERANDS]; machine_mode modes[MAX_RECOG_OPERANDS]; - rtx ops[MAX_RECOG_OPERANDS]; rtx set; int i; + if ((set = single_set (insn)) != NULL_RTX + /* In rare cases the single set insn might have less 2 operands + as the source can be a fixed special reg. */ + && recog_data.n_operands > 1 + && recog_data.operand[0] == SET_DEST (set) + && recog_data.operand[1] == SET_SRC (set)) + { + int regno, other_regno; + rtx dest = SET_DEST (set); + rtx src = SET_SRC (set); + + if (GET_CODE (dest) == SUBREG + && known_eq (GET_MODE_SIZE (GET_MODE (dest)), + GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest))))) + dest = SUBREG_REG (dest); + if (GET_CODE (src) == SUBREG + && known_eq (GET_MODE_SIZE (GET_MODE (src)), + GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))) + src = SUBREG_REG (src); + if (REG_P (src) && REG_P (dest) + && (((regno = REGNO (src)) >= FIRST_PSEUDO_REGISTER + && (other_regno = REGNO (dest)) < FIRST_PSEUDO_REGISTER) + || ((regno = REGNO (dest)) >= FIRST_PSEUDO_REGISTER + && (other_regno = REGNO (src)) < FIRST_PSEUDO_REGISTER))) + { + machine_mode mode = GET_MODE (SET_SRC (set)); + cost_classes_t cost_classes_ptr = regno_cost_classes[regno]; + enum reg_class *cost_classes = cost_classes_ptr->classes; + reg_class_t rclass, hard_reg_class, pref_class; + int cost, k; + bool dead_p = find_regno_note (insn, REG_DEAD, REGNO (src)); + + hard_reg_class = REGNO_REG_CLASS (other_regno); + i = regno == (int) REGNO (src) ? 
1 : 0; + for (k = cost_classes_ptr->num - 1; k >= 0; k--) + { + rclass = cost_classes[k]; + cost = ((i == 0 + ? ira_register_move_cost[mode][hard_reg_class][rclass] + : ira_register_move_cost[mode][rclass][hard_reg_class]) + * frequency); + op_costs[i]->cost[k] = cost; + /* If we have assigned a class to this allocno in our + first pass, add a cost to this alternative + corresponding to what we would add if this allocno + were not in the appropriate class. */ + if (pref) + { + if ((pref_class = pref[COST_INDEX (regno)]) == NO_REGS) + op_costs[i]->cost[k] + += ((i == 0 ? ira_memory_move_cost[mode][rclass][0] : 0) + + (i == 1 ? ira_memory_move_cost[mode][rclass][1] : 0) + * frequency); + else if (ira_reg_class_intersect[pref_class][rclass] + == NO_REGS) + op_costs[i]->cost[k] + += (ira_register_move_cost[mode][pref_class][rclass] + * frequency); + } + /* If this insn is a single set copying operand 1 to + operand 0 and one operand is an allocno with the + other a hard reg or an allocno that prefers a hard + register that is in its own register class then we + may want to adjust the cost of that register class to + -1. + + Avoid the adjustment if the source does not die to + avoid stressing of register allocator by preferencing + two colliding registers into single class. 
*/ + if (dead_p + && TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno) + && (reg_class_size[(int) rclass] + == (ira_reg_class_max_nregs + [(int) rclass][(int) GET_MODE(src)]))) + { + if (reg_class_size[rclass] == 1) + op_costs[i]->cost[k] = -frequency; + else if (in_hard_reg_set_p (reg_class_contents[rclass], + GET_MODE(src), other_regno)) + op_costs[i]->cost[k] = -frequency; + } + } + op_costs[i]->mem_cost + = ira_memory_move_cost[mode][hard_reg_class][i] * frequency; + if (pref && (pref_class = pref[COST_INDEX (regno)]) != NO_REGS) + op_costs[i]->mem_cost + += ira_memory_move_cost[mode][pref_class][i] * frequency; + return; + } + } + for (i = 0; i < recog_data.n_operands; i++) { constraints[i] = recog_data.constraints[i]; @@ -1302,7 +1392,6 @@ record_operand_costs (rtx_insn *insn, en { memcpy (op_costs[i], init_cost, struct_costs_size); - ops[i] = recog_data.operand[i]; if (GET_CODE (recog_data.operand[i]) == SUBREG) recog_data.operand[i] = SUBREG_REG (recog_data.operand[i]); @@ -1318,7 +1407,7 @@ record_operand_costs (rtx_insn *insn, en recog_data.operand[i], 0, ADDRESS, SCRATCH, frequency * 2); } - + /* Check for commutative in a separate loop so everything will have been initialized. We must do this even if one operand is a constant--see addsi3 in m68k.md. */ @@ -1328,8 +1417,8 @@ record_operand_costs (rtx_insn *insn, en const char *xconstraints[MAX_RECOG_OPERANDS]; int j; - /* Handle commutative operands by swapping the constraints. - We assume the modes are the same. */ + /* Handle commutative operands by swapping the + constraints. We assume the modes are the same. 
*/ for (j = 0; j < recog_data.n_operands; j++) xconstraints[j] = constraints[j]; @@ -1342,69 +1431,6 @@ record_operand_costs (rtx_insn *insn, en record_reg_classes (recog_data.n_alternatives, recog_data.n_operands, recog_data.operand, modes, constraints, insn, pref); - - /* If this insn is a single set copying operand 1 to operand 0 and - one operand is an allocno with the other a hard reg or an allocno - that prefers a hard register that is in its own register class - then we may want to adjust the cost of that register class to -1. - - Avoid the adjustment if the source does not die to avoid - stressing of register allocator by preferencing two colliding - registers into single class. - - Also avoid the adjustment if a copy between hard registers of the - class is expensive (ten times the cost of a default copy is - considered arbitrarily expensive). This avoids losing when the - preferred class is very expensive as the source of a copy - instruction. */ - if ((set = single_set (insn)) != NULL_RTX - /* In rare cases the single set insn might have less 2 operands - as the source can be a fixed special reg. 
*/ - && recog_data.n_operands > 1 - && ops[0] == SET_DEST (set) && ops[1] == SET_SRC (set)) - { - int regno, other_regno; - rtx dest = SET_DEST (set); - rtx src = SET_SRC (set); - - if (GET_CODE (dest) == SUBREG - && known_eq (GET_MODE_SIZE (GET_MODE (dest)), - GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest))))) - dest = SUBREG_REG (dest); - if (GET_CODE (src) == SUBREG - && known_eq (GET_MODE_SIZE (GET_MODE (src)), - GET_MODE_SIZE (GET_MODE (SUBREG_REG (src))))) - src = SUBREG_REG (src); - if (REG_P (src) && REG_P (dest) - && find_regno_note (insn, REG_DEAD, REGNO (src)) - && (((regno = REGNO (src)) >= FIRST_PSEUDO_REGISTER - && (other_regno = REGNO (dest)) < FIRST_PSEUDO_REGISTER) - || ((regno = REGNO (dest)) >= FIRST_PSEUDO_REGISTER - && (other_regno = REGNO (src)) < FIRST_PSEUDO_REGISTER))) - { - machine_mode mode = GET_MODE (src); - cost_classes_t cost_classes_ptr = regno_cost_classes[regno]; - enum reg_class *cost_classes = cost_classes_ptr->classes; - reg_class_t rclass; - int k; - - i = regno == (int) REGNO (src) ? 1 : 0; - for (k = cost_classes_ptr->num - 1; k >= 0; k--) - { - rclass = cost_classes[k]; - if (TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno) - && (reg_class_size[(int) rclass] - == ira_reg_class_max_nregs [(int) rclass][(int) mode])) - { - if (reg_class_size[rclass] == 1) - op_costs[i]->cost[k] = -frequency; - else if (in_hard_reg_set_p (reg_class_contents[rclass], - mode, other_regno)) - op_costs[i]->cost[k] = -frequency; - } - } - } - } } \f @@ -1457,7 +1483,7 @@ scan_one_insn (rtx_insn *insn) /* If this insn loads a parameter from its stack slot, then it represents a savings, rather than a cost, if the parameter is - stored in memory. Record this fact. + stored in memory. Record this fact. 
Similarly if we're loading other constants from memory (constant pool, TOC references, small data areas, etc) and this is the only @@ -1468,7 +1494,7 @@ scan_one_insn (rtx_insn *insn) mem_cost might result in it being loaded using the specialized instruction into a register, then stored into stack and loaded again from the stack. See PR52208. - + Don't do this if SET_SRC (set) has side effect. See PR56124. */ if (set != 0 && REG_P (SET_DEST (set)) && MEM_P (SET_SRC (set)) && (note = find_reg_note (insn, REG_EQUIV, NULL_RTX)) != NULL_RTX @@ -1766,7 +1792,7 @@ find_costs_and_classes (FILE *dump_file) a = ALLOCNO_NEXT_REGNO_ALLOCNO (a)) { int *a_costs, *p_costs; - + a_num = ALLOCNO_NUM (a); if ((flag_ira_region == IRA_REGION_ALL || flag_ira_region == IRA_REGION_MIXED) @@ -1936,7 +1962,7 @@ find_costs_and_classes (FILE *dump_file) int a_num = ALLOCNO_NUM (a); int *total_a_costs = COSTS (total_allocno_costs, a_num)->cost; int *a_costs = COSTS (costs, a_num)->cost; - + if (aclass == NO_REGS) best = NO_REGS; else @@ -1998,7 +2024,7 @@ find_costs_and_classes (FILE *dump_file) } } } - + if (internal_flag_ira_verbose > 4 && dump_file) { if (allocno_p) @@ -2081,7 +2107,7 @@ process_bb_node_for_hard_reg_moves (ira_ int cost; enum reg_class hard_reg_class; machine_mode mode; - + mode = ALLOCNO_MODE (a); hard_reg_class = REGNO_REG_CLASS (hard_regno); ira_init_register_move_cost_if_necessary (mode); Index: testsuite/ChangeLog =================================================================== --- testsuite/ChangeLog (revision 266384) +++ testsuite/ChangeLog (working copy) @@ -1,3 +1,9 @@ +2018-11-22 Vladimir Makarov <vmakarov@redhat.com> + + PR rtl-optimization/87718 + * gcc.target/i386/pr82361-1.c: Check only the first operand of + moves. + 2018-11-22 Thomas Preud'homme <thomas.preudhomme@linaro.org> * gcc.target/arm/pr85434.c: New test. 
Index: testsuite/gcc.target/i386/pr82361-1.c =================================================================== --- testsuite/gcc.target/i386/pr82361-1.c (revision 266155) +++ testsuite/gcc.target/i386/pr82361-1.c (working copy) @@ -6,7 +6,7 @@ /* { dg-final { scan-assembler-not "movl\t%eax, %eax" } } */ /* FIXME: We are still not able to optimize the modulo in f1/f2, only manage one. */ -/* { dg-final { scan-assembler-times "movl\t%edx, %edx" 2 } } */ +/* { dg-final { scan-assembler-times "movl\t%edx" 2 } } */ void f1 (unsigned int a, unsigned int b) ^ permalink raw reply [flat|nested] 5+ messages in thread
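Editorial illustration (not part of the posted patch): the special
treatment described in the message above can be pictured with a small
toy model in C.  This is only a sketch under stated assumptions.  The
names toy_class, toy_move_cost, toy_move and toy_record_move_costs, the
three register classes and all cost values are hypothetical.  The
machine-mode dimension, the preferred-class adjustment and the memory
cost that the real record_operand_costs code handles are left out.

```c
#include <stdbool.h>

/* Hypothetical register classes for the toy model; real targets
   define their own classes.  */
enum toy_class { TOY_AREG, TOY_GENERAL, TOY_FLOAT, TOY_N_CLASSES };

/* Hypothetical cost of moving a value from class [from] to class [to].
   It stands in for ira_register_move_cost with the mode index dropped.  */
static const int toy_move_cost[TOY_N_CLASSES][TOY_N_CLASSES] = {
  /* to:        AREG GENERAL FLOAT */
  /* AREG    */ { 2,   2,      6 },
  /* GENERAL */ { 2,   2,      8 },
  /* FLOAT   */ { 6,   8,      2 },
};

/* A single-set move between one pseudo and one hard register.  */
struct toy_move
{
  enum toy_class hard_class; /* class of the hard register */
  bool pseudo_is_dest;       /* true when the pseudo is operand 0 */
  bool src_dies;             /* true when the source has a REG_DEAD note */
  int frequency;             /* execution frequency of the insn */
};

/* Fill COST[k] for every candidate class k of the pseudo.
   ONLY_HARD_REG[k] is true when class k contains the hard register and
   is just large enough to hold one value of the move's mode.  */
static void
toy_record_move_costs (const struct toy_move *m,
                       const bool only_hard_reg[TOY_N_CLASSES],
                       int cost[TOY_N_CLASSES])
{
  for (int k = TOY_N_CLASSES - 1; k >= 0; k--)
    {
      /* The direction of the move decides which table entry applies:
         hard register -> pseudo, or pseudo -> hard register.  */
      int per_move = (m->pseudo_is_dest
                      ? toy_move_cost[m->hard_class][k]
                      : toy_move_cost[k][m->hard_class]);
      cost[k] = per_move * m->frequency;

      /* Prefer the class made up of the hard register itself, but only
         when the source dies in this insn.  */
      if (m->src_dies && only_hard_reg[k])
        cost[k] = -m->frequency;
    }
}
```

In this model a move from a hard register in TOY_AREG into a pseudo,
with frequency 10 and a dying source, gives TOY_AREG a negative cost
while the other classes keep the plain move cost scaled by the
frequency.  When the source does not die, TOY_AREG keeps its ordinary
move cost.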
* Re: LRA patch for PR87718
  2018-11-22 17:29 LRA patch for PR87718 Vladimir Makarov
@ 2018-11-24 11:29 ` Christophe Lyon
  2018-12-20 10:38   ` Sam Tebbs
  0 siblings, 1 reply; 5+ messages in thread
From: Christophe Lyon @ 2018-11-24 11:29 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc Patches

On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>    The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>
> The patch adds a special treatment for moves with a hard register in
> register cost and class calculation.
>
> The patch was bootstrapped and tested on x86-64 and ppc64.
>
> I found two testsuite regressions because of the patch.  The expected
> generated code for PR82361 test is too specific.  GCC with the patch
> generates the same quality code but with a different hard register on
> x86-64.  So I just changed the test for PR82361.
>
> Another test is for ppc64.  I think the expected generated code for
> this test is wrong.  I'll submit a changed test for a discussion later.
>
> Although I spent much time on the solution and I think it is the
> right one, the patch is in very sensitive area of RA and may affect
> expected code generation for many targets.  I am ready to work on the
> new regressions as soon as they are found.
>
> The patch was committed as rev. 260385.
>

Hi,

This patch introduced at least several ICEs on arm targets:
on arm-none-linux-gnueabi --with-cpu=cortex-a9:
  Executed from: gcc.target/arm/arm.exp
    gcc.target/arm/attr-neon-fp16.c (internal compiler error)
    gcc.target/arm/pr51968.c (internal compiler error)
    gcc.target/arm/pr68620.c (internal compiler error)
  Executed from: gcc.target/arm/simd/simd.exp
    gcc.target/arm/simd/vextp16_1.c (internal compiler error)
    gcc.target/arm/simd/vextp8_1.c (internal compiler error)
    gcc.target/arm/simd/vexts16_1.c (internal compiler error)
    gcc.target/arm/simd/vexts8_1.c (internal compiler error)
    gcc.target/arm/simd/vextu16_1.c (internal compiler error)
    gcc.target/arm/simd/vextu8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev16p8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev16s8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev16u8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32p16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32p8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32s16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32s8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32u16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32u8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64f32_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64p16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64p8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64s16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64s32_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64s8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64u16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64u32_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64u8_1.c (internal compiler error)

arm-none-linux-gnueabihf shows only 1 ICE:
    gcc.target/arm/pr51968.c (internal compiler error)

There are other regressions on the same targets, but not ICEs.
I can report them later.

Thanks,

Christophe

^ permalink raw reply	[flat|nested] 5+ messages in thread
* Re: LRA patch for PR87718
  2018-11-24 11:29 ` Christophe Lyon
@ 2018-12-20 10:38   ` Sam Tebbs
  2018-12-20 11:39     ` Sam Tebbs
  0 siblings, 1 reply; 5+ messages in thread
From: Sam Tebbs @ 2018-12-20 10:38 UTC (permalink / raw)
  To: Christophe Lyon, Vladimir Makarov; +Cc: gcc Patches, nd

[-- Attachment #1: Type: text/plain, Size: 2787 bytes --]

On 11/24/18 11:29 AM, Christophe Lyon wrote:

> On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com>
> wrote:
>>     The following patch fixes
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>>
>> The patch adds a special treatment for moves with a hard register in
>> register cost and class calculation.
>>
>> The patch was bootstrapped and tested on x86-64 and ppc64.
>>
>> I found two testsuite regressions because of the patch.  The expected
>> generated code for PR82361 test is too specific.  GCC with the patch
>> generates the same quality code but with a different hard register on
>> x86-64.  So I just changed the test for PR82361.
>>
>> Another test is for ppc64.  I think the expected generated code for
>> this test is wrong.  I'll submit a changed test for a discussion later.
>>
>> Although I spent much time on the solution and I think it is the
>> right one, the patch is in very sensitive area of RA and may affect
>> expected code generation for many targets.  I am ready to work on the
>> new regressions as soon as they are found.
>>
>> The patch was committed as rev. 260385.
>>
> Hi,
>
> This patch introduced at least several ICEs on arm targets:
> on arm-none-linux-gnueabi --with-cpu=cortex-a9:
<snip>
> There are other regressions on the same targets, but not ICEs.
> I can report them later.
>
> Thanks,
>
> Christophe

Hi Christophe and Vladimir,

Here are the regressions seen on arm-none-linux-gnueabihf and arm-none-eabi.

FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times ldrh\\tr[0-9]+ 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vmov\\.f16\\ts[0-9]+, r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\tr[0-9]+, s[0-9]+
FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\ts0, r[0-9]+
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\tr[0-9]+, s[0-2] 2
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\ts0, r[0-9]+ 2

I didn't see a bug report for these, so I will open one. It is not clear
if the test cases should be adjusted because of your patch or if they are
failing because of incorrect codegen.

Attached is the code generated for armv8_2-fp16-move-1.c (one of the test
files failing) with and without your patch.

Full command line used to compile and test armv8_2-fp16-move-1.c:

bin/gcc armv8_2-fp16-move-1.c -fno-diagnostics-show-caret -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -mfloat-abi=hard -ffat-lto-objects -fno-ident -S -o armv8_2-fp16-move-1.s.

[-- Attachment #2: armv8_2-fp16-move-1-with.s --]
[-- Type: text/plain, Size: 8451 bytes --]

	.arch armv8.2-a
	.eabi_attribute 28, 1
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 38, 1
	.eabi_attribute 18, 4
	.file	"armv8_2-fp16-move-1.c"
	.text
	.align	1
	.p2align 2,,3
	.global	test_load_1
	.arch armv8.2-a
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_1, %function
test_load_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
vld1.16 {d0[0]}, [r0] bx lr .size test_load_1, .-test_load_1 .align 1 .p2align 2,,3 .global test_load_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_load_2, %function test_load_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. add r3, r0, r1, lsl #1 vld1.16 {d0[0]}, [r3] bx lr .size test_load_2, .-test_load_2 .align 1 .p2align 2,,3 .global test_store_1 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_store_1, %function test_store_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vst1.16 {d0[0]}, [r0] bx lr .size test_store_1, .-test_store_1 .align 1 .p2align 2,,3 .global test_store_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_store_2, %function test_store_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vmov.f16 r3, s0 @ __fp16 strh r3, [r0, r1, lsl #1] @ __fp16 bx lr .size test_store_2, .-test_store_2 .align 1 .p2align 2,,3 .global test_load_store_1 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_load_store_1, %function test_load_store_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. ldrh r3, [r2, r1, lsl #1] @ __fp16 strh r3, [r0, r1, lsl #1] @ __fp16 bx lr .size test_load_store_1, .-test_load_store_1 .align 1 .p2align 2,,3 .global test_load_store_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_load_store_2, %function test_load_store_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. 
adds r1, r1, #2 add r3, r2, r1, lsl #1 add r0, r0, r1, lsl #1 vld1.16 {d0[0]}, [r3] vmov.f16 r3, s0 @ __fp16 strh r3, [r0, #-4] @ __fp16 bx lr .size test_load_store_2, .-test_load_store_2 .align 1 .p2align 2,,3 .global test_select_1 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_1, %function test_select_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. cmp r0, #0 vseleq.f16 s0, s1, s0 bx lr .size test_select_1, .-test_select_1 .align 1 .p2align 2,,3 .global test_select_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_2, %function test_select_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. cmp r0, #0 vseleq.f16 s0, s1, s0 bx lr .size test_select_2, .-test_select_2 .align 1 .p2align 2,,3 .global test_select_3 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_3, %function test_select_3: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s15, s0 vcvtb.f32.f16 s14, s1 vcmp.f32 s15, s14 vmrs APSR_nzcv, FPSCR vseleq.f16 s0, s1, s2 bx lr .size test_select_3, .-test_select_3 .align 1 .p2align 2,,3 .global test_select_4 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_4, %function test_select_4: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s15, s0 vcvtb.f32.f16 s14, s1 vcmp.f32 s15, s14 vmrs APSR_nzcv, FPSCR vseleq.f16 s0, s2, s1 bx lr .size test_select_4, .-test_select_4 .align 1 .p2align 2,,3 .global test_select_5 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_5, %function test_select_5: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. 
vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s15, s1 vcmpe.f32 s0, s15 vmrs APSR_nzcv, FPSCR bmi .L17 vmov s1, s2 @ __fp16 .L17: vmov s0, s1 @ __fp16 bx lr .size test_select_5, .-test_select_5 .align 1 .p2align 2,,3 .global test_select_6 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_6, %function test_select_6: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s15, s1 vcmpe.f32 s0, s15 vmrs APSR_nzcv, FPSCR bls .L19 vmov s1, s2 @ __fp16 .L19: vmov s0, s1 @ __fp16 bx lr .size test_select_6, .-test_select_6 .align 1 .p2align 2,,3 .global test_select_7 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_7, %function test_select_7: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s15, s0 vcvtb.f32.f16 s14, s1 vcmpe.f32 s15, s14 vmrs APSR_nzcv, FPSCR vselgt.f16 s0, s1, s2 bx lr .size test_select_7, .-test_select_7 .align 1 .p2align 2,,3 .global test_select_8 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_8, %function test_select_8: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s15, s0 vcvtb.f32.f16 s14, s1 vcmpe.f32 s15, s14 vmrs APSR_nzcv, FPSCR vselge.f16 s0, s1, s2 bx lr .size test_select_8, .-test_select_8 .align 1 .p2align 2,,3 .global test_compare_1 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_1, %function test_compare_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. 
vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmp.f32 s0, s1 vmrs APSR_nzcv, FPSCR it ne movne r0, #0 bx lr .size test_compare_1, .-test_compare_1 .align 1 .p2align 2,,3 .global test_compare_ .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_, %function test_compare_: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmp.f32 s0, s1 vmrs APSR_nzcv, FPSCR it eq moveq r0, #0 bx lr .size test_compare_, .-test_compare_ .align 1 .p2align 2,,3 .global test_compare_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_2, %function test_compare_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmpe.f32 s0, s1 vmrs APSR_nzcv, FPSCR it le movle r0, #0 bx lr .size test_compare_2, .-test_compare_2 .align 1 .p2align 2,,3 .global test_compare_3 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_3, %function test_compare_3: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmpe.f32 s0, s1 vmrs APSR_nzcv, FPSCR it lt movlt r0, #0 bx lr .size test_compare_3, .-test_compare_3 .align 1 .p2align 2,,3 .global test_compare_4 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_4, %function test_compare_4: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. 
vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmpe.f32 s0, s1 vmrs APSR_nzcv, FPSCR it pl movpl r0, #0 bx lr .size test_compare_4, .-test_compare_4 .align 1 .p2align 2,,3 .global test_compare_5 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_5, %function test_compare_5: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmpe.f32 s0, s1 vmrs APSR_nzcv, FPSCR it hi movhi r0, #0 bx lr .size test_compare_5, .-test_compare_5 .section .note.GNU-stack,"",%progbits [-- Attachment #3: armv8_2-fp16-move-1-without.s --] [-- Type: text/plain, Size: 8516 bytes --] .arch armv8.2-a .eabi_attribute 28, 1 .eabi_attribute 20, 1 .eabi_attribute 21, 1 .eabi_attribute 23, 3 .eabi_attribute 24, 1 .eabi_attribute 25, 1 .eabi_attribute 26, 2 .eabi_attribute 30, 2 .eabi_attribute 34, 1 .eabi_attribute 38, 1 .eabi_attribute 18, 4 .file "armv8_2-fp16-move-1.c" .text .align 1 .p2align 2,,3 .global test_load_1 .arch armv8.2-a .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_load_1, %function test_load_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vld1.16 {d0[0]}, [r0] bx lr .size test_load_1, .-test_load_1 .align 1 .p2align 2,,3 .global test_load_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_load_2, %function test_load_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. add r3, r0, r1, lsl #1 vld1.16 {d0[0]}, [r3] bx lr .size test_load_2, .-test_load_2 .align 1 .p2align 2,,3 .global test_store_1 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_store_1, %function test_store_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. 
vst1.16 {d0[0]}, [r0] bx lr .size test_store_1, .-test_store_1 .align 1 .p2align 2,,3 .global test_store_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_store_2, %function test_store_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vmov.f16 r3, s0 @ __fp16 strh r3, [r0, r1, lsl #1] @ __fp16 bx lr .size test_store_2, .-test_store_2 .align 1 .p2align 2,,3 .global test_load_store_1 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_load_store_1, %function test_load_store_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vmov.f16 s0, r3 @ __fp16 ldrh r3, [r2, r1, lsl #1] @ __fp16 strh r3, [r0, r1, lsl #1] @ __fp16 bx lr .size test_load_store_1, .-test_load_store_1 .align 1 .p2align 2,,3 .global test_load_store_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_load_store_2, %function test_load_store_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. adds r1, r1, #2 add r0, r0, r1, lsl #1 ldrh r3, [r2, r1, lsl #1] @ __fp16 vmov.f16 s0, r3 @ __fp16 strh r3, [r0, #-4] @ __fp16 bx lr .size test_load_store_2, .-test_load_store_2 .align 1 .p2align 2,,3 .global test_select_1 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_1, %function test_select_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. cmp r0, #0 vseleq.f16 s0, s1, s0 bx lr .size test_select_1, .-test_select_1 .align 1 .p2align 2,,3 .global test_select_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_2, %function test_select_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. 
cmp r0, #0 vseleq.f16 s0, s1, s0 bx lr .size test_select_2, .-test_select_2 .align 1 .p2align 2,,3 .global test_select_3 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_3, %function test_select_3: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s15, s0 vcvtb.f32.f16 s14, s1 vcmp.f32 s15, s14 vmrs APSR_nzcv, FPSCR vseleq.f16 s0, s1, s2 bx lr .size test_select_3, .-test_select_3 .align 1 .p2align 2,,3 .global test_select_4 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_4, %function test_select_4: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s15, s0 vcvtb.f32.f16 s14, s1 vcmp.f32 s15, s14 vmrs APSR_nzcv, FPSCR vseleq.f16 s0, s2, s1 bx lr .size test_select_4, .-test_select_4 .align 1 .p2align 2,,3 .global test_select_5 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_5, %function test_select_5: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s14, s1 vmov s15, s1 @ __fp16 vcmpe.f32 s0, s14 vmrs APSR_nzcv, FPSCR bmi .L17 vmov s15, s2 @ __fp16 .L17: vmov s0, s15 @ __fp16 bx lr .size test_select_5, .-test_select_5 .align 1 .p2align 2,,3 .global test_select_6 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_6, %function test_select_6: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. 
vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s14, s1 vmov s15, s1 @ __fp16 vcmpe.f32 s0, s14 vmrs APSR_nzcv, FPSCR bls .L19 vmov s15, s2 @ __fp16 .L19: vmov s0, s15 @ __fp16 bx lr .size test_select_6, .-test_select_6 .align 1 .p2align 2,,3 .global test_select_7 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_7, %function test_select_7: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s15, s0 vcvtb.f32.f16 s14, s1 vcmpe.f32 s15, s14 vmrs APSR_nzcv, FPSCR vselgt.f16 s0, s1, s2 bx lr .size test_select_7, .-test_select_7 .align 1 .p2align 2,,3 .global test_select_8 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_select_8, %function test_select_8: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s15, s0 vcvtb.f32.f16 s14, s1 vcmpe.f32 s15, s14 vmrs APSR_nzcv, FPSCR vselge.f16 s0, s1, s2 bx lr .size test_select_8, .-test_select_8 .align 1 .p2align 2,,3 .global test_compare_1 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_1, %function test_compare_1: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmp.f32 s0, s1 vmrs APSR_nzcv, FPSCR it ne movne r0, #0 bx lr .size test_compare_1, .-test_compare_1 .align 1 .p2align 2,,3 .global test_compare_ .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_, %function test_compare_: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. 
vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmp.f32 s0, s1 vmrs APSR_nzcv, FPSCR it eq moveq r0, #0 bx lr .size test_compare_, .-test_compare_ .align 1 .p2align 2,,3 .global test_compare_2 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_2, %function test_compare_2: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmpe.f32 s0, s1 vmrs APSR_nzcv, FPSCR it le movle r0, #0 bx lr .size test_compare_2, .-test_compare_2 .align 1 .p2align 2,,3 .global test_compare_3 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_3, %function test_compare_3: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmpe.f32 s0, s1 vmrs APSR_nzcv, FPSCR it lt movlt r0, #0 bx lr .size test_compare_3, .-test_compare_3 .align 1 .p2align 2,,3 .global test_compare_4 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_4, %function test_compare_4: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmpe.f32 s0, s1 vmrs APSR_nzcv, FPSCR it pl movpl r0, #0 bx lr .size test_compare_4, .-test_compare_4 .align 1 .p2align 2,,3 .global test_compare_5 .syntax unified .thumb .thumb_func .fpu fp-armv8 .type test_compare_5, %function test_compare_5: @ args = 0, pretend = 0, frame = 0 @ frame_needed = 0, uses_anonymous_args = 0 @ link register save eliminated. vcvtb.f32.f16 s0, s0 vcvtb.f32.f16 s1, s1 mov r0, #-1 vcmpe.f32 s0, s1 vmrs APSR_nzcv, FPSCR it hi movhi r0, #0 bx lr .size test_compare_5, .-test_compare_5 .section .note.GNU-stack,"",%progbits ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: LRA patch for PR87718
2018-12-20 10:38 ` Sam Tebbs
@ 2018-12-20 11:39 ` Sam Tebbs
0 siblings, 0 replies; 5+ messages in thread
From: Sam Tebbs @ 2018-12-20 11:39 UTC (permalink / raw)
To: Christophe Lyon, Vladimir Makarov; +Cc: gcc Patches, nd
On 12/20/18 10:38 AM, Sam Tebbs wrote:
> On 11/24/18 11:29 AM, Christophe Lyon wrote:
>
>> On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com>
>> wrote:
>>> The following patch fixes
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>>>
>>> The patch adds a special treatment for moves with a hard register in
>>> register cost and class calculation.
>>>
>>> The patch was bootstrapped and tested on x86-64 and ppc64.
>>>
>>> I found two testsuite regressions because of the patch. The expected
>>> generated code for PR82361 test is too specific. GCC with the patch
>>> generates the same quality code but with a different hard register on
>>> x86-64. So I just changed the test for PR82361.
>>>
>>> Another test is for ppc64. I think the expected generated code for
>>> this test is wrong. I'll submit a changed test for a discussion later.
>>>
>>> Although I spent much time on the solution and I think it is the
>>> right one, the patch is in very sensitive area of RA and may affect
>>> expected code generation for many targets. I am ready to work on the
>>> new regressions as soon as they are found.
>>>
>>> The patch was committed as rev. 260385.
>>>
>> Hi,
>>
>> This patch introduced at least several ICEs on arm targets:
>> on arm-none-linux-gnueabi --with-cpu=cortex-a9:
> <snip>
>> There are other regressions on the same targets, but not ICEs.
>> I can report them later.
>>
>> Thanks,
>>
>> Christophe
> Hi Christophe and Vladimir,
>
> Here are the regressions seen on arm-none-linux-gnueabihf and arm-none-eabi.
>
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times ldrh\\tr[0-9]+ 2
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vmov\\.f16\\ts[0-9]+, r[0-9]+ 2
> FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\tr[0-9]+, s[0-9]+
> FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\ts0, r[0-9]+
> FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\tr[0-9]+, s[0-2] 2
> FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\ts0, r[0-9]+ 2
>
> I didn't see a bug report for these, so I will open one.
>
> It is not clear if the test cases should be adjusted because of your
> patch or if they are failing because of incorrect codegen. Attached is
> the code generated for armv8_2-fp16-move-1.c (one of the test files
> failing) with and without your patch.
>
> Full command line used to compile and test armv8_2-fp16-move-1.c:
>
> bin/gcc armv8_2-fp16-move-1.c -fno-diagnostics-show-caret
> -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O2
> -mfpu=fp-armv8 -march=armv8.2-a+fp16 -mfloat-abi=hard -ffat-lto-objects
> -fno-ident -S -o armv8_2-fp16-move-1.s.
Reported as PR 88560 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88560).
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: LRA patch for PR87718
@ 2018-11-23 8:04 Uros Bizjak
0 siblings, 0 replies; 5+ messages in thread
From: Uros Bizjak @ 2018-11-23 8:04 UTC (permalink / raw)
To: gcc-patches; +Cc: Vladimir Makarov
Hello!
> The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>
> The patch adds a special treatment for moves with a hard register in register cost and class
> calculation.
>
> The patch was bootstrapped and tested on x86-64 and ppc64.
>
> I found two testsuite regressions because of the patch. The expected generated code for
> PR82361 test is too specific. GCC with the patch generates the same quality code but with a
> different hard register on x86-64. So I just changed the test for PR82361. Another test is for
> ppc64. I think the expected generated code for this test is wrong. I'll submit a changed test
> for a discussion later. Although I spent much time on the solution and I think it is the right one,
> the patch is in very sensitive area of RA and may affect expected code generation for many
> targets. I am ready to work on the new regressions as soon as they are found.
The patch regressed:
FAIL: gcc.target/i386/pr22076.c scan-assembler-not movl
FAIL: gcc.target/i386/pr81563.c scan-assembler-times movl[\\t ]*-4\\(%ebp\\),[\\t ]*%edi 1
FAIL: gcc.target/i386/pr81563.c scan-assembler-times movl[\\t ]*-8\\(%ebp\\),[\\t ]*%esi 1
on a 32-bit x86 target.
The PR22076 test moves a value from mm0 via an integer register to a volatile location:
movq .LC1, %mm0
paddb .LC0, %mm0
movq %mm0, 16(%esp)
movl 16(%esp), %eax
movl 20(%esp), %edx
movl %eax, (%esp)
movl %edx, 4(%esp)
movq (%esp), %mm0
addl $28, %esp
where before the patch we had:
movq .LC1, %mm0
paddb .LC0, %mm0
movq %mm0, 8(%esp)
movq 8(%esp), %mm0
addl $20, %esp
PR81563 looks like a testsuite issue: the compiler now
allocates a call-clobbered register for a value that lives across the
call.
Uros.
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-12-20 11:16 UTC | newest]
Thread overview: 5+ messages
2018-11-22 17:29 LRA patch for PR87718 Vladimir Makarov
2018-11-24 11:29 ` Christophe Lyon
2018-12-20 10:38 ` Sam Tebbs
2018-12-20 11:39 ` Sam Tebbs
2018-11-23 8:04 Uros Bizjak