LRA patch for PR87718

public inbox for gcc-patches@gcc.gnu.org

* LRA patch for PR87718
@ 2018-11-22 17:29 Vladimir Makarov
  2018-11-24 11:29 ` Christophe Lyon
  0 siblings, 1 reply; 5+ messages in thread
From: Vladimir Makarov @ 2018-11-22 17:29 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1006 bytes --]

   The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718

   The patch adds a special treatment for moves with a hard register in
register cost and class calculation.

   The patch was bootstrapped and tested on x86-64 and ppc64.

   I found two testsuite regressions because of the patch.  The expected
generated code for PR82361 test is too specific.  GCC with the patch
generates the same quality code but with a different hard register on
x86-64.  So I just changed the test for PR82361.

   Another test is for ppc64.  I think the expected generated code for
this test is wrong.  I'll submit a changed test for a discussion later.

   Although I spent much time on the solution and I think it is the
right one, the patch is in very sensitive area of RA and may affect
expected code generation for many targets.  I am ready to work on the
new regressions as soon as they are found.

   The patch was committed as rev. 260385.


[-- Attachment #2: pr87718.patch --]
[-- Type: text/x-patch, Size: 11540 bytes --]

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 266384)
+++ ChangeLog	(working copy)
@@ -1,3 +1,10 @@
+2018-11-22  Vladimir Makarov  <vmakarov@redhat.com>
+
+	PR rtl-optimization/87718
+	* ira-costs.c: Remove trailing white-spaces.
+	(record_operand_costs): Add a special treatment for moves
+	involving a hard register.
+
 2018-11-22  Uros Bizjak  <ubizjak@gmail.com>
 
 	* config/i386/i386.c (ix86_avx_emit_vzeroupper): Remove.
Index: ira-costs.c
===================================================================
--- ira-costs.c	(revision 266155)
+++ ira-costs.c	(working copy)
@@ -1257,7 +1257,7 @@ record_address_regs (machine_mode mode,
 	    add_cost = (move_in_cost[i][rclass] * scale) / 2;
 	    if (INT_MAX - add_cost < pp_costs[k])
 	      pp_costs[k] = INT_MAX;
-	    else 
+	    else
 	      pp_costs[k] += add_cost;
 	  }
       }
@@ -1283,10 +1283,100 @@ record_operand_costs (rtx_insn *insn, en
 {
   const char *constraints[MAX_RECOG_OPERANDS];
   machine_mode modes[MAX_RECOG_OPERANDS];
-  rtx ops[MAX_RECOG_OPERANDS];
   rtx set;
   int i;
 
+  if ((set = single_set (insn)) != NULL_RTX
+      /* In rare cases the single set insn might have less 2 operands
+	 as the source can be a fixed special reg.  */
+      && recog_data.n_operands > 1
+      && recog_data.operand[0] == SET_DEST (set)
+      && recog_data.operand[1] == SET_SRC (set))
+    {
+      int regno, other_regno;
+      rtx dest = SET_DEST (set);
+      rtx src = SET_SRC (set);
+
+      if (GET_CODE (dest) == SUBREG
+	  && known_eq (GET_MODE_SIZE (GET_MODE (dest)),
+		       GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest)))))
+	dest = SUBREG_REG (dest);
+      if (GET_CODE (src) == SUBREG
+	  && known_eq (GET_MODE_SIZE (GET_MODE (src)),
+		       GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
+	src = SUBREG_REG (src);
+      if (REG_P (src) && REG_P (dest)
+	  && (((regno = REGNO (src)) >= FIRST_PSEUDO_REGISTER
+	       && (other_regno = REGNO (dest)) < FIRST_PSEUDO_REGISTER)
+	      || ((regno = REGNO (dest)) >= FIRST_PSEUDO_REGISTER
+		  && (other_regno = REGNO (src)) < FIRST_PSEUDO_REGISTER)))
+	{
+	  machine_mode mode = GET_MODE (SET_SRC (set));
+	  cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
+	  enum reg_class *cost_classes = cost_classes_ptr->classes;
+	  reg_class_t rclass, hard_reg_class, pref_class;
+	  int cost, k;
+	  bool dead_p = find_regno_note (insn, REG_DEAD, REGNO (src));
+
+	  hard_reg_class = REGNO_REG_CLASS (other_regno);
+	  i = regno == (int) REGNO (src) ? 1 : 0;
+	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
+	    {
+	      rclass = cost_classes[k];
+	      cost = ((i == 0
+		       ? ira_register_move_cost[mode][hard_reg_class][rclass]
+		       : ira_register_move_cost[mode][rclass][hard_reg_class])
+		      * frequency);
+	      op_costs[i]->cost[k] = cost;
+	      /* If we have assigned a class to this allocno in our
+		 first pass, add a cost to this alternative
+		 corresponding to what we would add if this allocno
+		 were not in the appropriate class.  */
+	      if (pref)
+		{
+		  if ((pref_class = pref[COST_INDEX (regno)]) == NO_REGS)
+		    op_costs[i]->cost[k]
+		      += ((i == 0 ? ira_memory_move_cost[mode][rclass][0] : 0)
+			  + (i == 1 ? ira_memory_move_cost[mode][rclass][1] : 0)
+			  * frequency);
+		  else if (ira_reg_class_intersect[pref_class][rclass]
+			   == NO_REGS)
+		    op_costs[i]->cost[k]
+		      += (ira_register_move_cost[mode][pref_class][rclass]
+			  * frequency);
+		}
+	      /* If this insn is a single set copying operand 1 to
+		 operand 0 and one operand is an allocno with the
+		 other a hard reg or an allocno that prefers a hard
+		 register that is in its own register class then we
+		 may want to adjust the cost of that register class to
+		 -1.
+
+		 Avoid the adjustment if the source does not die to
+		 avoid stressing of register allocator by preferencing
+		 two colliding registers into single class.  */
+	      if (dead_p
+		  && TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno)
+		  && (reg_class_size[(int) rclass]
+		      == (ira_reg_class_max_nregs
+			  [(int) rclass][(int) GET_MODE(src)])))
+		{
+		  if (reg_class_size[rclass] == 1)
+		    op_costs[i]->cost[k] = -frequency;
+		  else if (in_hard_reg_set_p (reg_class_contents[rclass],
+					      GET_MODE(src), other_regno))
+		    op_costs[i]->cost[k] = -frequency;
+		}
+	    }
+	  op_costs[i]->mem_cost
+	    = ira_memory_move_cost[mode][hard_reg_class][i] * frequency;
+	  if (pref && (pref_class = pref[COST_INDEX (regno)]) != NO_REGS)
+	    op_costs[i]->mem_cost
+	      += ira_memory_move_cost[mode][pref_class][i] * frequency;
+	  return;
+	}
+    }
+
   for (i = 0; i < recog_data.n_operands; i++)
     {
       constraints[i] = recog_data.constraints[i];
@@ -1302,7 +1392,6 @@ record_operand_costs (rtx_insn *insn, en
     {
       memcpy (op_costs[i], init_cost, struct_costs_size);
 
-      ops[i] = recog_data.operand[i];
       if (GET_CODE (recog_data.operand[i]) == SUBREG)
 	recog_data.operand[i] = SUBREG_REG (recog_data.operand[i]);
 
@@ -1318,7 +1407,7 @@ record_operand_costs (rtx_insn *insn, en
 			     recog_data.operand[i], 0, ADDRESS, SCRATCH,
 			     frequency * 2);
     }
-  
+
   /* Check for commutative in a separate loop so everything will have
      been initialized.  We must do this even if one operand is a
      constant--see addsi3 in m68k.md.  */
@@ -1328,8 +1417,8 @@ record_operand_costs (rtx_insn *insn, en
 	const char *xconstraints[MAX_RECOG_OPERANDS];
 	int j;
 
-	/* Handle commutative operands by swapping the constraints.
-	   We assume the modes are the same.  */
+	/* Handle commutative operands by swapping the
+	   constraints.  We assume the modes are the same.  */
 	for (j = 0; j < recog_data.n_operands; j++)
 	  xconstraints[j] = constraints[j];
 
@@ -1342,69 +1431,6 @@ record_operand_costs (rtx_insn *insn, en
   record_reg_classes (recog_data.n_alternatives, recog_data.n_operands,
 		      recog_data.operand, modes,
 		      constraints, insn, pref);
-
-  /* If this insn is a single set copying operand 1 to operand 0 and
-     one operand is an allocno with the other a hard reg or an allocno
-     that prefers a hard register that is in its own register class
-     then we may want to adjust the cost of that register class to -1.
-
-     Avoid the adjustment if the source does not die to avoid
-     stressing of register allocator by preferencing two colliding
-     registers into single class.
-
-     Also avoid the adjustment if a copy between hard registers of the
-     class is expensive (ten times the cost of a default copy is
-     considered arbitrarily expensive).  This avoids losing when the
-     preferred class is very expensive as the source of a copy
-     instruction.  */
-  if ((set = single_set (insn)) != NULL_RTX
-      /* In rare cases the single set insn might have less 2 operands
-	 as the source can be a fixed special reg.  */
-      && recog_data.n_operands > 1
-      && ops[0] == SET_DEST (set) && ops[1] == SET_SRC (set))
-    {
-      int regno, other_regno;
-      rtx dest = SET_DEST (set);
-      rtx src = SET_SRC (set);
-
-      if (GET_CODE (dest) == SUBREG
-	  && known_eq (GET_MODE_SIZE (GET_MODE (dest)),
-		       GET_MODE_SIZE (GET_MODE (SUBREG_REG (dest)))))
-	dest = SUBREG_REG (dest);
-      if (GET_CODE (src) == SUBREG
-	  && known_eq (GET_MODE_SIZE (GET_MODE (src)),
-		       GET_MODE_SIZE (GET_MODE (SUBREG_REG (src)))))
-	src = SUBREG_REG (src);
-      if (REG_P (src) && REG_P (dest)
-	  && find_regno_note (insn, REG_DEAD, REGNO (src))
-	  && (((regno = REGNO (src)) >= FIRST_PSEUDO_REGISTER
-	       && (other_regno = REGNO (dest)) < FIRST_PSEUDO_REGISTER)
-	      || ((regno = REGNO (dest)) >= FIRST_PSEUDO_REGISTER
-		  && (other_regno = REGNO (src)) < FIRST_PSEUDO_REGISTER)))
-	{
-	  machine_mode mode = GET_MODE (src);
-	  cost_classes_t cost_classes_ptr = regno_cost_classes[regno];
-	  enum reg_class *cost_classes = cost_classes_ptr->classes;
-	  reg_class_t rclass;
-	  int k;
-
-	  i = regno == (int) REGNO (src) ? 1 : 0;
-	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
-	    {
-	      rclass = cost_classes[k];
-	      if (TEST_HARD_REG_BIT (reg_class_contents[rclass], other_regno)
-		  && (reg_class_size[(int) rclass]
-		      == ira_reg_class_max_nregs [(int) rclass][(int) mode]))
-		{
-		  if (reg_class_size[rclass] == 1)
-		    op_costs[i]->cost[k] = -frequency;
-		  else if (in_hard_reg_set_p (reg_class_contents[rclass],
-					      mode, other_regno))
-		    op_costs[i]->cost[k] = -frequency;
-		}
-	    }
-	}
-    }
 }
 
 \f
@@ -1457,7 +1483,7 @@ scan_one_insn (rtx_insn *insn)
 
   /* If this insn loads a parameter from its stack slot, then it
      represents a savings, rather than a cost, if the parameter is
-     stored in memory.  Record this fact. 
+     stored in memory.  Record this fact.
 
      Similarly if we're loading other constants from memory (constant
      pool, TOC references, small data areas, etc) and this is the only
@@ -1468,7 +1494,7 @@ scan_one_insn (rtx_insn *insn)
      mem_cost might result in it being loaded using the specialized
      instruction into a register, then stored into stack and loaded
      again from the stack.  See PR52208.
-     
+
      Don't do this if SET_SRC (set) has side effect.  See PR56124.  */
   if (set != 0 && REG_P (SET_DEST (set)) && MEM_P (SET_SRC (set))
       && (note = find_reg_note (insn, REG_EQUIV, NULL_RTX)) != NULL_RTX
@@ -1766,7 +1792,7 @@ find_costs_and_classes (FILE *dump_file)
 		   a = ALLOCNO_NEXT_REGNO_ALLOCNO (a))
 		{
 		  int *a_costs, *p_costs;
-		      
+
 		  a_num = ALLOCNO_NUM (a);
 		  if ((flag_ira_region == IRA_REGION_ALL
 		       || flag_ira_region == IRA_REGION_MIXED)
@@ -1936,7 +1962,7 @@ find_costs_and_classes (FILE *dump_file)
 	      int a_num = ALLOCNO_NUM (a);
 	      int *total_a_costs = COSTS (total_allocno_costs, a_num)->cost;
 	      int *a_costs = COSTS (costs, a_num)->cost;
-	
+
 	      if (aclass == NO_REGS)
 		best = NO_REGS;
 	      else
@@ -1998,7 +2024,7 @@ find_costs_and_classes (FILE *dump_file)
 		}
 	    }
 	}
-      
+
       if (internal_flag_ira_verbose > 4 && dump_file)
 	{
 	  if (allocno_p)
@@ -2081,7 +2107,7 @@ process_bb_node_for_hard_reg_moves (ira_
 	int cost;
 	enum reg_class hard_reg_class;
 	machine_mode mode;
-	
+
 	mode = ALLOCNO_MODE (a);
 	hard_reg_class = REGNO_REG_CLASS (hard_regno);
 	ira_init_register_move_cost_if_necessary (mode);
Index: testsuite/ChangeLog
===================================================================
--- testsuite/ChangeLog	(revision 266384)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2018-11-22  Vladimir Makarov  <vmakarov@redhat.com>
+
+	PR rtl-optimization/87718
+	* gcc.target/i386/pr82361-1.c: Check only the first operand of
+	moves.
+
 2018-11-22  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
 
 	* gcc.target/arm/pr85434.c: New test.
Index: testsuite/gcc.target/i386/pr82361-1.c
===================================================================
--- testsuite/gcc.target/i386/pr82361-1.c	(revision 266155)
+++ testsuite/gcc.target/i386/pr82361-1.c	(working copy)
@@ -6,7 +6,7 @@
 /* { dg-final { scan-assembler-not "movl\t%eax, %eax" } } */
 /* FIXME: We are still not able to optimize the modulo in f1/f2, only manage
    one.  */
-/* { dg-final { scan-assembler-times "movl\t%edx, %edx" 2 } } */
+/* { dg-final { scan-assembler-times "movl\t%edx" 2 } } */
 
 void
 f1 (unsigned int a, unsigned int b)


* Re: LRA patch for PR87718
  2018-11-22 17:29 LRA patch for PR87718 Vladimir Makarov
@ 2018-11-24 11:29 ` Christophe Lyon
  2018-12-20 10:38   ` Sam Tebbs
  0 siblings, 1 reply; 5+ messages in thread
From: Christophe Lyon @ 2018-11-24 11:29 UTC
  To: Vladimir Makarov; +Cc: gcc Patches

On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com> wrote:
>
>    The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>
>    The patch adds a special treatment for moves with a hard register in
> register cost and class calculation.
>
>    The patch was bootstrapped and tested on x86-64 and ppc64.
>
>    I found two testsuite regressions because of the patch.  The expected
> generated code for PR82361 test is too specific.  GCC with the patch
> generates the same quality code but with a different hard register on
> x86-64.  So I just changed the test for  PR82361.
>
>    Another test is for ppc64.  I think the expected generated code for
> this test is wrong.  I'll submit a changed test for a discussion later.
>
>    Although I spent much time on the solution and I think it is the
> right one, the patch is in very sensitive area of RA and may affect
> expected code generation for many targets.  I am ready to work on the
> new regressions as soon as they are found.
>
>    The patch was committed as rev. 260385.
>

Hi,

This patch introduced at least several ICEs on arm targets:
on arm-none-linux-gnueabi --with-cpu=cortex-a9:
  Executed from: gcc.target/arm/arm.exp
    gcc.target/arm/attr-neon-fp16.c (internal compiler error)
    gcc.target/arm/pr51968.c (internal compiler error)
    gcc.target/arm/pr68620.c (internal compiler error)
  Executed from: gcc.target/arm/simd/simd.exp
    gcc.target/arm/simd/vextp16_1.c (internal compiler error)
    gcc.target/arm/simd/vextp8_1.c (internal compiler error)
    gcc.target/arm/simd/vexts16_1.c (internal compiler error)
    gcc.target/arm/simd/vexts8_1.c (internal compiler error)
    gcc.target/arm/simd/vextu16_1.c (internal compiler error)
    gcc.target/arm/simd/vextu8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev16p8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev16s8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev16u8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32p16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32p8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32s16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32s8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32u16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev32u8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64f32_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64p16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64p8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64s16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64s32_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64s8_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64u16_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64u32_1.c (internal compiler error)
    gcc.target/arm/simd/vrev64u8_1.c (internal compiler error)

arm-none-linux-gnueabihf shows only 1 ICE:
gcc.target/arm/pr51968.c (internal compiler error)

There are other regressions on the same targets, but not ICEs.
I can report them later.

Thanks,

Christophe


* Re: LRA patch for PR87718
  2018-11-24 11:29 ` Christophe Lyon
@ 2018-12-20 10:38   ` Sam Tebbs
  2018-12-20 11:39     ` Sam Tebbs
  0 siblings, 1 reply; 5+ messages in thread
From: Sam Tebbs @ 2018-12-20 10:38 UTC
  To: Christophe Lyon, Vladimir Makarov; +Cc: gcc Patches, nd

[-- Attachment #1: Type: text/plain, Size: 2787 bytes --]

On 11/24/18 11:29 AM, Christophe Lyon wrote:

> On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com> 
> wrote:
>> The following patch fixes
>>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>>
>> The patch adds a special treatment for moves with a hard register in
>> register cost and class calculation.
>>
>> The patch was bootstrapped and tested on x86-64 and ppc64.
>>
>> I found two testsuite regressions because of the patch. The expected
>> generated code for PR82361 test is too specific. GCC with the patch
>> generates the same quality code but with a different hard register on
>> x86-64. So I just changed the test for PR82361.
>>
>> Another test is for ppc64. I think the expected generated code for
>> this test is wrong. I'll submit a changed test for a discussion later.
>>
>> Although I spent much time on the solution and I think it is the
>> right one, the patch is in very sensitive area of RA and may affect
>> expected code generation for many targets. I am ready to work on the
>> new regressions as soon as they are found.
>>
>> The patch was committed as rev. 260385.
>>
> Hi,
>
> This patch introduced at least several ICEs on arm targets:
> on arm-none-linux-gnueabi --with-cpu=cortex-a9:
<snip>
> There are other regressions on the same targets, but not ICEs.
> I can report them later.
>
> Thanks,
>
> Christophe

Hi Christophe and Vladimir,

Here are the regressions seen on arm-none-linux-gnueabihf and arm-none-eabi.

FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times ldrh\\tr[0-9]+ 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vmov\\.f16\\ts[0-9]+, r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\tr[0-9]+, s[0-9]+
FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\ts0, r[0-9]+
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\tr[0-9]+, s[0-2] 2
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\ts0, r[0-9]+ 2

I didn't see a bug report for these, so I will open one.

It is not clear if the test cases should be adjusted because of your 
patch or if they are failing because of incorrect codegen. Attached is 
the code generated for armv8_2-fp16-move-1.c (one of the test files 
failing) with and without your patch.

Full command line used to compile and test armv8_2-fp16-move-1.c:

bin/gcc armv8_2-fp16-move-1.c -fno-diagnostics-show-caret 
-fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O2 
-mfpu=fp-armv8 -march=armv8.2-a+fp16 -mfloat-abi=hard -ffat-lto-objects 
-fno-ident -S -o armv8_2-fp16-move-1.s.




[-- Attachment #2: armv8_2-fp16-move-1-with.s --]
[-- Type: text/plain, Size: 8451 bytes --]

	.arch armv8.2-a
	.eabi_attribute 28, 1
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 38, 1
	.eabi_attribute 18, 4
	.file	"armv8_2-fp16-move-1.c"
	.text
	.align	1
	.p2align 2,,3
	.global	test_load_1
	.arch armv8.2-a
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_1, %function
test_load_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vld1.16	{d0[0]}, [r0]
	bx	lr
	.size	test_load_1, .-test_load_1
	.align	1
	.p2align 2,,3
	.global	test_load_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_2, %function
test_load_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	add	r3, r0, r1, lsl #1
	vld1.16	{d0[0]}, [r3]
	bx	lr
	.size	test_load_2, .-test_load_2
	.align	1
	.p2align 2,,3
	.global	test_store_1
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_store_1, %function
test_store_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vst1.16	{d0[0]}, [r0]
	bx	lr
	.size	test_store_1, .-test_store_1
	.align	1
	.p2align 2,,3
	.global	test_store_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_store_2, %function
test_store_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vmov.f16	r3, s0	@ __fp16
	strh	r3, [r0, r1, lsl #1]	@ __fp16
	bx	lr
	.size	test_store_2, .-test_store_2
	.align	1
	.p2align 2,,3
	.global	test_load_store_1
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_store_1, %function
test_load_store_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	ldrh	r3, [r2, r1, lsl #1]	@ __fp16
	strh	r3, [r0, r1, lsl #1]	@ __fp16
	bx	lr
	.size	test_load_store_1, .-test_load_store_1
	.align	1
	.p2align 2,,3
	.global	test_load_store_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_store_2, %function
test_load_store_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	adds	r1, r1, #2
	add	r3, r2, r1, lsl #1
	add	r0, r0, r1, lsl #1
	vld1.16	{d0[0]}, [r3]
	vmov.f16	r3, s0	@ __fp16
	strh	r3, [r0, #-4]	@ __fp16
	bx	lr
	.size	test_load_store_2, .-test_load_store_2
	.align	1
	.p2align 2,,3
	.global	test_select_1
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_1, %function
test_select_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	cmp	r0, #0
	vseleq.f16	s0, s1, s0
	bx	lr
	.size	test_select_1, .-test_select_1
	.align	1
	.p2align 2,,3
	.global	test_select_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_2, %function
test_select_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	cmp	r0, #0
	vseleq.f16	s0, s1, s0
	bx	lr
	.size	test_select_2, .-test_select_2
	.align	1
	.p2align 2,,3
	.global	test_select_3
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_3, %function
test_select_3:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s15, s0
	vcvtb.f32.f16	s14, s1
	vcmp.f32	s15, s14
	vmrs	APSR_nzcv, FPSCR
	vseleq.f16	s0, s1, s2
	bx	lr
	.size	test_select_3, .-test_select_3
	.align	1
	.p2align 2,,3
	.global	test_select_4
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_4, %function
test_select_4:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s15, s0
	vcvtb.f32.f16	s14, s1
	vcmp.f32	s15, s14
	vmrs	APSR_nzcv, FPSCR
	vseleq.f16	s0, s2, s1
	bx	lr
	.size	test_select_4, .-test_select_4
	.align	1
	.p2align 2,,3
	.global	test_select_5
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_5, %function
test_select_5:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s15, s1
	vcmpe.f32	s0, s15
	vmrs	APSR_nzcv, FPSCR
	bmi	.L17
	vmov	s1, s2	@ __fp16
.L17:
	vmov	s0, s1	@ __fp16
	bx	lr
	.size	test_select_5, .-test_select_5
	.align	1
	.p2align 2,,3
	.global	test_select_6
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_6, %function
test_select_6:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s15, s1
	vcmpe.f32	s0, s15
	vmrs	APSR_nzcv, FPSCR
	bls	.L19
	vmov	s1, s2	@ __fp16
.L19:
	vmov	s0, s1	@ __fp16
	bx	lr
	.size	test_select_6, .-test_select_6
	.align	1
	.p2align 2,,3
	.global	test_select_7
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_7, %function
test_select_7:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s15, s0
	vcvtb.f32.f16	s14, s1
	vcmpe.f32	s15, s14
	vmrs	APSR_nzcv, FPSCR
	vselgt.f16	s0, s1, s2
	bx	lr
	.size	test_select_7, .-test_select_7
	.align	1
	.p2align 2,,3
	.global	test_select_8
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_8, %function
test_select_8:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s15, s0
	vcvtb.f32.f16	s14, s1
	vcmpe.f32	s15, s14
	vmrs	APSR_nzcv, FPSCR
	vselge.f16	s0, s1, s2
	bx	lr
	.size	test_select_8, .-test_select_8
	.align	1
	.p2align 2,,3
	.global	test_compare_1
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_1, %function
test_compare_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmp.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	ne
	movne	r0, #0
	bx	lr
	.size	test_compare_1, .-test_compare_1
	.align	1
	.p2align 2,,3
	.global	test_compare_
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_, %function
test_compare_:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmp.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	eq
	moveq	r0, #0
	bx	lr
	.size	test_compare_, .-test_compare_
	.align	1
	.p2align 2,,3
	.global	test_compare_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_2, %function
test_compare_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmpe.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	le
	movle	r0, #0
	bx	lr
	.size	test_compare_2, .-test_compare_2
	.align	1
	.p2align 2,,3
	.global	test_compare_3
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_3, %function
test_compare_3:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmpe.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	lt
	movlt	r0, #0
	bx	lr
	.size	test_compare_3, .-test_compare_3
	.align	1
	.p2align 2,,3
	.global	test_compare_4
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_4, %function
test_compare_4:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmpe.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	pl
	movpl	r0, #0
	bx	lr
	.size	test_compare_4, .-test_compare_4
	.align	1
	.p2align 2,,3
	.global	test_compare_5
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_5, %function
test_compare_5:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmpe.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	hi
	movhi	r0, #0
	bx	lr
	.size	test_compare_5, .-test_compare_5
	.section	.note.GNU-stack,"",%progbits

[-- Attachment #3: armv8_2-fp16-move-1-without.s --]
[-- Type: text/plain, Size: 8516 bytes --]

	.arch armv8.2-a
	.eabi_attribute 28, 1
	.eabi_attribute 20, 1
	.eabi_attribute 21, 1
	.eabi_attribute 23, 3
	.eabi_attribute 24, 1
	.eabi_attribute 25, 1
	.eabi_attribute 26, 2
	.eabi_attribute 30, 2
	.eabi_attribute 34, 1
	.eabi_attribute 38, 1
	.eabi_attribute 18, 4
	.file	"armv8_2-fp16-move-1.c"
	.text
	.align	1
	.p2align 2,,3
	.global	test_load_1
	.arch armv8.2-a
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_1, %function
test_load_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vld1.16	{d0[0]}, [r0]
	bx	lr
	.size	test_load_1, .-test_load_1
	.align	1
	.p2align 2,,3
	.global	test_load_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_2, %function
test_load_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	add	r3, r0, r1, lsl #1
	vld1.16	{d0[0]}, [r3]
	bx	lr
	.size	test_load_2, .-test_load_2
	.align	1
	.p2align 2,,3
	.global	test_store_1
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_store_1, %function
test_store_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vst1.16	{d0[0]}, [r0]
	bx	lr
	.size	test_store_1, .-test_store_1
	.align	1
	.p2align 2,,3
	.global	test_store_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_store_2, %function
test_store_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vmov.f16	r3, s0	@ __fp16
	strh	r3, [r0, r1, lsl #1]	@ __fp16
	bx	lr
	.size	test_store_2, .-test_store_2
	.align	1
	.p2align 2,,3
	.global	test_load_store_1
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_store_1, %function
test_load_store_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vmov.f16	s0, r3	@ __fp16
	ldrh	r3, [r2, r1, lsl #1]	@ __fp16
	strh	r3, [r0, r1, lsl #1]	@ __fp16
	bx	lr
	.size	test_load_store_1, .-test_load_store_1
	.align	1
	.p2align 2,,3
	.global	test_load_store_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_load_store_2, %function
test_load_store_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	adds	r1, r1, #2
	add	r0, r0, r1, lsl #1
	ldrh	r3, [r2, r1, lsl #1]	@ __fp16
	vmov.f16	s0, r3	@ __fp16
	strh	r3, [r0, #-4]	@ __fp16
	bx	lr
	.size	test_load_store_2, .-test_load_store_2
	.align	1
	.p2align 2,,3
	.global	test_select_1
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_1, %function
test_select_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	cmp	r0, #0
	vseleq.f16	s0, s1, s0
	bx	lr
	.size	test_select_1, .-test_select_1
	.align	1
	.p2align 2,,3
	.global	test_select_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_2, %function
test_select_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	cmp	r0, #0
	vseleq.f16	s0, s1, s0
	bx	lr
	.size	test_select_2, .-test_select_2
	.align	1
	.p2align 2,,3
	.global	test_select_3
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_3, %function
test_select_3:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s15, s0
	vcvtb.f32.f16	s14, s1
	vcmp.f32	s15, s14
	vmrs	APSR_nzcv, FPSCR
	vseleq.f16	s0, s1, s2
	bx	lr
	.size	test_select_3, .-test_select_3
	.align	1
	.p2align 2,,3
	.global	test_select_4
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_4, %function
test_select_4:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s15, s0
	vcvtb.f32.f16	s14, s1
	vcmp.f32	s15, s14
	vmrs	APSR_nzcv, FPSCR
	vseleq.f16	s0, s2, s1
	bx	lr
	.size	test_select_4, .-test_select_4
	.align	1
	.p2align 2,,3
	.global	test_select_5
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_5, %function
test_select_5:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s14, s1
	vmov	s15, s1	@ __fp16
	vcmpe.f32	s0, s14
	vmrs	APSR_nzcv, FPSCR
	bmi	.L17
	vmov	s15, s2	@ __fp16
.L17:
	vmov	s0, s15	@ __fp16
	bx	lr
	.size	test_select_5, .-test_select_5
	.align	1
	.p2align 2,,3
	.global	test_select_6
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_6, %function
test_select_6:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s14, s1
	vmov	s15, s1	@ __fp16
	vcmpe.f32	s0, s14
	vmrs	APSR_nzcv, FPSCR
	bls	.L19
	vmov	s15, s2	@ __fp16
.L19:
	vmov	s0, s15	@ __fp16
	bx	lr
	.size	test_select_6, .-test_select_6
	.align	1
	.p2align 2,,3
	.global	test_select_7
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_7, %function
test_select_7:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s15, s0
	vcvtb.f32.f16	s14, s1
	vcmpe.f32	s15, s14
	vmrs	APSR_nzcv, FPSCR
	vselgt.f16	s0, s1, s2
	bx	lr
	.size	test_select_7, .-test_select_7
	.align	1
	.p2align 2,,3
	.global	test_select_8
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_select_8, %function
test_select_8:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s15, s0
	vcvtb.f32.f16	s14, s1
	vcmpe.f32	s15, s14
	vmrs	APSR_nzcv, FPSCR
	vselge.f16	s0, s1, s2
	bx	lr
	.size	test_select_8, .-test_select_8
	.align	1
	.p2align 2,,3
	.global	test_compare_1
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_1, %function
test_compare_1:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmp.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	ne
	movne	r0, #0
	bx	lr
	.size	test_compare_1, .-test_compare_1
	.align	1
	.p2align 2,,3
	.global	test_compare_
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_, %function
test_compare_:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmp.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	eq
	moveq	r0, #0
	bx	lr
	.size	test_compare_, .-test_compare_
	.align	1
	.p2align 2,,3
	.global	test_compare_2
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_2, %function
test_compare_2:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmpe.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	le
	movle	r0, #0
	bx	lr
	.size	test_compare_2, .-test_compare_2
	.align	1
	.p2align 2,,3
	.global	test_compare_3
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_3, %function
test_compare_3:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmpe.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	lt
	movlt	r0, #0
	bx	lr
	.size	test_compare_3, .-test_compare_3
	.align	1
	.p2align 2,,3
	.global	test_compare_4
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_4, %function
test_compare_4:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmpe.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	pl
	movpl	r0, #0
	bx	lr
	.size	test_compare_4, .-test_compare_4
	.align	1
	.p2align 2,,3
	.global	test_compare_5
	.syntax unified
	.thumb
	.thumb_func
	.fpu fp-armv8
	.type	test_compare_5, %function
test_compare_5:
	@ args = 0, pretend = 0, frame = 0
	@ frame_needed = 0, uses_anonymous_args = 0
	@ link register save eliminated.
	vcvtb.f32.f16	s0, s0
	vcvtb.f32.f16	s1, s1
	mov	r0, #-1
	vcmpe.f32	s0, s1
	vmrs	APSR_nzcv, FPSCR
	it	hi
	movhi	r0, #0
	bx	lr
	.size	test_compare_5, .-test_compare_5
	.section	.note.GNU-stack,"",%progbits


* Re: LRA patch for PR87718
  2018-12-20 10:38   ` Sam Tebbs
@ 2018-12-20 11:39     ` Sam Tebbs
  0 siblings, 0 replies; 5+ messages in thread
From: Sam Tebbs @ 2018-12-20 11:39 UTC (permalink / raw)
  To: Christophe Lyon, Vladimir Makarov; +Cc: gcc Patches, nd

On 12/20/18 10:38 AM, Sam Tebbs wrote:
> On 11/24/18 11:29 AM, Christophe Lyon wrote:
>
>> On Thu, 22 Nov 2018 at 18:30, Vladimir Makarov <vmakarov@redhat.com>
>> wrote:
>>> The following patch fixes
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>>>
>>> The patch adds a special treatment for moves with a hard register in
>>> register cost and class calculation.
>>>
>>> The patch was bootstrapped and tested on x86-64 and ppc64.
>>>
>>> I found two testsuite regressions because of the patch. The expected
>>> generated code for PR82361 test is too specific. GCC with the patch
>>> generates the same quality code but with a different hard register on
>>> x86-64. So I just changed the test for PR82361.
>>>
>>> Another test is for ppc64. I think the expected generated code for
>>> this test is wrong. I'll submit a changed test for a discussion later.
>>>
>>> Although I spent much time on the solution and I think it is the
>>> right one, the patch is in very sensitive area of RA and may affect
>>> expected code generation for many targets. I am ready to work on the
>>> new regressions as soon as they are found.
>>>
>>> The patch was committed as rev. 260385.
>>>
>> Hi,
>>
>> This patch introduced at least several ICEs on arm targets:
>> on arm-none-linux-gnueabi --with-cpu=cortex-a9:
> <snip>
>> There are other regressions on the same targets, but not ICEs.
>> I can report them later.
>>
>> Thanks,
>>
>> Christophe
> Hi Christophe and Vladimir,
>
> Here are the regressions seen on arm-none-linux-gnueabihf and arm-none-eabi.
>
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times ldrh\\tr[0-9]+ 2
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vld1\\.16\\t{d[0-9]+\\[[0-9]+\\]}, \\[r[0-9]+\\] 2
> FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times vmov\\.f16\\ts[0-9]+, r[0-9]+ 2
> FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\tr[0-9]+, s[0-9]+
> FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov(\\.f16)?\\ts0, r[0-9]+
> FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\tr[0-9]+, s[0-2] 2
> FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler-times vmov\\ts0, r[0-9]+ 2
>
> I didn't see a bug report for these, so I will open one.
>
> It is not clear if the test cases should be adjusted because of your
> patch or if they are failing because of incorrect codegen. Attached is
> the code generated for armv8_2-fp16-move-1.c (one of the test files
> failing) with and without your patch.
>
> Full command line used to compile and test armv8_2-fp16-move-1.c:
>
> bin/gcc armv8_2-fp16-move-1.c -fno-diagnostics-show-caret
> -fno-diagnostics-show-line-numbers -fdiagnostics-color=never -O2
> -mfpu=fp-armv8 -march=armv8.2-a+fp16 -mfloat-abi=hard -ffat-lto-objects
> -fno-ident -S -o armv8_2-fp16-move-1.s.
Reported as PR 88560 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88560).


* Re: LRA patch for PR87718
@ 2018-11-23  8:04 Uros Bizjak
  0 siblings, 0 replies; 5+ messages in thread
From: Uros Bizjak @ 2018-11-23  8:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: Vladimir Makarov

Hello!

>  The following patch fixes
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87718
>
>  The patch adds a special treatment for moves with a hard register in register cost and class
> calculation.
>
>  The patch was bootstrapped and tested on x86-64 and ppc64.
>
>  I found two testsuite regressions because of the patch.  The expected generated code for
> PR82361 test is too specific.  GCC with the patch generates the same quality code but with a
> different hard register on x86-64.  So I just changed the test for  PR82361.   Another test is for
> ppc64.  I think the expected generated code for this test is wrong.  I'll submit a changed test
> for a discussion later.   Although I spent much time on the solution and I think it is the right one,
> the patch is in very sensitive area of RA and may affect expected code generation for many
> targets. I am ready to work on the new regressions as soon as they are found.

The patch regressed:

FAIL: gcc.target/i386/pr22076.c scan-assembler-not movl
FAIL: gcc.target/i386/pr81563.c scan-assembler-times movl[\\t ]*-4\\(%ebp\\),[\\t ]*%edi 1
FAIL: gcc.target/i386/pr81563.c scan-assembler-times movl[\\t ]*-8\\(%ebp\\),[\\t ]*%esi 1

on a 32-bit x86 target.

The PR22076 test now moves a value from mm0 through integer registers to a volatile location:

        movq    .LC1, %mm0
        paddb   .LC0, %mm0
        movq    %mm0, 16(%esp)
        movl    16(%esp), %eax
        movl    20(%esp), %edx
        movl    %eax, (%esp)
        movl    %edx, 4(%esp)
        movq    (%esp), %mm0
        addl    $28, %esp

where before the patch we had:

        movq    .LC1, %mm0
        paddb   .LC0, %mm0
        movq    %mm0, 8(%esp)
        movq    8(%esp), %mm0
        addl    $20, %esp

The PR81563 failure looks like a testsuite issue: the compiler now
allocates a call-clobbered register for a value that lives across the
call.
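
To illustrate what "a value that lives across the call" means, here is a
minimal sketch. It is not the pr81563.c test; the function names `twice`
and `live_across_call` are hypothetical, and `noinline` is a GCC attribute
used only to keep the call visible in the generated code.

```c
/* Hypothetical callee, not taken from the testsuite.  The GCC
   noinline attribute stops the call from being optimized away.  */
__attribute__((noinline)) int twice(int v)
{
    return 2 * v;
}

/* `x` is read again after twice() returns, so it is live across the
   call.  The register allocator has to keep it somewhere the call
   cannot destroy: a callee-saved register, or a call-clobbered one
   that is saved and restored around the call.  */
int live_across_call(int x)
{
    int y = twice(x);
    return x + y;
}
```

On 32-bit x86, %eax, %ecx and %edx are call-clobbered, while %ebx, %esi,
%edi and %ebp are callee-saved. Which kind the allocator picks changes
which save and restore instructions appear, and that is what the
scan-assembler-times patterns above are matching.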

Uros.


Thread overview: 5+ messages
2018-11-22 17:29 LRA patch for PR87718 Vladimir Makarov
2018-11-24 11:29 ` Christophe Lyon
2018-12-20 10:38   ` Sam Tebbs
2018-12-20 11:39     ` Sam Tebbs
2018-11-23  8:04 Uros Bizjak