[PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
@ 2018-05-22 16:20 Wilco Dijkstra
  2018-05-22 17:17 ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Wilco Dijkstra @ 2018-05-22 16:20 UTC (permalink / raw)
  To: GCC Patches, James Greenhalgh; +Cc: nd, Richard Sandiford

A recent commit removing '*' from the md files caused a large regression in h264ref.
It turns out aarch64_ira_change_pseudo_allocno_class is no longer effective after the
SVE changes, and the combination results in the regression.  This patch fixes it by
using the new POINTER_AND_FP_REGS register class which is now used instead of ALL_REGS.
Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.

Passes regress, OK for commit?

Since it is a regression introduced in GCC8, OK to backport to GCC8?

ChangeLog:
2018-05-22  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class):
	Use POINTER_AND_FP_REGSinstead of ALL_REGS.
	* config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase cost of r=w alternative.
--

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2ebd256329c1a6a6b790d16955cbcee3feca456c..3d5fe44b53198a92afb726712c6e9dee890afe38 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2961,7 +2961,7 @@ (define_insn "*aarch64_get_lane_zero_extendsi<mode>"
 ;; is guaranteed so upper bits should be considered undefined.
 ;; RTL uses GCC vector extension indices throughout so flip only for assembly.
 (define_insn "aarch64_get_lane<mode>"
-  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, Utv")
 	(vec_select:<VEL>
 	  (match_operand:VALL_F16 1 "register_operand" "w, w, w")
 	  (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 47d98dfd095cdcd15908a86091cf2f8a4d6137b1..a119760c7f332aded200fa1b5bcfb1bbac7b6420 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1059,16 +1059,17 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg)
 }
 
 /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
-   The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have
-   the same cost even if ALL_REGS has a much larger cost.  ALL_REGS is also
-   used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory
-   cost (in this case the best class is the lowest cost one).  Using ALL_REGS
-   irrespectively of its cost results in bad allocations with many redundant
-   int<->FP moves which are expensive on various cores.
-   To avoid this we don't allow ALL_REGS as the allocno class, but force a
-   decision between FP_REGS and GENERAL_REGS.  We use the allocno class if it
-   isn't ALL_REGS.  Similarly, use the best class if it isn't ALL_REGS.
-   Otherwise set the allocno class depending on the mode.
+   The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and
+   GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much
+   higher cost.  POINTER_AND_FP_REGS is also used if the cost of both FP_REGS
+   and GENERAL_REGS is lower than the memory cost (in this case the best class
+   is the lowest cost one).  Using POINTER_AND_FP_REGS irrespectively of its
+   cost results in bad allocations with many redundant int<->FP moves which
+   are expensive on various cores.
+   To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but
+   force a decision between FP_REGS and GENERAL_REGS.  We use the allocno class
+   if it isn't POINTER_AND_FP_REGS.  Similarly, use the best class if it isn't
+   POINTER_AND_FP_REGS.  Otherwise set the allocno class depending on the mode.
    The result of this is that it is no longer inefficient to have a higher
    memory move cost than the register move cost.
 */
@@ -1079,10 +1080,10 @@ aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class,
 {
   machine_mode mode;
 
-  if (allocno_class != ALL_REGS)
+  if (allocno_class != POINTER_AND_FP_REGS)
     return allocno_class;
 
-  if (best_class != ALL_REGS)
+  if (best_class != POINTER_AND_FP_REGS)
     return best_class;
 
   mode = PSEUDO_REGNO_MODE (regno);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-22 16:20 [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class Wilco Dijkstra
@ 2018-05-22 17:17 ` Richard Sandiford
  2018-05-23 11:22   ` Wilco Dijkstra
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-05-22 17:17 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, James Greenhalgh, nd

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:

> A recent commit removing '*' from the md files caused a large regression
> in h264ref.
> It turns out aarch64_ira_change_pseudo_allocno_class is no longer
> effective after the
> SVE changes, and the combination results in the regression.  This patch
> fixes it by
> using the new POINTER_AND_FP_REGS register class which is now used
> instead of ALL_REGS.
> Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.
>
> Passes regress, OK for commit?
>
> Since it is a regression introduced in GCC8, OK to backport to GCC8?
>
> ChangeLog:
> 2018-05-22  Wilco Dijkstra  <wdijkstr@arm.com>
>
> 	* config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class):
> 	Use POINTER_AND_FP_REGSinstead of ALL_REGS.
> 	* config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase
> cost of r=w alternative.
> --
>
> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
> index 2ebd256329c1a6a6b790d16955cbcee3feca456c..3d5fe44b53198a92afb726712c6e9dee890afe38 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2961,7 +2961,7 @@ (define_insn "*aarch64_get_lane_zero_extendsi<mode>"
>  ;; is guaranteed so upper bits should be considered undefined.
>  ;; RTL uses GCC vector extension indices throughout so flip only for assembly.
>  (define_insn "aarch64_get_lane<mode>"
> -  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
> +  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, Utv")
>  	(vec_select:<VEL>
>  	  (match_operand:VALL_F16 1 "register_operand" "w, w, w")
>  	  (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 47d98dfd095cdcd15908a86091cf2f8a4d6137b1..a119760c7f332aded200fa1b5bcfb1bbac7b6420 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1059,16 +1059,17 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg)
>  }
>  
>  /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
> -   The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have
> -   the same cost even if ALL_REGS has a much larger cost.  ALL_REGS is also
> -   used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory
> -   cost (in this case the best class is the lowest cost one).  Using ALL_REGS
> -   irrespectively of its cost results in bad allocations with many redundant
> -   int<->FP moves which are expensive on various cores.
> -   To avoid this we don't allow ALL_REGS as the allocno class, but force a
> -   decision between FP_REGS and GENERAL_REGS.  We use the allocno class if it
> -   isn't ALL_REGS.  Similarly, use the best class if it isn't ALL_REGS.
> -   Otherwise set the allocno class depending on the mode.
> +   The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and
> +   GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much
> +   higher cost.  POINTER_AND_FP_REGS is also used if the cost of both FP_REGS
> +   and GENERAL_REGS is lower than the memory cost (in this case the best class
> +   is the lowest cost one).  Using POINTER_AND_FP_REGS irrespectively of its
> +   cost results in bad allocations with many redundant int<->FP moves which
> +   are expensive on various cores.
> +   To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but
> +   force a decision between FP_REGS and GENERAL_REGS.  We use the allocno class
> +   if it isn't POINTER_AND_FP_REGS.  Similarly, use the best class if it isn't
> +   POINTER_AND_FP_REGS.  Otherwise set the allocno class depending on the mode.
>     The result of this is that it is no longer inefficient to have a higher
>     memory move cost than the register move cost.
>  */
> @@ -1079,10 +1080,10 @@ aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class,
>  {
>    machine_mode mode;
>  
> -  if (allocno_class != ALL_REGS)
> +  if (allocno_class != POINTER_AND_FP_REGS)
>      return allocno_class;
>  
> -  if (best_class != ALL_REGS)
> +  if (best_class != POINTER_AND_FP_REGS)
>      return best_class;
>  
>    mode = PSEUDO_REGNO_MODE (regno);

I think it'd be better to use !reg_class_subset_p (POINTER_AND_FP_REGS, ...)
instead of ... != POINTER_AND_FP_REGS, since this in principle still applies
to ALL_REGS too.

FWIW, the patch looks good to me with that change.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-22 17:17 ` Richard Sandiford
@ 2018-05-23 11:22   ` Wilco Dijkstra
  2018-05-23 11:53     ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Wilco Dijkstra @ 2018-05-23 11:22 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches, James Greenhalgh, nd

Richard Sandiford wrote:

> -  if (allocno_class != ALL_REGS)
> +  if (allocno_class != POINTER_AND_FP_REGS)
>      return allocno_class;
>  
> -  if (best_class != ALL_REGS)
> +  if (best_class != POINTER_AND_FP_REGS)
>      return best_class;
>  
>    mode = PSEUDO_REGNO_MODE (regno);

> I think it'd be better to use !reg_class_subset_p (POINTER_AND_FP_REGS, ...)
> instead of ... != POINTER_AND_FP_REGS, since this in principle still applies
> to ALL_REGS too.
> 
> FWIW, the patch looks good to me with that change.

How does reg_class_subset_p help? In my testing I didn't see ALL_REGS ever
used (and I don't believe it's possible to get it with SVE either). And it's not obvious
without looking at the implementation whether subset here means strict subset or not,
so it would obfuscate the clear meaning of the existing patch.

Wilco

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-23 11:22   ` Wilco Dijkstra
@ 2018-05-23 11:53     ` Richard Sandiford
  2018-05-25 13:41       ` Wilco Dijkstra
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-05-23 11:53 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: GCC Patches, James Greenhalgh, nd

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Richard Sandiford wrote:
>> -  if (allocno_class != ALL_REGS)
>> +  if (allocno_class != POINTER_AND_FP_REGS)
>>      return allocno_class;
>>  
>> -  if (best_class != ALL_REGS)
>> +  if (best_class != POINTER_AND_FP_REGS)
>>      return best_class;
>>  
>>    mode = PSEUDO_REGNO_MODE (regno);
>
>> I think it'd be better to use !reg_class_subset_p (POINTER_AND_FP_REGS, ...)
>> instead of ... != POINTER_AND_FP_REGS, since this in principle still applies
>> to ALL_REGS too.
>> 
>> FWIW, the patch looks good to me with that change.
>
> How does reg_class_subset_p help? In my testing I didn't see ALL_REGS ever
> used (and I don't believe it's possible to get it with SVE either). And
> it's not obvious
> without looking at the implementation whether subset here means strict
> subset or not,
> so it would obfuscate the clear meaning of the existing patch.

But I think the fact that we need this patch shows why hard-coding the
names of union classes is dangerous.  IMO the question isn't whether we
see ALL_REGS used but whether there's a reason in principle why it
wouldn't be used.  E.g. ALL_REGS is the starting class for the
best_class calculation, and LRA uses ALL_REGS as the default choice
for scratch reload registers.

It's not like we can claim that the testsuite will flag up if this
goes wrong again, since AIUI there are no tests that show the reason
we need to make this change.  (I realise the patch includes an md
change to keep the testsuite happy, but that's not the same thing.
I mean more a test that shows why removing the '*'s made things
worse, through no fault of its own.)

Conceptually what we're saying here is that if the given classes
include both GENERAL_REGS and FP_REGS, we'll choose between them
based on the mode of the register.  And that makes sense for any
class that includes both GENERAL_REGS and FP_REGS.  We could write
it that way if it seems better, i.e.:

  if (!reg_class_subset_p (GENERAL_REGS, ...)
      || !reg_class_subset_p (FP_REGS, ...))
    ...

That way we don't mention any union classes, and I think the meaning
is clear in the context of eventually returning GENERAL_REGS or FP_REGS.

reg_class_subset_p tests for the normal inclusive subset relation
rather than "strict subset".

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-23 11:53     ` Richard Sandiford
@ 2018-05-25 13:41       ` Wilco Dijkstra
  2018-05-29 16:01         ` James Greenhalgh
  0 siblings, 1 reply; 15+ messages in thread
From: Wilco Dijkstra @ 2018-05-25 13:41 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: GCC Patches, James Greenhalgh, nd

Richard Sandiford wrote:

> Conceptually what we're saying here is that if the given classes
> include both GENERAL_REGS and FP_REGS, we'll choose between them
> based on the mode of the register.  And that makes sense for any
> class that includes both GENERAL_REGS and FP_REGS.  We could write
> it that way if it seems better, i.e.:
>
>  if (!reg_class_subset_p (GENERAL_REGS, ...)
>      || !reg_class_subset_p (FP_REGS, ...))
>    ...
>
> That way we don't mention any union classes, and I think the meaning
> is clear in the context of eventually returning GENERAL_REGS or FP_REGS.
>
> reg_class_subset_p tests for the normal inclusive subset relation
> rather than "strict subset".

Right, checking for a subset of GENERAL_REGS and FP_REGS does make sense
and is more clear as well. It appears to behave identically, so here is the new version:


A recent commit removing '*' from the md files caused a large regression in h264ref.
It turns out aarch64_ira_change_pseudo_allocno_class is no longer effective after the
SVE changes, and the combination results in the regression.  This patch fixes it by
explicitly checking for a subset of GENERAL_REGS and FP_REGS.
Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.

Passes regress, OK for commit? Since it is a regression introduced in GCC8, OK to
backport to GCC8?

ChangeLog:
2018-05-25  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class):
	Check for subset of GENERAL_REGS and FP_REGS.
	* config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase cost of r=w alternative.

--
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 2ebd256329c1a6a6b790d16955cbcee3feca456c..3d5fe44b53198a92afb726712c6e9dee890afe38 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2961,7 +2961,7 @@ (define_insn "*aarch64_get_lane_zero_extendsi<mode>"
 ;; is guaranteed so upper bits should be considered undefined.
 ;; RTL uses GCC vector extension indices throughout so flip only for assembly.
 (define_insn "aarch64_get_lane<mode>"
-  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+  [(set (match_operand:<VEL> 0 "aarch64_simd_nonimmediate_operand" "=?r, w, Utv")
 	(vec_select:<VEL>
 	  (match_operand:VALL_F16 1 "register_operand" "w, w, w")
 	  (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 47d98dfd095cdcd15908a86091cf2f8a4d6137b1..6e7722187f0f79195c8b6c43f463a3ac9aa61742 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1059,16 +1059,17 @@ aarch64_err_no_fpadvsimd (machine_mode mode, const char *msg)
 }
 
 /* Implement TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS.
-   The register allocator chooses ALL_REGS if FP_REGS and GENERAL_REGS have
-   the same cost even if ALL_REGS has a much larger cost.  ALL_REGS is also
-   used if the cost of both FP_REGS and GENERAL_REGS is lower than the memory
-   cost (in this case the best class is the lowest cost one).  Using ALL_REGS
-   irrespectively of its cost results in bad allocations with many redundant
-   int<->FP moves which are expensive on various cores.
-   To avoid this we don't allow ALL_REGS as the allocno class, but force a
-   decision between FP_REGS and GENERAL_REGS.  We use the allocno class if it
-   isn't ALL_REGS.  Similarly, use the best class if it isn't ALL_REGS.
-   Otherwise set the allocno class depending on the mode.
+   The register allocator chooses POINTER_AND_FP_REGS if FP_REGS and
+   GENERAL_REGS have the same cost - even if POINTER_AND_FP_REGS has a much
+   higher cost.  POINTER_AND_FP_REGS is also used if the cost of both FP_REGS
+   and GENERAL_REGS is lower than the memory cost (in this case the best class
+   is the lowest cost one).  Using POINTER_AND_FP_REGS irrespectively of its
+   cost results in bad allocations with many redundant int<->FP moves which
+   are expensive on various cores.
+   To avoid this we don't allow POINTER_AND_FP_REGS as the allocno class, but
+   force a decision between FP_REGS and GENERAL_REGS.  We use the allocno class
+   if it isn't POINTER_AND_FP_REGS.  Similarly, use the best class if it isn't
+   POINTER_AND_FP_REGS.  Otherwise set the allocno class depending on the mode.
    The result of this is that it is no longer inefficient to have a higher
    memory move cost than the register move cost.
 */
@@ -1079,10 +1080,12 @@ aarch64_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class,
 {
   machine_mode mode;
 
-  if (allocno_class != ALL_REGS)
+  if (reg_class_subset_p (allocno_class, GENERAL_REGS)
+      || reg_class_subset_p (allocno_class, FP_REGS))
     return allocno_class;
 
-  if (best_class != ALL_REGS)
+  if (reg_class_subset_p (best_class, GENERAL_REGS)
+      || reg_class_subset_p (best_class, FP_REGS))
     return best_class;
 
   mode = PSEUDO_REGNO_MODE (regno);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-25 13:41       ` Wilco Dijkstra
@ 2018-05-29 16:01         ` James Greenhalgh
  2018-05-29 18:12           ` Wilco Dijkstra
  0 siblings, 1 reply; 15+ messages in thread
From: James Greenhalgh @ 2018-05-29 16:01 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Richard Sandiford, GCC Patches, nd

On Fri, May 25, 2018 at 08:16:03AM -0500, Wilco Dijkstra wrote:
> Richard Sandiford wrote:
> 
> > Conceptually what we're saying here is that if the given classes
> > include both GENERAL_REGS and FP_REGS, we'll choose between them
> > based on the mode of the register.Â  And that makes sense for any
> > class that includes both GENERAL_REGS and FP_REGS.Â  We could write
> > it that way if it seems better, i.e.:
> >
> >Â  if (!reg_class_subset_p (GENERAL_REGS, ...)
> >Â Â Â Â Â  || !reg_class_subset_p (FP_REGS, ...))
> >Â Â Â  ...
> >
> > That way we don't mention any union classes, and I think the meaning
> > is clear in the context of eventually returning GENERAL_REGS or FP_REGS.
> >
> > reg_class_subset_p tests for the normal inclusive subset relation
> > rather than "strict subset".
> 
> Right, checking for a subset of GENERAL_REGS and FP_REGS does make sense
> and is more clear as well. It appears to behave identically, so here is the new version:
> 
> 
> A recent commit removing '*' from the md files caused a large regression in h264ref.
> It turns out aarch64_ira_change_pseudo_allocno_class is no longer effective after the
> SVE changes, and the combination results in the regression.  This patch fixes it by
> explicitly checking for a subset of GENERAL_REGS and FP_REGS.

OK for trunk.

> Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.

I'd prefer more detail than this for a workaround; which test, why did it
start to fail, why is this the right solution, etc.

Thanks,
James

> ChangeLog:
> 2018-05-25  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class):
> 	Check for subset of GENERAL_REGS and FP_REGS.
> 	* config/aarch64/aarch64-simd.md (aarch64_get_lane): Increase cost of r=w alternative.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-29 16:01         ` James Greenhalgh
@ 2018-05-29 18:12           ` Wilco Dijkstra
  2018-05-29 20:32             ` Richard Sandiford
  2018-05-31  8:38             ` Christophe Lyon
  0 siblings, 2 replies; 15+ messages in thread
From: Wilco Dijkstra @ 2018-05-29 18:12 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: Richard Sandiford, GCC Patches, nd

James Greenhalgh wrote:

> > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.
>
> > I'd prefer more detail than this for a workaround; which test, why did it
> > start to fail, why is this the right solution, etc.

It was gcc.target/aarch64/vect_copy_lane_1.c generating:

test_copy_laneq_f64:
        umov    x0, v1.d[1]
        fmov    d0, x0
        ret

For some reason returning a double uses DImode temporaries, so it's essential
to prefer FP_REGS here and mark the lane copy correctly.

Wilco
    

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-29 18:12           ` Wilco Dijkstra
@ 2018-05-29 20:32             ` Richard Sandiford
  2018-05-30 10:40               ` Wilco Dijkstra
  2018-05-31  8:38             ` Christophe Lyon
  1 sibling, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-05-29 20:32 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: James Greenhalgh, GCC Patches, nd

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> James Greenhalgh wrote:
>
>> > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.
>>
>> > I'd prefer more detail than this for a workaround; which test, why did it
>> > start to fail, why is this the right solution, etc.
>
> It was gcc.target/aarch64/vect_copy_lane_1.c generating:
>
> test_copy_laneq_f64:
>         umov    x0, v1.d[1]
>         fmov    d0, x0
>         ret
>
> For some reason returning a double uses DImode temporaries, so it's essential
> to prefer FP_REGS here and mark the lane copy correctly.

The "?" change seems to make intrinsic sense given the extra cost of the
GPR alternative.  But I think the real reason for this failure is that
we define no V1DF patterns, and target-independent code falls back to
using moves in the corresponding *integer* mode.  So for that function
we generate the rather ugly code:

(note 6 1 3 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 3 6 2 2 (clobber (reg/v:V1DF 92 [ aD.21157 ])) "vect_copy_lane_1.c":45 -1
     (nil))
(insn 2 3 4 2 (set (subreg:DI (reg/v:V1DF 92 [ aD.21157 ]) 0)
        (reg:DI 32 v0 [ aD.21157 ])) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (nil))
(insn 4 2 5 2 (set (reg/v:V2DF 93 [ bD.21158 ])
        (reg:V2DF 33 v1 [ bD.21158 ])) "vect_copy_lane_1.c":45 1063 {*aarch64_simd_movv2df}
     (nil))
(note 5 4 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 5 9 2 (set (reg:DF 95)
        (vec_select:DF (reg/v:V2DF 93 [ bD.21158 ])
            (parallel [
                    (const_int 1 [0x1])
                ]))) "./include/arm_neon.h":14441 1993 {aarch64_get_lanev2df}
     (nil))
(insn 9 8 11 2 (set (reg:DI 96)
        (subreg:DI (reg:DF 95) 0)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (nil))
(insn 11 9 10 2 (clobber (reg:V1DF 91 [ <retval> ])) "vect_copy_lane_1.c":45 -1
     (nil))
(insn 10 11 15 2 (set (subreg:DI (reg:V1DF 91 [ <retval> ]) 0)
        (reg:DI 96)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (nil))
(insn 15 10 16 2 (set (reg:DI 32 v0)
        (subreg:DI (reg:V1DF 91 [ <retval> ]) 0)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (nil))
(insn 16 15 0 2 (use (reg/i:V1DF 32 v0)) "vect_copy_lane_1.c":45 -1
     (nil))

which by IRA gets optimised to:

(insn 9 8 15 2 (set (subreg:DF (reg:DI 96) 0)
        (vec_select:DF (reg:V2DF 33 v1 [ bD.21158 ])
            (parallel [
                    (const_int 1 [0x1])
                ]))) "vect_copy_lane_1.c":45 1993 {aarch64_get_lanev2df}
     (expr_list:REG_DEAD (reg:V2DF 33 v1 [ bD.21158 ])
        (nil)))
(insn 15 9 16 2 (set (reg:DI 32 v0)
        (reg:DI 96)) "vect_copy_lane_1.c":45 47 {*movdi_aarch64}
     (expr_list:REG_DEAD (reg:DI 96)
        (nil)))
(insn 16 15 18 2 (use (reg/i:V1DF 32 v0)) "vect_copy_lane_1.c":45 -1
     (nil))

with the move now being done purely in DImode.  This defeats the
heuristic in aarch64_ira_change_pseudo_allocno_class because the
pseudo appears to be a normal integer rather than a (float) vector.

Although the "?" fixes this particular instance, I think more
complicated V1DF code would still regress by being forced to
use GENERAL_REGS.  Of course, the fix is to add the move pattern
rather than disable the heuristic...

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-29 20:32             ` Richard Sandiford
@ 2018-05-30 10:40               ` Wilco Dijkstra
  2018-05-30 19:01                 ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Wilco Dijkstra @ 2018-05-30 10:40 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: James Greenhalgh, GCC Patches, nd

Richard Sandiford <richard.sandiford@linaro.org>

> The "?" change seems to make intrinsic sense given the extra cost of the
> GPR alternative.  But I think the real reason for this failure is that
> we define no V1DF patterns, and target-independent code falls back to
> using moves in the corresponding *integer* mode.  So for that function
> we generate the rather ugly code:

This:

typedef struct { double x; } X;
X f2(X *p)
{
  return *p;
}

emits at expand:

(insn 6 3 7 2 (set (reg:DF 90 [ D.21009 ])
        (mem:DF (reg/v/f:DI 92 [ p ]) [2 *p_2(D)+0 S8 A64])) "vect_copy_lane_1.c":26 -1
     (nil))
(insn 7 6 8 2 (set (subreg:DF (reg:DI 94) 0)
        (reg:DF 90 [ D.21009 ])) "vect_copy_lane_1.c":26 -1
     (nil))
(insn 8 7 9 2 (set (reg:DI 95)
        (reg:DI 94)) "vect_copy_lane_1.c":26 -1
     (nil))
(insn 9 8 13 2 (set (reg:DF 91 [ <retval> ])
        (subreg:DF (reg:DI 95) 0)) "vect_copy_lane_1.c":26 -1
     (nil))

So the underlying cause is the structure passing code. Things get worse when you return
2 doubles and it really becomes horrific at 3...

Wilco
    

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-30 10:40               ` Wilco Dijkstra
@ 2018-05-30 19:01                 ` Richard Sandiford
  0 siblings, 0 replies; 15+ messages in thread
From: Richard Sandiford @ 2018-05-30 19:01 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: James Greenhalgh, GCC Patches, nd

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Richard Sandiford <richard.sandiford@linaro.org>
>> The "?" change seems to make intrinsic sense given the extra cost of the
>> GPR alternative.  But I think the real reason for this failure is that
>> we define no V1DF patterns, and target-independent code falls back to
>> using moves in the corresponding *integer* mode.  So for that function
>> we generate the rather ugly code:
>
> This:
>
> typedef struct { double x; } X;
> X f2(X *p)
> {
>   return *p;
> }
>
> emits at expand:
>
> (insn 6 3 7 2 (set (reg:DF 90 [ D.21009 ])
>         (mem:DF (reg/v/f:DI 92 [ p ]) [2 *p_2(D)+0 S8 A64])) "vect_copy_lane_1.c":26 -1
>      (nil))
> (insn 7 6 8 2 (set (subreg:DF (reg:DI 94) 0)
>         (reg:DF 90 [ D.21009 ])) "vect_copy_lane_1.c":26 -1
>      (nil))
> (insn 8 7 9 2 (set (reg:DI 95)
>         (reg:DI 94)) "vect_copy_lane_1.c":26 -1
>      (nil))
> (insn 9 8 13 2 (set (reg:DF 91 [ <retval> ])
>         (subreg:DF (reg:DI 95) 0)) "vect_copy_lane_1.c":26 -1
>      (nil))
>
> So the underlying cause is the structure passing code. Things get
> worse when you return 2 doubles and it really becomes horrific at 3...

Yeah, the handling of structures can also be poor, but float64x1_t is a
vector type rather than a structure, so I don't think the above is the
problem in the specific case of test_copy_laneq_f64.

float64x1_t has the TYPE_MODE we want (V1DF).  But because we have
no V1DF move pattern, it ends up being moved as a DI instead.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-29 18:12           ` Wilco Dijkstra
  2018-05-29 20:32             ` Richard Sandiford
@ 2018-05-31  8:38             ` Christophe Lyon
  2018-05-31  9:32               ` Richard Sandiford
  1 sibling, 1 reply; 15+ messages in thread
From: Christophe Lyon @ 2018-05-31  8:38 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: James Greenhalgh, Richard Sandiford, GCC Patches, nd

On 29 May 2018 at 19:34, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> James Greenhalgh wrote:
>
>> > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.
>>
>> > I'd prefer more detail than this for a workaround; which test, why did it
>> > start to fail, why is this the right solution, etc.
>
> It was gcc.target/aarch64/vect_copy_lane_1.c generating:
>
> test_copy_laneq_f64:
>         umov    x0, v1.d[1]
>         fmov    d0, x0
>         ret
>
> For some reason returning a double uses DImode temporaries, so it's essential
> to prefer FP_REGS here and mark the lane copy correctly.
>
> Wilco
>

Hi Wilco,

This has probably been reported elsewhere already but I can't find
such a report, so sorry for possible duplicate,
but this patch is causing ICEs on aarch64
FAIL:    gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
(internal compiler error)
FAIL:    gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
(internal compiler error)

and also many scan-assembler regressions:

http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html

Can you check?

Thanks

Christophe

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-31  8:38             ` Christophe Lyon
@ 2018-05-31  9:32               ` Richard Sandiford
  2018-05-31 11:22                 ` Wilco Dijkstra
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-05-31  9:32 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Wilco Dijkstra, James Greenhalgh, GCC Patches, nd

Christophe Lyon <christophe.lyon@linaro.org> writes:
> On 29 May 2018 at 19:34, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>> James Greenhalgh wrote:
>>
>>> > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite.
>>>
>>> > I'd prefer more detail than this for a workaround; which test, why did it
>>> > start to fail, why is this the right solution, etc.
>>
>> It was gcc.target/aarch64/vect_copy_lane_1.c generating:
>>
>> test_copy_laneq_f64:
>>         umov    x0, v1.d[1]
>>         fmov    d0, x0
>>         ret
>>
>> For some reason returning a double uses DImode temporaries, so it's essential
>> to prefer FP_REGS here and mark the lane copy correctly.
>>
>> Wilco
>>
>
> Hi Wilco,
>
> This has probably been reported elsewhere already but I can't find
> such a report, so sorry for possible duplicate,
> but this patch is causing ICEs on aarch64
> FAIL:    gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> (internal compiler error)
> FAIL:    gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
> (internal compiler error)
>
> and also many scan-assembler regressions:
>
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html

Thanks for the heads-up.  Looks like they're all SVE, so I'll take this.

Richard

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-31  9:32               ` Richard Sandiford
@ 2018-05-31 11:22                 ` Wilco Dijkstra
  2018-05-31 12:53                   ` Richard Sandiford
  0 siblings, 1 reply; 15+ messages in thread
From: Wilco Dijkstra @ 2018-05-31 11:22 UTC (permalink / raw)
  To: Richard Sandiford, Christophe Lyon; +Cc: James Greenhalgh, GCC Patches, nd

Richard Sandiford wrote:

>> This has probably been reported elsewhere already but I can't find
>> such a report, so sorry for possible duplicate,
>> but this patch is causing ICEs on aarch64
>> FAIL:    gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
>> (internal compiler error)
>> FAIL:    gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
>> (internal compiler error)
>>
>> and also many scan-assembler regressions:
>>
>>  http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html
>
> Thanks for the heads-up.  Looks like they're all SVE, so I'll take this.

It seems this is due to unnecessary spills of PR_REGS - the subset doesn't work for those.
The original proposal doing:

  if (allocno_class != POINTER_AND_FP_REGS)
    return allocno_class;

doesn't appear to affect SVE. However the question is whether the register allocator
can get confused about PR_REGS and end up with POINTER_AND_FP_REGS for
both the allocno_class and best_class? If so then the return needs to support predicate
modes too.

Wilco

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-31 11:22                 ` Wilco Dijkstra
@ 2018-05-31 12:53                   ` Richard Sandiford
  2018-05-31 19:27                     ` James Greenhalgh
  0 siblings, 1 reply; 15+ messages in thread
From: Richard Sandiford @ 2018-05-31 12:53 UTC (permalink / raw)
  To: Wilco Dijkstra; +Cc: Christophe Lyon, James Greenhalgh, GCC Patches, nd

Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> Richard Sandiford wrote:
>
>>> This has probably been reported elsewhere already but I can't find
>>> such a report, so sorry for possible duplicate,
>>> but this patch is causing ICEs on aarch64
>>> FAIL:    gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
>>> (internal compiler error)
>>> FAIL:    gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
>>> (internal compiler error)
>>>
>>> and also many scan-assembler regressions:
>>>
>>>  http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html
>>
>> Thanks for the heads-up.  Looks like they're all SVE, so I'll take this.
>
> It seems this is due to unnecessary spills of PR_REGS - the subset doesn't work for those.

It does, but I'd originally suggested:

  if (!reg_class_subset_p (GENERAL_REGS, ...)
      || !reg_class_subset_p (FP_REGS, ...))
    ...bail out...

whereas the committed patch had:

  if (reg_class_subset_p (..., GENERAL_REGS)
      || reg_class_subset_p (..., FP_REGS))
    ...bail out...

That's an important difference.  The idea with the first was that
we should only make a choice between GENERAL_REGS and FP_REGS
if the original classes included both of them.  And that's what
we want because the new class has to be a refinement of the
original: it shouldn't include entirely new registers.

The committed version instead says that we won't make a choice
between GENERAL_REGS and FP_REGS if one of the classes is already
specific to one of them.  I think this would also lead to us changing
POINTER_REGS to GENERAL_REGS, although I don't know how much that
matters in practice.

> The original proposal doing:
>
>   if (allocno_class != POINTER_AND_FP_REGS)
>     return allocno_class;
>
> doesn't appear to affect SVE. However the question is whether the
> register allocator can get confused about PR_REGS and end up with
> POINTER_AND_FP_REGS for both the allocno_class and best_class? If so
> then the return needs to support predicate modes too.

Yeah, that shouldn't happen, since predicate modes are only allowed in
predicate registers.

I think the reduc_1 ICE is a separate bug that I'll post a patch for,
but it goes latent again after the patch below.

Tested on aarch64-linux-gnu.  I don't think it can be called obvious
given the above, and it's only SVE-specifc by chance, so: OK to install?

Thanks,
Richard


2018-05-31  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class):
	Fix subreg tests so that we only return a choice between
	GENERAL_REGS and FP_REGS if the original classes included both.

Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2018-05-30 19:31:14.212387813 +0100
+++ gcc/config/aarch64/aarch64.c	2018-05-31 13:12:56.836974021 +0100
@@ -1108,12 +1108,12 @@ aarch64_ira_change_pseudo_allocno_class
 {
   machine_mode mode;
 
-  if (reg_class_subset_p (allocno_class, GENERAL_REGS)
-      || reg_class_subset_p (allocno_class, FP_REGS))
+  if (!reg_class_subset_p (GENERAL_REGS, allocno_class)
+      || !reg_class_subset_p (FP_REGS, allocno_class))
     return allocno_class;
 
-  if (reg_class_subset_p (best_class, GENERAL_REGS)
-      || reg_class_subset_p (best_class, FP_REGS))
+  if (!reg_class_subset_p (GENERAL_REGS, best_class)
+      || !reg_class_subset_p (FP_REGS, best_class))
     return best_class;
 
   mode = PSEUDO_REGNO_MODE (regno);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class
  2018-05-31 12:53                   ` Richard Sandiford
@ 2018-05-31 19:27                     ` James Greenhalgh
  0 siblings, 0 replies; 15+ messages in thread
From: James Greenhalgh @ 2018-05-31 19:27 UTC (permalink / raw)
  To: Wilco Dijkstra, Christophe Lyon, GCC Patches, richard.sandiford; +Cc: nd

On Thu, May 31, 2018 at 07:23:29AM -0500, Richard Sandiford wrote:
> Wilco Dijkstra <Wilco.Dijkstra@arm.com> writes:
> > Richard Sandiford wrote:
> >
> >>> This has probably been reported elsewhere already but I can't find
> >>> such a report, so sorry for possible duplicate,
> >>> but this patch is causing ICEs on aarch64
> >>> FAIL:Â Â Â  gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve
> >>> (internal compiler error)
> >>> FAIL:Â Â Â  gcc.target/aarch64/sve/reduc_5.c -march=armv8.2-a+sve
> >>> (internal compiler error)
> >>>
> >>> and also many scan-assembler regressions:
> >>>
> >>>  http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/260951/report-build-info.html
> >>
> >> Thanks for the heads-up.Â  Looks like they're all SVE, so I'll take this.
> >
> > It seems this is due to unnecessary spills of PR_REGS - the subset doesn't work for those.
> 
> It does, but I'd originally suggested:
> 
>   if (!reg_class_subset_p (GENERAL_REGS, ...)
>       || !reg_class_subset_p (FP_REGS, ...))
>     ...bail out...
> 
> whereas the committed patch had:
> 
>   if (reg_class_subset_p (..., GENERAL_REGS)
>       || reg_class_subset_p (..., FP_REGS))
>     ...bail out...
> 
> That's an important difference.  The idea with the first was that
> we should only make a choice between GENERAL_REGS and FP_REGS
> if the original classes included both of them.  And that's what
> we want because the new class has to be a refinement of the
> original: it shouldn't include entirely new registers.
> 
> The committed version instead says that we won't make a choice
> between GENERAL_REGS and FP_REGS if one of the classes is already
> specific to one of them.  I think this would also lead to us changing
> POINTER_REGS to GENERAL_REGS, although I don't know how much that
> matters in practice.

Sorry to have missed this detail in review.

> > The original proposal doing:
> >
> >   if (allocno_class != POINTER_AND_FP_REGS)
> >     return allocno_class;
> >
> > doesn't appear to affect SVE. However the question is whether the
> > register allocator can get confused about PR_REGS and end up with
> > POINTER_AND_FP_REGS for both the allocno_class and best_class? If so
> > then the return needs to support predicate modes too.
> 
> Yeah, that shouldn't happen, since predicate modes are only allowed in
> predicate registers.
> 
> I think the reduc_1 ICE is a separate bug that I'll post a patch for,
> but it goes latent again after the patch below.
> 
> Tested on aarch64-linux-gnu.  I don't think it can be called obvious
> given the above, and it's only SVE-specifc by chance, so: OK to install?

This is OK for trunk.

Thanks,
James

> 2018-05-31  Richard Sandiford  <richard.sandiford@linaro.org>
> 
> gcc/
> 	* config/aarch64/aarch64.c (aarch64_ira_change_pseudo_allocno_class):
> 	Fix subreg tests so that we only return a choice between
> 	GENERAL_REGS and FP_REGS if the original classes included both.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2018-05-31 19:27 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-22 16:20 [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class Wilco Dijkstra
2018-05-22 17:17 ` Richard Sandiford
2018-05-23 11:22   ` Wilco Dijkstra
2018-05-23 11:53     ` Richard Sandiford
2018-05-25 13:41       ` Wilco Dijkstra
2018-05-29 16:01         ` James Greenhalgh
2018-05-29 18:12           ` Wilco Dijkstra
2018-05-29 20:32             ` Richard Sandiford
2018-05-30 10:40               ` Wilco Dijkstra
2018-05-30 19:01                 ` Richard Sandiford
2018-05-31  8:38             ` Christophe Lyon
2018-05-31  9:32               ` Richard Sandiford
2018-05-31 11:22                 ` Wilco Dijkstra
2018-05-31 12:53                   ` Richard Sandiford
2018-05-31 19:27                     ` James Greenhalgh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).