public inbox for gcc-patches@gcc.gnu.org
* [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
@ 2015-05-20 13:02 Jiong Wang
  2015-07-22 11:01 ` [AArch64] PR63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
  2015-07-22 11:37 ` [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER James Greenhalgh
  0 siblings, 2 replies; 12+ messages in thread
From: Jiong Wang @ 2015-05-20 13:02 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

The current IRA still uses both of these target macros in a few places.

Tell IRA to use the order we define rather than its own cost
calculation: allocate caller-saved registers first, then callee-saved
ones.

This is especially useful for LR/x30, as it is free to allocate and is
purely caller-saved when used in a leaf function.

I haven't noticed a significant impact on benchmarks, but grepping for
keywords like "Spilling" and "Push.*spill" in the IRA RTL dump shows
smaller counts.
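For reference, the comparison can be reproduced along these lines (a
sketch; the dump file name varies between GCC versions, and the excerpt
below is illustrative, not taken from a real dump):

```shell
# The real dump comes from: gcc -O2 -fdump-rtl-ira -c test.c
# which writes a file named like test.c.*r.ira.  Illustrative excerpt:
cat > ira-excerpt.txt <<'EOF'
      Spilling a5(r80,l0)
      Push a3(r77,l0) spill
      assign reg 7 to a0(r73,l0)
EOF
# Count spill-related events in the dump.
grep -cE 'Spilling|Push.*spill' ira-excerpt.txt
```

Comparing this count before and after the patch gives a rough proxy for
spill pressure.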

OK for trunk?

2015-05-19  Jiong. Wang  <jiong.wang@arm.com>

gcc/
  PR 63521
  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
  (HONOR_REG_ALLOC_ORDER): Define.

Regards,
Jiong


[-- Attachment #2: fix-v.patch --]
[-- Type: text/x-diff, Size: 1295 bytes --]

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bf59e40..0acdf10 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -337,6 +337,31 @@ extern unsigned long aarch64_tune_flags;
     V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31)  \
   }
 
+#define REG_ALLOC_ORDER				\
+{						\
+  /* Reverse order for argument registers.  */	\
+  7, 6, 5, 4, 3, 2, 1, 0,			\
+  /* Other caller-saved registers.  */		\
+  8, 9, 10, 11, 12, 13, 14, 15,			\
+  16, 17, 18, 30,				\
+  /* Callee-saved registers.  */		\
+  19, 20, 21, 22, 23, 24, 25, 26,		\
+  27, 28,					\
+  /* All other registers.  */			\
+  29, 31,					\
+  /* Reverse order for argument vregisters.  */	\
+  39, 38, 37, 36, 35, 34, 33, 32,		\
+  /* Other caller-saved vregisters.  */		\
+  48, 49, 50, 51, 52, 53, 54, 55,		\
+  56, 57, 58, 59, 60, 61, 62, 63,		\
+  /* Callee-saved vregisters.  */		\
+  40, 41, 42, 43, 44, 45, 46, 47,		\
+  /* Other pseudo registers.  */		\
+  64, 65, 66					\
+}
+
+#define HONOR_REG_ALLOC_ORDER 1
+
 /* Say that the epilogue uses the return address register.  Note that
    in the case of sibcalls, the values "used by the epilogue" are
    considered live at the start of the called function.  */

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [AArch64] PR63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2015-05-20 13:02 [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
@ 2015-07-22 11:01 ` Jiong Wang
  2015-07-22 11:37 ` [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER James Greenhalgh
  1 sibling, 0 replies; 12+ messages in thread
From: Jiong Wang @ 2015-07-22 11:01 UTC (permalink / raw)
  To: gcc-patches


Jiong Wang writes:

> Current IRA still use both target macros in a few places.
>
> Tell IRA to use the order we defined rather than with it's own cost
> calculation. Allocate caller saved first, then callee saved.
>
> This is especially useful for LR/x30, as it's free to allocate and is
> pure caller saved when used in leaf function.
>
> Haven't noticed significant impact on benchmarks, but by grepping some
> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
> is smaller.
>
> OK for trunk?
>
> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
>
> gcc/
>   PR 63521
>   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>   (HONOR_REG_ALLOC_ORDER): Define.
>
> Regards,
> Jiong

Ping.

I know it's hard to see the register allocation improvements from this
hook, as the current IRA/LRA already allocates registers quite well.

But consider the example below:

test.c
======

double dec (double, double);

int cal (int a, int b, double d, double e)
{
  double sum = dec (a , a + b);
  sum = dec (b, a - b);
  sum = dec (sum, a * b);
  return d + e + sum;
}

Although the instruction count is the same before and after this patch,
the instruction scheduling looks better afterwards: because w7 is
allocated instead of w0, there are fewer instruction dependencies.

Before Patch (-O2)
======
cal:
        stp     x29, x30, [sp, -48]!
        add     x29, sp, 0
        stp     x19, x20, [sp, 16]
        stp     d8, d9, [sp, 32]
        mov     w19, w0
        add     w0, w0, w1
        fmov    d9, d1
        mov     w20, w1
        fmov    d8, d0
        scvtf   d1, w0
        scvtf   d0, w19
        bl      dec
        scvtf   d0, w20 
        sub     w0, w19, w20
        mul     w19, w19, w20
        scvtf   d1, w0
        bl      dec
        scvtf   d1, w19
        bl      dec
        fadd    d8, d8, d9
        ldp     x19, x20, [sp, 16]
        fadd    d0, d8, d0
        ldp     d8, d9, [sp, 32]
        ldp     x29, x30, [sp], 48
        fcvtzs  w0, d0
        ret

After Patch
===========
cal:    
        stp     x29, x30, [sp, -48]!
        add     w7, w0, w1
        add     x29, sp, 0
        stp     d8, d9, [sp, 32]
        fmov    d9, d1
        fmov    d8, d0
        scvtf   d1, w7
        scvtf   d0, w0
        stp     x19, x20, [sp, 16]
        mov     w20, w1 
        mov     w19, w0
        bl      dec
        scvtf   d0, w20
        sub     w7, w19, w20
        mul     w19, w19, w20
        scvtf   d1, w7
        bl      dec 
        scvtf   d1, w19
        bl      dec
        fadd    d8, d8, d9
        ldp     x19, x20, [sp, 16]
        fadd    d0, d8, d0
        ldp     d8, d9, [sp, 32]
        ldp     x29, x30, [sp], 48
        fcvtzs  w0, d0
        ret
-- 
Regards,
Jiong


* Re: [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2015-05-20 13:02 [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
  2015-07-22 11:01 ` [AArch64] PR63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
@ 2015-07-22 11:37 ` James Greenhalgh
  2015-07-24  9:11   ` [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
  1 sibling, 1 reply; 12+ messages in thread
From: James Greenhalgh @ 2015-07-22 11:37 UTC (permalink / raw)
  To: Jiong Wang; +Cc: gcc-patches

On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
> Current IRA still use both target macros in a few places.
> 
> Tell IRA to use the order we defined rather than with it's own cost
> calculation. Allocate caller saved first, then callee saved.
> 
> This is especially useful for LR/x30, as it's free to allocate and is
> pure caller saved when used in leaf function.
> 
> Haven't noticed significant impact on benchmarks, but by grepping some
> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
> is smaller.
> 
> OK for trunk?

OK, sorry for the delay.

It might be mail client mangling, but please check that the trailing slashes
line up in the version that gets committed.

Thanks,
James

> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
> 
> gcc/
>   PR 63521
>   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>   (HONOR_REG_ALLOC_ORDER): Define.
> 
 

> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index bf59e40..0acdf10 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -337,6 +337,31 @@ extern unsigned long aarch64_tune_flags;
>      V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31)  \
>    }
>  
> +#define REG_ALLOC_ORDER				\
> +{						\
> +  /* Reverse order for argument registers.  */	\
> +  7, 6, 5, 4, 3, 2, 1, 0,			\
> +  /* Other caller-saved registers.  */		\
> +  8, 9, 10, 11, 12, 13, 14, 15,			\
> +  16, 17, 18, 30,				\
> +  /* Callee-saved registers.  */		\
> +  19, 20, 21, 22, 23, 24, 25, 26,		\
> +  27, 28,					\
> +  /* All other registers.  */			\
> +  29, 31,					\
> +  /* Reverse order for argument vregisters.  */	\
> +  39, 38, 37, 36, 35, 34, 33, 32,		\
> +  /* Other caller-saved vregisters.  */		\
> +  48, 49, 50, 51, 52, 53, 54, 55,		\
> +  56, 57, 58, 59, 60, 61, 62, 63,		\
> +  /* Callee-saved vregisters.  */		\
> +  40, 41, 42, 43, 44, 45, 46, 47,		\
> +  /* Other pseudo registers.  */		\
> +  64, 65, 66					\
> +}
> +
> +#define HONOR_REG_ALLOC_ORDER 1
> +
>  /* Say that the epilogue uses the return address register.  Note that
>     in the case of sibcalls, the values "used by the epilogue" are
>     considered live at the start of the called function.  */


* [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2015-07-22 11:37 ` [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER James Greenhalgh
@ 2015-07-24  9:11   ` Jiong Wang
  2015-07-25 20:02     ` Andrew Pinski
  0 siblings, 1 reply; 12+ messages in thread
From: Jiong Wang @ 2015-07-24  9:11 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 1007 bytes --]


James Greenhalgh writes:

> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>> Current IRA still use both target macros in a few places.
>> 
>> Tell IRA to use the order we defined rather than with it's own cost
>> calculation. Allocate caller saved first, then callee saved.
>> 
>> This is especially useful for LR/x30, as it's free to allocate and is
>> pure caller saved when used in leaf function.
>> 
>> Haven't noticed significant impact on benchmarks, but by grepping some
>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
>> is smaller.
>> 
>> OK for trunk?
>
> OK, sorry for the delay.
>
> It might be mail client mangling, but please check that the trailing slashes
> line up in the version that gets committed.
>
> Thanks,
> James
>
>> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
>> 
>> gcc/
>>   PR 63521
>>   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>   (HONOR_REG_ALLOC_ORDER): Define.

Patch reverted.


[-- Attachment #2: revert.patch --]
[-- Type: text/x-diff, Size: 1858 bytes --]

Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog	(revision 226140)
+++ gcc/ChangeLog	(working copy)
@@ -1,3 +1,11 @@
+2015-07-24  Jiong Wang  <jiong.wang@arm.com>
+
+	Revert:
+	2015-07-22  Jiong Wang  <jiong.wang@arm.com>
+	PR target/63521
+	* config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
+	(HONOR_REG_ALLOC_ORDER): Define.
+
 2015-07-24  Richard Biener  <rguenther@suse.de>
 
 	* genmatch.c (add_operator): Allow SSA_NAME as predicate.
Index: gcc/config/aarch64/aarch64.h
===================================================================
--- gcc/config/aarch64/aarch64.h	(revision 226140)
+++ gcc/config/aarch64/aarch64.h	(working copy)
@@ -344,31 +344,6 @@
     V_ALIASES(28), V_ALIASES(29), V_ALIASES(30), V_ALIASES(31)  \
   }
 
-#define REG_ALLOC_ORDER					\
-  {							\
-    /* Reverse order for argument registers.  */	\
-    7, 6, 5, 4, 3, 2, 1, 0,				\
-    /* Other caller-saved registers.  */		\
-    8, 9, 10, 11, 12, 13, 14, 15,			\
-    16, 17, 18, 30,					\
-    /* Callee-saved registers.  */			\
-    19, 20, 21, 22, 23, 24, 25, 26,			\
-    27, 28,						\
-    /* All other registers.  */				\
-    29, 31,						\
-    /* Reverse order for argument vregisters.  */	\
-    39, 38, 37, 36, 35, 34, 33, 32,			\
-    /* Other caller-saved vregisters.  */		\
-    48, 49, 50, 51, 52, 53, 54, 55,			\
-    56, 57, 58, 59, 60, 61, 62, 63,			\
-    /* Callee-saved vregisters.  */			\
-    40, 41, 42, 43, 44, 45, 46, 47,			\
-    /* Other pseudo registers.  */			\
-    64, 65, 66						\
-}
-
-#define HONOR_REG_ALLOC_ORDER 1
-
 /* Say that the epilogue uses the return address register.  Note that
    in the case of sibcalls, the values "used by the epilogue" are
    considered live at the start of the called function.  */


* Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2015-07-24  9:11   ` [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
@ 2015-07-25 20:02     ` Andrew Pinski
  2015-07-27  9:52       ` Jiong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Pinski @ 2015-07-25 20:02 UTC (permalink / raw)
  To: Jiong Wang; +Cc: James Greenhalgh, gcc-patches

On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>
> James Greenhalgh writes:
>
>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>> Current IRA still use both target macros in a few places.
>>>
>>> Tell IRA to use the order we defined rather than with it's own cost
>>> calculation. Allocate caller saved first, then callee saved.
>>>
>>> This is especially useful for LR/x30, as it's free to allocate and is
>>> pure caller saved when used in leaf function.
>>>
>>> Haven't noticed significant impact on benchmarks, but by grepping some
>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
>>> is smaller.
>>>
>>> OK for trunk?
>>
>> OK, sorry for the delay.
>>
>> It might be mail client mangling, but please check that the trailing slashes
>> line up in the version that gets committed.
>>
>> Thanks,
>> James
>>
>>> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
>>>
>>> gcc/
>>>   PR 63521
>>>   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>>   (HONOR_REG_ALLOC_ORDER): Define.
>
> Patch reverted.

I did not see a reason why this patch was reverted.  Maybe I am
missing an email or something.

Thanks,
Andrew


>


* Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2015-07-25 20:02     ` Andrew Pinski
@ 2015-07-27  9:52       ` Jiong Wang
  2015-07-27 10:25         ` pinskia
  0 siblings, 1 reply; 12+ messages in thread
From: Jiong Wang @ 2015-07-27  9:52 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: James Greenhalgh, gcc-patches


Andrew Pinski writes:

> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>>
>> James Greenhalgh writes:
>>
>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>>> Current IRA still use both target macros in a few places.
>>>>
>>>> Tell IRA to use the order we defined rather than with it's own cost
>>>> calculation. Allocate caller saved first, then callee saved.
>>>>
>>>> This is especially useful for LR/x30, as it's free to allocate and is
>>>> pure caller saved when used in leaf function.
>>>>
>>>> Haven't noticed significant impact on benchmarks, but by grepping some
>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
>>>> is smaller.
>>>>
>>>> OK for trunk?
>>>
>>> OK, sorry for the delay.
>>>
>>> It might be mail client mangling, but please check that the trailing slashes
>>> line up in the version that gets committed.
>>>
>>> Thanks,
>>> James
>>>
>>>> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
>>>>
>>>> gcc/
>>>>   PR 63521
>>>>   * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>>>   (HONOR_REG_ALLOC_ORDER): Define.
>>
>> Patch reverted.
>
> I did not see a reason why this patch was reverted.  Maybe I am
> missing an email or something.

There are several execution regressions in the GCC testsuite.  As far
as I can see this patch merely exposed hidden bugs in those testcases,
but there might be another issue, so to be conservative I have
temporarily reverted the patch.

>
> Thanks,
> Andrew
>
>
>>

-- 
Regards,
Jiong


* Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2015-07-27  9:52       ` Jiong Wang
@ 2015-07-27 10:25         ` pinskia
  2015-07-27 10:44           ` James Greenhalgh
  0 siblings, 1 reply; 12+ messages in thread
From: pinskia @ 2015-07-27 10:25 UTC (permalink / raw)
  To: Jiong Wang; +Cc: James Greenhalgh, gcc-patches





> On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.wang@arm.com> wrote:
> 
> 
> Andrew Pinski writes:
> 
>>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>>> 
>>> James Greenhalgh writes:
>>> 
>>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>>>> Current IRA still use both target macros in a few places.
>>>>> 
>>>>> Tell IRA to use the order we defined rather than with it's own cost
>>>>> calculation. Allocate caller saved first, then callee saved.
>>>>> 
>>>>> This is especially useful for LR/x30, as it's free to allocate and is
>>>>> pure caller saved when used in leaf function.
>>>>> 
>>>>> Haven't noticed significant impact on benchmarks, but by grepping some
>>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
>>>>> is smaller.
>>>>> 
>>>>> OK for trunk?
>>>> 
>>>> OK, sorry for the delay.
>>>> 
>>>> It might be mail client mangling, but please check that the trailing slashes
>>>> line up in the version that gets committed.
>>>> 
>>>> Thanks,
>>>> James
>>>> 
>>>>> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
>>>>> 
>>>>> gcc/
>>>>>  PR 63521
>>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>>>>  (HONOR_REG_ALLOC_ORDER): Define.
>>> 
>>> Patch reverted.
>> 
>> I did not see a reason why this patch was reverted.  Maybe I am
>> missing an email or something.
> 
> There are several execution regressions under gcc testsuite, although as
> far as I can see it's this patch exposed hidding bugs in those
> testcases, but there might be one other issue, so to be conservative, I
> temporarily reverted this patch.

If you are talking about:
gcc.target/aarch64/aapcs64/func-ret-2.c execution
Etc.

These test cases are too dependent on the original register allocation order and really can be safely ignored. Really these three tests should be moved or written in a more sane way. 

Thanks,
Andrew

> 
>> 
>> Thanks,
>> Andrew
>> 
>> 
> 
> -- 
> Regards,
> Jiong
> 


* Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2015-07-27 10:25         ` pinskia
@ 2015-07-27 10:44           ` James Greenhalgh
  2016-08-03  6:18             ` Andrew Pinski
  0 siblings, 1 reply; 12+ messages in thread
From: James Greenhalgh @ 2015-07-27 10:44 UTC (permalink / raw)
  To: pinskia; +Cc: Jiong Wang, gcc-patches

On Mon, Jul 27, 2015 at 10:52:58AM +0100, pinskia@gmail.com wrote:
> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.wang@arm.com> wrote:
> > 
> > Andrew Pinski writes:
> > 
> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.wang@arm.com> wrote:
> >>> 
> >>> James Greenhalgh writes:
> >>> 
> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
> >>>>> Current IRA still use both target macros in a few places.
> >>>>> 
> >>>>> Tell IRA to use the order we defined rather than with it's own cost
> >>>>> calculation. Allocate caller saved first, then callee saved.
> >>>>> 
> >>>>> This is especially useful for LR/x30, as it's free to allocate and is
> >>>>> pure caller saved when used in leaf function.
> >>>>> 
> >>>>> Haven't noticed significant impact on benchmarks, but by grepping some
> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
> >>>>> is smaller.
> >>>>> 
> >>>>> OK for trunk?
> >>>> 
> >>>> OK, sorry for the delay.
> >>>> 
> >>>> It might be mail client mangling, but please check that the trailing slashes
> >>>> line up in the version that gets committed.
> >>>> 
> >>>> Thanks,
> >>>> James
> >>>> 
> >>>>> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
> >>>>> 
> >>>>> gcc/
> >>>>>  PR 63521
> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
> >>> 
> >>> Patch reverted.
> >> 
> >> I did not see a reason why this patch was reverted.  Maybe I am
> >> missing an email or something.
> > 
> > There are several execution regressions under gcc testsuite, although as
> > far as I can see it's this patch exposed hidding bugs in those
> > testcases, but there might be one other issue, so to be conservative, I
> > temporarily reverted this patch.
> 
> If you are talking about:
> gcc.target/aarch64/aapcs64/func-ret-2.c execution
> Etc.
> 
> These test cases are too dependent on the original register allocation order
> and really can be safely ignored. Really these three tests should be moved or
> written in a more sane way. 

Yup, completely agreed - but the testcases do throw up something
interesting. If we are allocating registers to hold 128-bit values, and
we pick x7 as highest preference, we implicitly allocate x8 along with it.
I think we probably see the same thing if the first thing we do in a
function is a structure copy through a back-end expanded movmem, which
will likely begin with a 128-bit LDP using x7, x8.

If the argument for this patch is that we prefer to allocate x7-x0 first,
followed by x8, then we've potentially made a sub-optimal decision, our
allocation order for 128-bit values is x7,x8,x5,x6 etc.

My hunch is that we *might* get better code generation in this corner case
out of some permutation of the allocation order for argument
registers. I'm thinking something along the lines of

  {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }

I asked Jiong to take a look at that, and I agree with his decision to
reduce the churn on trunk and just revert the patch until we've come to
a conclusion based on some evidence - rather than just my hunch! I agree
that it would be harmless on trunk from a testing point of view, but I
think Jiong is right to revert the patch until we better understand the
code-generation implications.

Of course, it might be that I am completely wrong! If you've already taken
a look at using a register allocation order like the example I gave and
have something to share, I'd be happy to read your advice!

Thanks,
James


* Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2015-07-27 10:44           ` James Greenhalgh
@ 2016-08-03  6:18             ` Andrew Pinski
  2016-08-05 17:02               ` Jiong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Andrew Pinski @ 2016-08-03  6:18 UTC (permalink / raw)
  To: James Greenhalgh; +Cc: Jiong Wang, gcc-patches

On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pinskia@gmail.com wrote:
>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>> >
>> > Andrew Pinski writes:
>> >
>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>> >>>
>> >>> James Greenhalgh writes:
>> >>>
>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>> >>>>> Current IRA still use both target macros in a few places.
>> >>>>>
>> >>>>> Tell IRA to use the order we defined rather than with it's own cost
>> >>>>> calculation. Allocate caller saved first, then callee saved.
>> >>>>>
>> >>>>> This is especially useful for LR/x30, as it's free to allocate and is
>> >>>>> pure caller saved when used in leaf function.
>> >>>>>
>> >>>>> Haven't noticed significant impact on benchmarks, but by grepping some
>> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
>> >>>>> is smaller.
>> >>>>>
>> >>>>> OK for trunk?
>> >>>>
>> >>>> OK, sorry for the delay.
>> >>>>
>> >>>> It might be mail client mangling, but please check that the trailing slashes
>> >>>> line up in the version that gets committed.
>> >>>>
>> >>>> Thanks,
>> >>>> James
>> >>>>
>> >>>>> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
>> >>>>>
>> >>>>> gcc/
>> >>>>>  PR 63521
>> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
>> >>>
>> >>> Patch reverted.
>> >>
>> >> I did not see a reason why this patch was reverted.  Maybe I am
>> >> missing an email or something.
>> >
>> > There are several execution regressions under gcc testsuite, although as
>> > far as I can see it's this patch exposed hidding bugs in those
>> > testcases, but there might be one other issue, so to be conservative, I
>> > temporarily reverted this patch.
>>
>> If you are talking about:
>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>> Etc.
>>
>> These test cases are too dependent on the original register allocation order
>> and really can be safely ignored. Really these three tests should be moved or
>> written in a more sane way.
>
> Yup, completely agreed - but the testcases do throw up something
> interesting. If we are allocating registers to hold 128-bit values, and
> we pick x7 as highest preference, we implicitly allocate x8 along with it.
> I think we probably see the same thing if the first thing we do in a
> function is a structure copy through a back-end expanded movmem, which
> will likely begin with a 128-bit LDP using x7, x8.
>
> If the argument for this patch is that we prefer to allocate x7-x0 first,
> followed by x8, then we've potentially made a sub-optimal decision, our
> allocation order for 128-bit values is x7,x8,x5,x6 etc.
>
> My hunch is that we *might* get better code generation in this corner case
> out of some permutation of the allocation order for argument
> registers. I'm thinking something along the lines of
>
>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>
> I asked Jiong to take a look at that, and I agree with his decision to
> reduce the churn on trunk and just revert the patch until we've come to
> a conclusion based on some evidence - rather than just my hunch! I agree
> that it would be harmless on trunk from a testing point of view, but I
> think Jiong is right to revert the patch until we better understand the
> code-generation implications.
>
> Of course, it might be that I am completely wrong! If you've already taken
> a look at using a register allocation order like the example I gave and
> have something to share, I'd be happy to read your advice!

Any news on this patch?  It has been a year since it was reverted for
a bad test that was failing.

Thanks,
Andrew

>
> Thanks,
> James
>


* Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2016-08-03  6:18             ` Andrew Pinski
@ 2016-08-05 17:02               ` Jiong Wang
  2016-08-08 17:05                 ` Jiong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Jiong Wang @ 2016-08-05 17:02 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: James Greenhalgh, gcc-patches


Andrew Pinski writes:

> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
> <james.greenhalgh@arm.com> wrote:
>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pinskia@gmail.com wrote:
>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>>> >
>>> > Andrew Pinski writes:
>>> >
>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>>> >>>
>>> >>> James Greenhalgh writes:
>>> >>>
>>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>> >>>>> Current IRA still use both target macros in a few places.
>>> >>>>>
>>> >>>>> Tell IRA to use the order we defined rather than with it's own cost
>>> >>>>> calculation. Allocate caller saved first, then callee saved.
>>> >>>>>
>>> >>>>> This is especially useful for LR/x30, as it's free to allocate and is
>>> >>>>> pure caller saved when used in leaf function.
>>> >>>>>
>>> >>>>> Haven't noticed significant impact on benchmarks, but by grepping some
>>> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
>>> >>>>> is smaller.
>>> >>>>>
>>> >>>>> OK for trunk?
>>> >>>>
>>> >>>> OK, sorry for the delay.
>>> >>>>
>>> >>>> It might be mail client mangling, but please check that the trailing slashes
>>> >>>> line up in the version that gets committed.
>>> >>>>
>>> >>>> Thanks,
>>> >>>> James
>>> >>>>
>>> >>>>> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
>>> >>>>>
>>> >>>>> gcc/
>>> >>>>>  PR 63521
>>> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
>>> >>>
>>> >>> Patch reverted.
>>> >>
>>> >> I did not see a reason why this patch was reverted.  Maybe I am
>>> >> missing an email or something.
>>> >
>>> > There are several execution regressions under gcc testsuite, although as
>>> > far as I can see it's this patch exposed hidding bugs in those
>>> > testcases, but there might be one other issue, so to be conservative, I
>>> > temporarily reverted this patch.
>>>
>>> If you are talking about:
>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>> Etc.
>>>
>>> These test cases are too dependent on the original register allocation order
>>> and really can be safely ignored. Really these three tests should be moved or
>>> written in a more sane way.
>>
>> Yup, completely agreed - but the testcases do throw up something
>> interesting. If we are allocating registers to hold 128-bit values, and
>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>> I think we probably see the same thing if the first thing we do in a
>> function is a structure copy through a back-end expanded movmem, which
>> will likely begin with a 128-bit LDP using x7, x8.
>>
>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>> followed by x8, then we've potentially made a sub-optimal decision, our
>> allocation order for 128-bit values is x7,x8,x5,x6 etc.
>>
>> My hunch is that we *might* get better code generation in this corner case
>> out of some permutation of the allocation order for argument
>> registers. I'm thinking something along the lines of
>>
>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>
>> I asked Jiong to take a look at that, and I agree with his decision to
>> reduce the churn on trunk and just revert the patch until we've come to
>> a conclusion based on some evidence - rather than just my hunch! I agree
>> that it would be harmless on trunk from a testing point of view, but I
>> think Jiong is right to revert the patch until we better understand the
>> code-generation implications.
>>
>> Of course, it might be that I am completely wrong! If you've already taken
>> a look at using a register allocation order like the example I gave and
>> have something to share, I'd be happy to read your advice!
>
> Any news on this patch?  It has been a year since it was reverted for
> a bad test that was failing.

Hi Andrew,

  Yeah, those tests are actually expected to fail once the register
  allocation order changes; this is clearly documented in the comments:

  gcc.target/aarch64/aapcs64/abitest-2.h:
  
/* ...

   Note that for value that is returned in the caller-allocated memory
   block, we get the address from the saved x8 register.  x8 is saved
   just after the callee is returned; we assume that x8 has not been
   clobbered at then, although there is no requirement for the callee
   preserve the value stored in x8.  Luckily, all test cases here are
   simple enough that x8 doesn't normally get clobbered (although not
   guaranteed).  */

  I had a local fix which used the redundant value returned in x0 to
  repair the clobbered value in x8, as the two are identical for
  structure-type returns.  However, that trick no longer works, as we
  recently defined TARGET_OMIT_STRUCT_RETURN_REG to true, which removes
  the redundant x8-to-x0 copy.
  
  Anyway, I will come back with benchmark results for this patch on top
  of the latest trunk after the weekend run, along with answers to
  James's concerns.

-- 
Regards,
Jiong


* Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2016-08-05 17:02               ` Jiong Wang
@ 2016-08-08 17:05                 ` Jiong Wang
  2016-08-10 16:49                   ` Vladimir N Makarov
  0 siblings, 1 reply; 12+ messages in thread
From: Jiong Wang @ 2016-08-08 17:05 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: James Greenhalgh, gcc-patches, Vladimir N Makarov


Jiong Wang writes:

> Andrew Pinski writes:
>
>> On Mon, Jul 27, 2015 at 3:36 AM, James Greenhalgh
>> <james.greenhalgh@arm.com> wrote:
>>> On Mon, Jul 27, 2015 at 10:52:58AM +0100, pinskia@gmail.com wrote:
>>>> > On Jul 27, 2015, at 2:26 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>>>> >
>>>> > Andrew Pinski writes:
>>>> >
>>>> >>> On Fri, Jul 24, 2015 at 2:07 AM, Jiong Wang <jiong.wang@arm.com> wrote:
>>>> >>>
>>>> >>> James Greenhalgh writes:
>>>> >>>
>>>> >>>>> On Wed, May 20, 2015 at 01:35:41PM +0100, Jiong Wang wrote:
>>>> >>>>> Current IRA still use both target macros in a few places.
>>>> >>>>>
>>>> >>>>> Tell IRA to use the order we defined rather than with it's own cost
>>>> >>>>> calculation. Allocate caller saved first, then callee saved.
>>>> >>>>>
>>>> >>>>> This is especially useful for LR/x30, as it's free to allocate and is
>>>> >>>>> pure caller saved when used in leaf function.
>>>> >>>>>
>>>> >>>>> Haven't noticed significant impact on benchmarks, but by grepping some
>>>> >>>>> keywords like "Spilling", "Push.*spill" etc in ira rtl dump, the number
>>>> >>>>> is smaller.
>>>> >>>>>
>>>> >>>>> OK for trunk?
>>>> >>>>
>>>> >>>> OK, sorry for the delay.
>>>> >>>>
>>>> >>>> It might be mail client mangling, but please check that the trailing slashes
>>>> >>>> line up in the version that gets committed.
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> James
>>>> >>>>
>>>> >>>>> 2015-05-19  Jiong. Wang  <jiong.wang@arm.com>
>>>> >>>>>
>>>> >>>>> gcc/
>>>> >>>>>  PR 63521
>>>> >>>>>  * config/aarch64/aarch64.h (REG_ALLOC_ORDER): Define.
>>>> >>>>>  (HONOR_REG_ALLOC_ORDER): Define.
>>>> >>>
>>>> >>> Patch reverted.
>>>> >>
>>>> >> I did not see a reason why this patch was reverted.  Maybe I am
>>>> >> missing an email or something.
>>>> >
>>>> > There are several execution regressions under gcc testsuite, although as
>>>> > far as I can see it's this patch exposed hidding bugs in those
>>>> > testcases, but there might be one other issue, so to be conservative, I
>>>> > temporarily reverted this patch.
>>>>
>>>> If you are talking about:
>>>> gcc.target/aarch64/aapcs64/func-ret-2.c execution
>>>> Etc.
>>>>
>>>> These test cases are too dependent on the original register allocation order
>>>> and really can be safely ignored. Really these three tests should be moved or
>>>> written in a more sane way.
>>>
>>> Yup, completely agreed - but the testcases do throw up something
>>> interesting. If we are allocating registers to hold 128-bit values, and
>>> we pick x7 as highest preference, we implicitly allocate x8 along with it.
>>> I think we probably see the same thing if the first thing we do in a
>>> function is a structure copy through a back-end expanded movmem, which
>>> will likely begin with a 128-bit LDP using x7, x8.
>>>
>>> If the argument for this patch is that we prefer to allocate x7-x0 first,
>>> followed by x8, then we've potentially made a sub-optimal decision, our
>>> allocation order for 128-bit values is x7,x8,x5,x6 etc.
>>>
>>> My hunch is that we *might* get better code generation in this corner case
>>> out of some permutation of the allocation order for argument
>>> registers. I'm thinking something along the lines of
>>>
>>>   {x6, x5, x4, x7, x3, x2, x1, x0, x8, ... }
>>>
>>> I asked Jiong to take a look at that, and I agree with his decision to
>>> reduce the churn on trunk and just revert the patch until we've come to
>>> a conclusion based on some evidence - rather than just my hunch! I agree
>>> that it would be harmless on trunk from a testing point of view, but I
>>> think Jiong is right to revert the patch until we better understand the
>>> code-generation implications.
>>>
>>> Of course, it might be that I am completely wrong! If you've already taken
>>> a look at using a register allocation order like the example I gave and
>>> have something to share, I'd be happy to read your advice!
>>
>> Any news on this patch?  It has been a year since it was reverted for
>> a bad test that was failing.
>
> Hi Andrew,
>
>   Yeah, those tests are actually expected to fail once the register
>   allocation order changes; this is clearly documented in the comments:
>
>   gcc.target/aarch64/aapcs64/abitest-2.h:
>   
> /* ...
>
>    Note that for value that is returned in the caller-allocated memory
>    block, we get the address from the saved x8 register.  x8 is saved
>    just after the callee is returned; we assume that x8 has not been
>    clobbered at then, although there is no requirement for the callee
>    preserve the value stored in x8.  Luckily, all test cases here are
>    simple enough that x8 doesn't normally get clobbered (although not
>    guaranteed).  */
>
>   I had a local fix which used the redundant value returned in x0 to
>   repair the clobbered value in x8, as the two will be identical for a
>   structure-type return.  However, that trick no longer works since we
>   recently defined TARGET_OMIT_STRUCT_RETURN_REG to true, which removes
>   that redundant x8-to-x0 copy.
>   
>   Anyway, I will come back with some benchmark results for this patch on
>   top of the latest trunk after the weekend run, along with answers to
>   James's concerns.

There is a very tiny performance regression on SPEC2K6INT
464.h264ref.  Checking the codegen, there is some bad instruction
scheduling; it looks to me to be caused by REG_ALLOC_ORDER not being
used consistently inside IRA, so that parts of the code use the new
allocation order while other parts still use the original order.

I see that in ira.c:setup_class_hard_regs we check whether
REG_ALLOC_ORDER is defined:

  #ifdef REG_ALLOC_ORDER
          hard_regno = reg_alloc_order[i];
  #else
          hard_regno = i;
  #endif

but in ira-color.c:assign_hard_reg, there is no similar check:

  /* We don't care about giving callee saved registers to allocnos no
     living through calls because call clobbered registers are
     allocated first (it is usual practice to put them first in
     REG_ALLOC_ORDER).  */
  mode = ALLOCNO_MODE (a);
  for (i = 0; i < class_size; i++)
    {
      hard_regno = ira_class_hard_regs[aclass][i];

Should we use reg_alloc_order to map the "i" above when REG_ALLOC_ORDER
is defined?

Vlad, any comments?

-- 
Regards,
Jiong

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER
  2016-08-08 17:05                 ` Jiong Wang
@ 2016-08-10 16:49                   ` Vladimir N Makarov
  0 siblings, 0 replies; 12+ messages in thread
From: Vladimir N Makarov @ 2016-08-10 16:49 UTC (permalink / raw)
  To: Jiong Wang, Andrew Pinski; +Cc: James Greenhalgh, gcc-patches



On 08/08/2016 01:04 PM, Jiong Wang wrote:
> [...]
> There is a very tiny performance regression on SPEC2K6INT
> 464.h264ref.  Checking the codegen, there is some bad instruction
> scheduling; it looks to me to be caused by REG_ALLOC_ORDER not being
> used consistently inside IRA, so that parts of the code use the new
> allocation order while other parts still use the original order.
>
> I see that in ira.c:setup_class_hard_regs we check whether
> REG_ALLOC_ORDER is defined:
>
>    #ifdef REG_ALLOC_ORDER
>            hard_regno = reg_alloc_order[i];
>    #else
>            hard_regno = i;
>    #endif
>
> but in ira-color.c:assign_hard_reg, there is no similar check:
>
>    /* We don't care about giving callee saved registers to allocnos no
>       living through calls because call clobbered registers are
>       allocated first (it is usual practice to put them first in
>       REG_ALLOC_ORDER).  */
>    mode = ALLOCNO_MODE (a);
>    for (i = 0; i < class_size; i++)
>      {
>        hard_regno = ira_class_hard_regs[aclass][i];
>
> Should we use reg_alloc_order to map the "i" above when REG_ALLOC_ORDER
> is defined?
>
> Vlad, any comments?
>
The order is already present in ira_class_hard_regs, so there is no
need to apply reg_alloc_order again in ira-color.c::assign_hard_reg.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-08-10 16:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-20 13:02 [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
2015-07-22 11:01 ` [AArch64] PR63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
2015-07-22 11:37 ` [AArch64] PR 63521. define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER James Greenhalgh
2015-07-24  9:11   ` [Revert][AArch64] PR 63521 Define REG_ALLOC_ORDER/HONOR_REG_ALLOC_ORDER Jiong Wang
2015-07-25 20:02     ` Andrew Pinski
2015-07-27  9:52       ` Jiong Wang
2015-07-27 10:25         ` pinskia
2015-07-27 10:44           ` James Greenhalgh
2016-08-03  6:18             ` Andrew Pinski
2016-08-05 17:02               ` Jiong Wang
2016-08-08 17:05                 ` Jiong Wang
2016-08-10 16:49                   ` Vladimir N Makarov
