[RFC][PATCH][AArch64] Improve generic branch cost

public inbox for gcc-patches@gcc.gnu.org

* [RFC][PATCH][AArch64] Improve generic branch cost
@ 2017-03-09 14:42 Wilco Dijkstra
  2017-03-09 22:06 ` Andrew Pinski
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2017-03-09 14:42 UTC (permalink / raw)
  To: GCC Patches, Evandro Menezes, Andrew.pinski, jim.wilson; +Cc: nd

Hi,

Recently we've put a lot of effort into improving ifcvt to use CSEL on AArch64.
In https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html James determined
the best branch cost setting for AArch64 code generation.  Although this setting
is used when explicitly targeting Cortex cores, it is not used otherwise.  This
means that by default GCC will not use (F)CSEL in many common cases.  Most code
is built without -mcpu= and thus doesn't use CSEL, as in this example from GLIBC:

strtok:
    stp    x29, x30, [sp, -48]!
    add    x29, sp, 0
    stp    x21, x22, [sp, 32]
    mov    x21, x1
    stp    x19, x20, [sp, 16]
    adrp    x22, .LANCHOR0
    mov    x19, x0
    cbz    x0, .L12
.L2:    ldrb    w0, [x19]

.L12:
    ldr    x19, [x22, #:lo12:.LANCHOR0]
    b    .L2

With -mcpu=cortex-a57 GCC generates:

    stp    x29, x30, [sp, -48]!
    cmp    x0, 0
    add    x29, sp, 0
    stp    x21, x22, [sp, 32]
    adrp    x21, .LANCHOR0
    stp    x19, x20, [sp, 16]
    mov    x19, x0
    ldr    x0, [x21, #:lo12:.LANCHOR0]
    csel    x19, x0, x19, eq
    ldrb    w0, [x19]

This is generally faster and smaller.  On one benchmark the new setting fixes a 
regression since GCC6 and improves performance by 49%.  So I propose to change
generic_branch_cost to be the same as cortexa57_branch_cost so that all supported
cores benefit equally from CSEL.  Are there any objections to this?
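
The cbz/csel difference above comes from a null check that picks between the
argument and a saved pointer.  A minimal C sketch of that pattern follows,
assuming a hypothetical static `saved_ptr` and helper names `pick_start` and
`pick_start_demo` that are illustrative only and not taken from GLIBC.  Which
instruction sequence a given GCC emits for it depends on the version, options
and tuning.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the position strtok remembers between calls;
   the name is illustrative, not taken from GLIBC.  */
static char *saved_ptr;

/* Select the scan start: use S unless it is NULL, in which case continue
   from the saved position.  This is the kind of two-way pointer selection
   that if-conversion can turn into a compare plus conditional select
   instead of a conditional branch.  */
char *
pick_start (char *s)
{
  if (s == NULL)
    s = saved_ptr;
  return s;
}

/* Usage sketch exercising both arms of the selection.  */
void
pick_start_demo (void)
{
  char fresh[] = "a,b";
  char old[] = "c,d";

  saved_ptr = old;
  assert (pick_start (fresh) == fresh);  /* non-NULL argument is kept */
  assert (pick_start (NULL) == old);     /* NULL falls back to saved_ptr */
}
```

Compiling a function like this with and without -mcpu=cortex-a57 is one way to
compare the two code shapes shown above.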

Wilco

ChangeLog:
2017-03-09  Wilco Dijkstra  <wdijkstr@arm.com>

	* config/aarch64/aarch64.c (generic_branch_cost): Copy cortexa57_branch_cost.
--

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 5870b5e5d7e8e48cf925b3a62030346f041a7fd6..ea16074af86087a6200d9895583e05acf43d90e2 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -377,8 +377,8 @@ static const struct cpu_vector_cost xgene1_vector_cost =
 /* Generic costs for branch instructions.  */
 static const struct cpu_branch_cost generic_branch_cost =
 {
-  2,  /* Predictable.  */
-  2   /* Unpredictable.  */
+  1,  /* Predictable.  */
+  3   /* Unpredictable.  */
 };

 /* Branch costs for Cortex-A57.  */
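
To make the effect of the two numbers concrete, here is a simplified model of
how a two-entry branch cost record might be queried.  It is a sketch, not the
GCC implementation: the field names, the `branch_cost` helper and the rule
"use the predictable cost when optimising for size or when the branch is
predictable" are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Two-entry cost record with the same shape as the initialisers in the
   diff above.  The field names are assumptions.  */
struct cpu_branch_cost
{
  int predictable;    /* Well-predicted branch, or optimising for size.  */
  int unpredictable;  /* Poorly predicted branch, optimising for speed.  */
};

/* Values before and after the proposed change.  */
const struct cpu_branch_cost old_generic_branch_cost = { 2, 2 };
const struct cpu_branch_cost new_generic_branch_cost = { 1, 3 };

/* Hypothetical query: return the cost the optimisers would see for one
   branch.  A higher result makes replacing the branch with conditional
   instructions look more attractive.  */
int
branch_cost (const struct cpu_branch_cost *costs, bool speed_p,
             bool predictable_p)
{
  assert (costs != NULL);
  if (!speed_p || predictable_p)
    return costs->predictable;
  return costs->unpredictable;
}
```

Under this model, an unpredictable branch in code optimised for speed moves
from cost 2 to cost 3, while a predictable branch moves from 2 to 1.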


* Re: [RFC][PATCH][AArch64] Improve generic branch cost
  2017-03-09 14:42 [RFC][PATCH][AArch64] Improve generic branch cost Wilco Dijkstra
@ 2017-03-09 22:06 ` Andrew Pinski
  2017-03-14  9:37   ` James Greenhalgh
  2017-03-16 17:22 ` [PATCH][AArch64] Enable AES fusion with -mcpu=generic Wilco Dijkstra
  2017-03-17 10:15 ` [RFC][PATCH][AArch64] Improve generic branch cost Richard Earnshaw (lists)
  2 siblings, 1 reply; 11+ messages in thread
From: Andrew Pinski @ 2017-03-09 22:06 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: GCC Patches, Evandro Menezes, Andrew.pinski, jim.wilson, nd

On Thu, Mar 9, 2017 at 6:42 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Hi,
>
> Recently we've put a lot of effort into improving ifcvt to use CSEL on AArch64.
> In  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html James determined
> the best value for AArch64 code generation.  Although this setting is used when
> explicitly targeting Cortex cores, it is not otherwise used.  This means by
> default GCC will not use (F)CSEL in many common cases.  Most code is built
> without -mcpu= and thus doesn't use CSEL like this example from GLIBC:
>
> strtok:
>     stp    x29, x30, [sp, -48]!
>     add    x29, sp, 0
>     stp    x21, x22, [sp, 32]
>     mov    x21, x1
>     stp    x19, x20, [sp, 16]
>     adrp    x22, .LANCHOR0
>     mov    x19, x0
>     cbz    x0, .L12
> .L2:    ldrb    w0, [x19]
>
> .L12:
>     ldr    x19, [x22, #:lo12:.LANCHOR0]
>     b    .L2
>
> With -mcpu=cortex-a57 GCC generates:
>
>     stp    x29, x30, [sp, -48]!
>     cmp    x0, 0
>     add    x29, sp, 0
>     stp    x21, x22, [sp, 32]
>     adrp    x21, .LANCHOR0
>     stp    x19, x20, [sp, 16]
>     mov    x19, x0
>     ldr    x0, [x21, #:lo12:.LANCHOR0]
>     csel    x19, x0, x19, eq
>     ldrb    w0, [x19]
>
> This is generally faster and smaller.  On one benchmark the new setting fixes a
> regression since GCC6 and improves performance by 49%.  So I propose to change
> generic_branch_cost to be the same as cortexa57_branch_cost so that all supported
> cores benefit equally from CSEL.  Are there any objections to this?

I have no objections.  In fact thunderx2t99's branch_cost is 1,3.  I
have not looked into improving the thunderx branch cost yet, but that might
be because I have local patches that improve phiopt to do ifcvt
earlier.  Also, my phiopt change does not have a cost model, so
using CSEL more is good for ThunderX 1 and ThunderX 2.

Thanks,
Andrew

>
> Wilco
>
>
> ChangeLog:
> 2017-03-09  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * config/aarch64/aarch64.c (generic_branch_cost): Copy cortexa57_branch_cost.
> --
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5870b5e5d7e8e48cf925b3a62030346f041a7fd6..ea16074af86087a6200d9895583e05acf43d90e2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -377,8 +377,8 @@ static const struct cpu_vector_cost xgene1_vector_cost =
>  /* Generic costs for branch instructions.  */
>  static const struct cpu_branch_cost generic_branch_cost =
>  {
> -  2,  /* Predictable.  */
> -  2   /* Unpredictable.  */
> +  1,  /* Predictable.  */
> +  3   /* Unpredictable.  */
>  };
>
>  /* Branch costs for Cortex-A57.  */


* Re: [RFC][PATCH][AArch64] Improve generic branch cost
  2017-03-09 22:06 ` Andrew Pinski
@ 2017-03-14  9:37   ` James Greenhalgh
  2017-03-17  3:19     ` Jim Wilson
  0 siblings, 1 reply; 11+ messages in thread
From: James Greenhalgh @ 2017-03-14  9:37 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Wilco Dijkstra, GCC Patches, Evandro Menezes, Andrew.pinski,
	jim.wilson, nd, philipp.tomsich, benedikt.huber

On Thu, Mar 09, 2017 at 02:06:16PM -0800, Andrew Pinski wrote:
> On Thu, Mar 9, 2017 at 6:42 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> > Hi,
> >
> > Recently we've put a lot of effort into improving ifcvt to use CSEL on
> > AArch64.  In  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html
> > James determined the best value for AArch64 code generation.

This was before the rewrite of the ifcvt costs which I made earlier in the
GCC 7 release cycle. But I think 1,3 is about right, and I'd be happy
to see us take that direction for "generic".

I'd like to hear comments from the Exynos-M1, Falkor and
xgene-1 subtarget contributors, particularly as these targets use
generic_branch_cost for their subtarget-specific tuning. It may be that
your patch needs to preserve the 2,2 setting for such cores even if the
generic target does move to 1,3.

At this stage in the release, this patch will have to wait for GCC 8
regardless of any comments received. I'd suggest that when we do think
about this for GCC 8, we might want to take a wider look at the "generic"
tunings. Opinions from other subtarget contributors, or from the other
AArch64 maintainers, on any further changes they would advocate would
be welcome.

> I have no objections.  In fact thunderx2t99's branch_cost is 1,3.  I
> had not looked into improving thunderx branch cost yet but that might
> be because I have local patches that improve phiopt for doing ifcvt
> earlier.  Also my phiopt change does not have a cost model either so
> using csel more is good for thunderx 1 and ThunderX 2.

Thanks for the comments,
James


* [PATCH][AArch64]  Enable AES fusion with -mcpu=generic
  2017-03-09 14:42 [RFC][PATCH][AArch64] Improve generic branch cost Wilco Dijkstra
  2017-03-09 22:06 ` Andrew Pinski
@ 2017-03-16 17:22 ` Wilco Dijkstra
  2017-03-16 18:01   ` Andrew Pinski
  2017-04-20 15:59   ` Wilco Dijkstra
  2017-03-17 10:15 ` [RFC][PATCH][AArch64] Improve generic branch cost Richard Earnshaw (lists)
  2 siblings, 2 replies; 11+ messages in thread
From: Wilco Dijkstra @ 2017-03-16 17:22 UTC (permalink / raw)
  To: GCC Patches, Evandro Menezes, Andrew.pinski, jim.wilson; +Cc: nd

Many supported cores implement fusion of AES instructions.  When fusion
happens it can give a significant performance gain.  If not, scheduling
fusion candidates next to each other has almost no effect on performance.
Due to the high benefit/low cost it makes sense to enable AES fusion with
-mcpu=generic so that cores that support it always benefit.  Any objections?

Bootstrapped on AArch64, no regressions.

ChangeLog:
2017-03-16  Wilco Dijkstra  <wdijkstr@arm.com>

        * gcc/config/aarch64/aarch64.c (generic_tunings): Add AES fusion.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 728ce7029f1e2b5161d9f317d10e564dd5a5f472..c8cf7169a5d387de336920b50c83761dc0c96f3a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -536,7 +536,7 @@ static const struct tune_params generic_tunings =
   &generic_approx_modes,
   4, /* memmov_cost  */
   2, /* issue_rate  */
-  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
   8,	/* function_align.  */
   8,	/* jump_align.  */
   4,	/* loop_align.  */


* Re: [PATCH][AArch64] Enable AES fusion with -mcpu=generic
  2017-03-16 17:22 ` [PATCH][AArch64] Enable AES fusion with -mcpu=generic Wilco Dijkstra
@ 2017-03-16 18:01   ` Andrew Pinski
  2017-03-17  3:26     ` Jim Wilson
  2017-04-20 15:59   ` Wilco Dijkstra
  1 sibling, 1 reply; 11+ messages in thread
From: Andrew Pinski @ 2017-03-16 18:01 UTC (permalink / raw)
  To: Wilco Dijkstra
  Cc: GCC Patches, Evandro Menezes, Andrew.pinski, jim.wilson, nd

On Thu, Mar 16, 2017 at 10:22 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> Many supported cores implement fusion of AES instructions.  When fusion
> happens it can give a significant performance gain.  If not, scheduling
> fusion candidates next to each other has almost no effect on performance.
> Due to the high benefit/low cost it makes sense to enable AES fusion with
> -mcpu=generic so that cores that support it always benefit.  Any objections?

I am OK with this, since our new cores support this and there was no
performance loss for ThunderX1.

Thanks,
Andrew

>
> Bootstrapped on AArch64, no regressions.
>
> ChangeLog:
> 2017-03-16  Wilco Dijkstra  <wdijkstr@arm.com>
>
>         * gcc/config/aarch64/aarch64.c (generic_tunings): Add AES fusion.
>
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 728ce7029f1e2b5161d9f317d10e564dd5a5f472..c8cf7169a5d387de336920b50c83761dc0c96f3a 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -536,7 +536,7 @@ static const struct tune_params generic_tunings =
>    &generic_approx_modes,
>    4, /* memmov_cost  */
>    2, /* issue_rate  */
> -  AARCH64_FUSE_NOTHING, /* fusible_ops  */
> +  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
>    8,   /* function_align.  */
>    8,   /* jump_align.  */
>    4,   /* loop_align.  */


* Re: [RFC][PATCH][AArch64] Improve generic branch cost
  2017-03-14  9:37   ` James Greenhalgh
@ 2017-03-17  3:19     ` Jim Wilson
  0 siblings, 0 replies; 11+ messages in thread
From: Jim Wilson @ 2017-03-17  3:19 UTC (permalink / raw)
  To: James Greenhalgh
  Cc: Andrew Pinski, Wilco Dijkstra, GCC Patches, Evandro Menezes,
	Andrew.pinski, nd, Philipp Tomsich, benedikt.huber

On Tue, Mar 14, 2017 at 2:37 AM, James Greenhalgh
<james.greenhalgh@arm.com> wrote:
> I'd like to hear comments from the Exynos-M1, Falkor and
> xgene-1 subtarget contributors, particularly as these targets use
> generic_branch_cost for their subtarget-specific tuning. It may be that
> your patch needs to preserve the 2,2 setting for such cores even if the
> generic target does move to 1,3.

I was at Linaro Connect last week of course.  I took a look at this
issue this week.  I don't see any measurable performance change on
SPEC CPU2006 for falkor, so the change looks OK to me.

In general, I'm not too concerned about changes like this, as I'm
watching the FSF GCC tree, and will make appropriate changes to the
falkor tuning structure as necessary to maintain good performance.

Jim


* Re: [PATCH][AArch64] Enable AES fusion with -mcpu=generic
  2017-03-16 18:01   ` Andrew Pinski
@ 2017-03-17  3:26     ` Jim Wilson
  2017-03-17 10:56       ` James Greenhalgh
  0 siblings, 1 reply; 11+ messages in thread
From: Jim Wilson @ 2017-03-17  3:26 UTC (permalink / raw)
  To: Andrew Pinski
  Cc: Wilco Dijkstra, GCC Patches, Evandro Menezes, Andrew.pinski, nd

On Thu, Mar 16, 2017 at 11:01 AM, Andrew Pinski <apinski@cavium.com> wrote:
> On Thu, Mar 16, 2017 at 10:22 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
>> Many supported cores implement fusion of AES instructions.  When fusion
>> happens it can give a significant performance gain.  If not, scheduling
>> fusion candidates next to each other has almost no effect on performance.
>> Due to the high benefit/low cost it makes sense to enable AES fusion with
>> -mcpu=generic so that cores that support it always benefit.  Any objections?

No objection.  I'm not currently tracking performance of -mcpu=generic
on falkor, so I'm not very concerned about changes to the generic
tuning structure.

Jim


* Re: [RFC][PATCH][AArch64] Improve generic branch cost
  2017-03-09 14:42 [RFC][PATCH][AArch64] Improve generic branch cost Wilco Dijkstra
  2017-03-09 22:06 ` Andrew Pinski
  2017-03-16 17:22 ` [PATCH][AArch64] Enable AES fusion with -mcpu=generic Wilco Dijkstra
@ 2017-03-17 10:15 ` Richard Earnshaw (lists)
  2 siblings, 0 replies; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-03-17 10:15 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches, Evandro Menezes, Andrew.pinski, jim.wilson
  Cc: nd

On 09/03/17 14:42, Wilco Dijkstra wrote:
> Hi,
> 
> Recently we've put a lot of effort into improving ifcvt to use CSEL on AArch64.
> In  https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01639.html James determined
> the best value for AArch64 code generation.  Although this setting is used when
> explicitly targeting Cortex cores, it is not otherwise used.  This means by
> default GCC will not use (F)CSEL in many common cases.  Most code is built
> without -mcpu= and thus doesn't use CSEL like this example from GLIBC:
> 
> strtok:
>     stp    x29, x30, [sp, -48]!
>     add    x29, sp, 0
>     stp    x21, x22, [sp, 32]
>     mov    x21, x1
>     stp    x19, x20, [sp, 16]
>     adrp    x22, .LANCHOR0
>     mov    x19, x0
>     cbz    x0, .L12
> .L2:    ldrb    w0, [x19]
> 
> .L12:
>     ldr    x19, [x22, #:lo12:.LANCHOR0]
>     b    .L2
> 
> With -mcpu=cortex-a57 GCC generates:
> 
>     stp    x29, x30, [sp, -48]!
>     cmp    x0, 0
>     add    x29, sp, 0
>     stp    x21, x22, [sp, 32]
>     adrp    x21, .LANCHOR0
>     stp    x19, x20, [sp, 16]
>     mov    x19, x0
>     ldr    x0, [x21, #:lo12:.LANCHOR0]
>     csel    x19, x0, x19, eq
>     ldrb    w0, [x19]
> 
> This is generally faster and smaller.  On one benchmark the new setting fixes a 
> regression since GCC6 and improves performance by 49%.  So I propose to change
> generic_branch_cost to be the same as cortexa57_branch_cost so that all supported
> cores benefit equally from CSEL.  Are there any objections to this?
> 
> Wilco
> 
> 
> ChangeLog:
> 2017-03-09  Wilco Dijkstra  <wdijkstr@arm.com>
> 
> 	* config/aarch64/aarch64.c (generic_branch_cost): Copy cortexa57_branch_cost.

This is OK.  We already have a number of cores using these values so I
don't think this is likely to be a risky change even in stage 4.

R.

> --
> 
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5870b5e5d7e8e48cf925b3a62030346f041a7fd6..ea16074af86087a6200d9895583e05acf43d90e2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -377,8 +377,8 @@ static const struct cpu_vector_cost xgene1_vector_cost =
>  /* Generic costs for branch instructions.  */
>  static const struct cpu_branch_cost generic_branch_cost =
>  {
> -  2,  /* Predictable.  */
> -  2   /* Unpredictable.  */
> +  1,  /* Predictable.  */
> +  3   /* Unpredictable.  */
>  };
>  
>  /* Branch costs for Cortex-A57.  */
> 


* Re: [PATCH][AArch64] Enable AES fusion with -mcpu=generic
  2017-03-17  3:26     ` Jim Wilson
@ 2017-03-17 10:56       ` James Greenhalgh
  0 siblings, 0 replies; 11+ messages in thread
From: James Greenhalgh @ 2017-03-17 10:56 UTC (permalink / raw)
  To: Jim Wilson
  Cc: Andrew Pinski, Wilco Dijkstra, GCC Patches, Evandro Menezes,
	Andrew.pinski, nd

On Thu, Mar 16, 2017 at 08:26:42PM -0700, Jim Wilson wrote:
> On Thu, Mar 16, 2017 at 11:01 AM, Andrew Pinski <apinski@cavium.com> wrote:
> > On Thu, Mar 16, 2017 at 10:22 AM, Wilco Dijkstra <Wilco.Dijkstra@arm.com> wrote:
> >> Many supported cores implement fusion of AES instructions.  When fusion
> >> happens it can give a significant performance gain.  If not, scheduling
> >> fusion candidates next to each other has almost no effect on performance.
> >> Due to the high benefit/low cost it makes sense to enable AES fusion with
> >> -mcpu=generic so that cores that support it always benefit.  Any objections?
> 
> No objection.  I'm not currently tracking performance of -mcpu=generic
> on falkor, so I'm not very concerned about changes to the generic
> tuning structure.

Thanks for the feedback Jim, Andrew.

This patch is OK for trunk. As Richard pointed out on the branch costs
thread, if we had a bug here we'd likely have seen it by now on those
cores which do enable the fusion.

Thanks,
James


* Re: [PATCH][AArch64]  Enable AES fusion with -mcpu=generic
  2017-03-16 17:22 ` [PATCH][AArch64] Enable AES fusion with -mcpu=generic Wilco Dijkstra
  2017-03-16 18:01   ` Andrew Pinski
@ 2017-04-20 15:59   ` Wilco Dijkstra
  2017-05-05 13:38     ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 11+ messages in thread
From: Wilco Dijkstra @ 2017-04-20 15:59 UTC (permalink / raw)
  To: GCC Patches, James Greenhalgh
  Cc: nd, Evandro Menezes, Andrew.pinski, jim.wilson


ping

From: Wilco Dijkstra
Sent: 16 March 2017 17:22
To: GCC Patches; Evandro Menezes; Andrew.pinski@cavium.com; jim.wilson@linaro.org
Cc: nd
Subject: [PATCH][AArch64] Enable AES fusion with -mcpu=generic
    
Many supported cores implement fusion of AES instructions.  When fusion
happens it can give a significant performance gain.  If not, scheduling
fusion candidates next to each other has almost no effect on performance.
Due to the high benefit/low cost it makes sense to enable AES fusion with
-mcpu=generic so that cores that support it always benefit.  Any objections?

Bootstrapped on AArch64, no regressions.

ChangeLog:
2017-03-16  Wilco Dijkstra  <wdijkstr@arm.com>

        * gcc/config/aarch64/aarch64.c (generic_tunings): Add AES fusion.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 728ce7029f1e2b5161d9f317d10e564dd5a5f472..c8cf7169a5d387de336920b50c83761dc0c96f3a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -536,7 +536,7 @@ static const struct tune_params generic_tunings =
   &generic_approx_modes,
   4, /* memmov_cost  */
   2, /* issue_rate  */
-  AARCH64_FUSE_NOTHING, /* fusible_ops  */
+  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
   8,   /* function_align.  */
   8,   /* jump_align.  */
   4,   /* loop_align.  */
    


* Re: [PATCH][AArch64] Enable AES fusion with -mcpu=generic
  2017-04-20 15:59   ` Wilco Dijkstra
@ 2017-05-05 13:38     ` Richard Earnshaw (lists)
  0 siblings, 0 replies; 11+ messages in thread
From: Richard Earnshaw (lists) @ 2017-05-05 13:38 UTC (permalink / raw)
  To: Wilco Dijkstra, GCC Patches, James Greenhalgh
  Cc: nd, Evandro Menezes, Andrew.pinski, jim.wilson

On 20/04/17 16:53, Wilco Dijkstra wrote:
> 
> ping

James already approved this on 17 March; why are you pinging again?

https://gcc.gnu.org/ml/gcc-patches/2017-03/msg00918.html

> 
> From: Wilco Dijkstra
> Sent: 16 March 2017 17:22
> To: GCC Patches; Evandro Menezes; Andrew.pinski@cavium.com; jim.wilson@linaro.org
> Cc: nd
> Subject: [PATCH][AArch64] Enable AES fusion with -mcpu=generic
>     
> Many supported cores implement fusion of AES instructions.  When fusion
> happens it can give a significant performance gain.  If not, scheduling
> fusion candidates next to each other has almost no effect on performance.
> Due to the high benefit/low cost it makes sense to enable AES fusion with
> -mcpu=generic so that cores that support it always benefit.  Any objections?
> 
> Bootstrapped on AArch64, no regressions.
> 
> ChangeLog:
> 2017-03-16  Wilco Dijkstra  <wdijkstr@arm.com>
> 
>         * gcc/config/aarch64/aarch64.c (generic_tunings): Add AES fusion.
> 
> --
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 728ce7029f1e2b5161d9f317d10e564dd5a5f472..c8cf7169a5d387de336920b50c83761dc0c96f3a 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -536,7 +536,7 @@ static const struct tune_params generic_tunings =
>    &generic_approx_modes,
>    4, /* memmov_cost  */
>    2, /* issue_rate  */
> -  AARCH64_FUSE_NOTHING, /* fusible_ops  */
> +  (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
>    8,   /* function_align.  */
>    8,   /* jump_align.  */
>    4,   /* loop_align.  */
>     
> 



Thread overview: 11+ messages
2017-03-09 14:42 [RFC][PATCH][AArch64] Improve generic branch cost Wilco Dijkstra
2017-03-09 22:06 ` Andrew Pinski
2017-03-14  9:37   ` James Greenhalgh
2017-03-17  3:19     ` Jim Wilson
2017-03-16 17:22 ` [PATCH][AArch64] Enable AES fusion with -mcpu=generic Wilco Dijkstra
2017-03-16 18:01   ` Andrew Pinski
2017-03-17  3:26     ` Jim Wilson
2017-03-17 10:56       ` James Greenhalgh
2017-04-20 15:59   ` Wilco Dijkstra
2017-05-05 13:38     ` Richard Earnshaw (lists)
2017-03-17 10:15 ` [RFC][PATCH][AArch64] Improve generic branch cost Richard Earnshaw (lists)
