From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 60799 invoked by alias); 21 Apr 2015 14:00:15 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 60775 invoked by uid 89); 21 Apr 2015 14:00:13 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.8 required=5.0
 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2
X-HELO: eu-smtp-delivery-143.mimecast.com
Received: from eu-smtp-delivery-143.mimecast.com
 (HELO eu-smtp-delivery-143.mimecast.com) (146.101.78.143)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP;
 Tue, 21 Apr 2015 14:00:11 +0000
Received: from cam-owa2.Emea.Arm.com (fw-tnat.cambridge.arm.com [217.140.96.140])
 by uk-mta-13.uk.mimecast.lan; Tue, 21 Apr 2015 15:00:08 +0100
Received: from e106327-lin.cambridge.arm.com ([10.1.2.79])
 by cam-owa2.Emea.Arm.com with Microsoft SMTPSVC(6.0.3790.3959);
 Tue, 21 Apr 2015 15:00:06 +0100
Message-ID: <553657E5.7080002@arm.com>
Date: Tue, 21 Apr 2015 14:00:00 -0000
From: Matthew Wahab
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.6.0
MIME-Version: 1.0
To: gcc-patches
Subject: [PATCH][AArch64] Add branch-cost to cpu tuning information.
X-MC-Unique: tBZyZUBaTVC8SMI_uteaNg-1
Content-Type: multipart/mixed; boundary="------------080001020604010905080002"
X-IsSubscribed: yes
X-SW-Source: 2015-04/txt/msg01182.txt.bz2

This is a multi-part message in MIME format.
--------------080001020604010905080002
Content-Type: text/plain; charset=UTF-8; format=flowed

The AArch64 backend defines BRANCH_COST as the constant 2 for all CPUs, so
the compiler assumes that branches cost the same on every core. This patch
reworks the handling of branch costs so that per-CPU values can be set
through the tuning structures.

The branch-cost values themselves are unchanged (every core still uses 2),
since the correct values will need to be decided for each core.

Tested on aarch64-none-linux-gnu with gcc-check.

Ok for trunk?
Matthew

2015-05-21  Matthew Wahab

	* gcc/config/aarch64/aarch64-protos.h (struct cpu_branch_cost): New.
	(tune_params): Add field branch_costs.
	(aarch64_branch_cost): Declare.
	* gcc/config/aarch64/aarch64.c (generic_branch_cost): New.
	(generic_tunings): Set field branch_costs to generic_branch_cost.
	(cortexa53_tunings): Likewise.
	(cortexa57_tunings): Likewise.
	(thunderx_tunings): Likewise.
	(xgene1_tunings): Likewise.
	(aarch64_branch_cost): Define.
	* gcc/config/aarch64/aarch64.h (BRANCH_COST): Redefine.

--------------080001020604010905080002
Content-Type: text/x-patch; name=percpu_branchcost.patch
Content-Disposition: attachment; filename="percpu_branchcost.patch"

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8676c5c..77b01fa 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -162,12 +162,20 @@ struct cpu_vector_cost
   const int cond_not_taken_branch_cost; /* Cost of not taken branch.  */
 };
 
+/* Branch costs.  */
+struct cpu_branch_cost
+{
+  const int predictable; /* Predictable branch or optimizing for size.  */
+  const int unpredictable; /* Unpredictable branch or optimizing for speed.  */
+};
+
 struct tune_params
 {
   const struct cpu_cost_table *const insn_extra_cost;
   const struct cpu_addrcost_table *const addr_cost;
   const struct cpu_regmove_cost *const regmove_cost;
   const struct cpu_vector_cost *const vec_costs;
+  const struct cpu_branch_cost *const branch_costs;
   const int memmov_cost;
   const int issue_rate;
   const unsigned int fuseable_ops;
@@ -259,6 +267,8 @@ void aarch64_print_operand (FILE *, rtx, char);
 void aarch64_print_operand_address (FILE *, rtx);
 void aarch64_emit_call_insn (rtx);
 
+int aarch64_branch_cost (bool, bool);
+
 /* Initialize builtins for SIMD intrinsics.  */
 void init_aarch64_simd_builtins (void);
 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 77a641e..a020316 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -339,12 +339,20 @@ static const struct cpu_vector_cost xgene1_vector_cost =
 #define AARCH64_FUSE_ADRP_LDR (1 << 3)
 #define AARCH64_FUSE_CMP_BRANCH (1 << 4)
 
+/* Generic costs for branch instructions.  */
+static const struct cpu_branch_cost generic_branch_cost =
+{
+  2, /* Predictable.  */
+  2 /* Unpredictable.  */
+};
+
 static const struct tune_params generic_tunings =
 {
   &cortexa57_extra_costs,
   &generic_addrcost_table,
   &generic_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost */
   2, /* issue_rate */
   AARCH64_FUSE_NOTHING, /* fuseable_ops */
@@ -362,6 +370,7 @@ static const struct tune_params cortexa53_tunings =
   &generic_addrcost_table,
   &cortexa53_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost */
   2, /* issue_rate */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -380,6 +389,7 @@ static const struct tune_params cortexa57_tunings =
   &cortexa57_addrcost_table,
   &cortexa57_regmove_cost,
   &cortexa57_vector_cost,
+  &generic_branch_cost,
   4, /* memmov_cost */
   3, /* issue_rate */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -398,6 +408,7 @@ static const struct tune_params thunderx_tunings =
   &generic_addrcost_table,
   &thunderx_regmove_cost,
   &generic_vector_cost,
+  &generic_branch_cost,
   6, /* memmov_cost */
   2, /* issue_rate */
   AARCH64_FUSE_CMP_BRANCH, /* fuseable_ops */
@@ -415,6 +426,7 @@ static const struct tune_params xgene1_tunings =
   &xgene1_addrcost_table,
   &xgene1_regmove_cost,
   &xgene1_vector_cost,
+  &generic_branch_cost,
   6, /* memmov_cost */
   4, /* issue_rate */
   AARCH64_FUSE_NOTHING, /* fuseable_ops */
@@ -5361,6 +5373,19 @@ aarch64_address_cost (rtx x,
   return cost;
 }
 
+int
+aarch64_branch_cost (bool speed_p, bool predictable_p)
+{
+  /* When optimizing for speed, use the cost of unpredictable branches.  */
+  const struct cpu_branch_cost *branch_costs =
+    aarch64_tune_params->branch_costs;
+
+  if (!speed_p || predictable_p)
+    return branch_costs->predictable;
+  else
+    return branch_costs->unpredictable;
+}
+
 /* Return true if the RTX X in mode MODE is a zero or sign extract
    usable in an ADD or SUB (extended register) instruction.  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bf59e40..93a32f5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -823,7 +823,8 @@ do { \
 #define TRAMPOLINE_SECTION text_section
 
 /* To start with.  */
-#define BRANCH_COST(SPEED_P, PREDICTABLE_P) 2
+#define BRANCH_COST(SPEED_P, PREDICTABLE_P) \
+  (aarch64_branch_cost (SPEED_P, PREDICTABLE_P))
 ^L
 
 /* Assembly output.  */

--------------080001020604010905080002--