From: "Greta Yorsh"
To: "GCC Patches"
Cc: "Richard Earnshaw", "Ramana Radhakrishnan", "Greta Yorsh"
Subject: [PATCH,ARM][3/5] New bypass between mac operations in cortex-a7 pipeline description
Date: Fri, 25 Jan 2013 18:25:00 -0000
Message-ID: <001201cdfb29$16052800$420f7800$@yorsh@arm.com>
In-Reply-To: <000401cdfb27$da2f1f30$8e8d5d90$@yorsh@arm.com>

Add bypasses to forward the result of one MAC operation to the accumulator of another MAC operation. Towards this end, we add a new function arm_mac_accumulator_is_result to be used as a guard for bypasses.
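As a rough illustration of what a guarded bypass does: a define_bypass gives a latency for specific producer/consumer reservation pairs, and when a guard function is named, that latency is used only if the guard returns nonzero; otherwise the producer's normal reservation latency applies. The sketch below models only that decision. It is a toy model, not GCC's scheduler code; struct insn_pair, struct bypass, effective_latency and acc_only_guard are hypothetical names, and the guard is reduced to a precomputed flag standing in for what arm_mac_accumulator_is_result decides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a guarded bypass in the spirit of
   GCC's define_bypass.  It is not GCC's scheduler code.  */

/* What a guard such as arm_mac_accumulator_is_result would inspect,
   reduced here to one precomputed flag.  */
struct insn_pair
{
  bool acc_only_dependency;
};

typedef bool (*bypass_guard) (const struct insn_pair *);

struct bypass
{
  int latency;          /* Bypass latency in cycles.  */
  bypass_guard guard;   /* NULL: the bypass always applies.  */
};

/* Latency assumed between producer and consumer: the bypass latency
   if the bypass applies to this pair, else DEFAULT_LATENCY.  */
static int
effective_latency (int default_latency, const struct bypass *b,
                   const struct insn_pair *pair)
{
  assert (pair != NULL);
  if (b != NULL && (b->guard == NULL || b->guard (pair)))
    return b->latency;
  return default_latency;
}

/* Example guard: accept only accumulator-only dependencies.  */
static bool
acc_only_guard (const struct insn_pair *pair)
{
  return pair->acc_only_dependency;
}
```

For instance, a bypass of latency 4 (the value this patch uses for cortex_a7_fpmacs) would shorten the dependency only for pairs the guard accepts; every other pair keeps the default latency, whatever the reservation specifies.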
The existing guard, arm_mac_accumulator_is_mul_result, requires the producer to be a multiply operation and the consumer to be a multiply-accumulate operation. The new guard accepts more general producers and consumers: the consumer may be a multiply-accumulate or a multiply-subtract operation, and the producer may be any SET operation, although only MAC operations are used as producers in the Cortex-A7 pipeline description.

gcc/

2013-01-03  Greta Yorsh

	* config/arm/arm-protos.h (arm_mac_accumulator_is_result): New
	declaration.
	* config/arm/arm.c (arm_mac_accumulator_is_result): New function.
	* config/arm/cortex-a7.md: New bypasses using
	arm_mac_accumulator_is_result.

Content-Disposition: attachment; filename="3-mac-forward-path.v2.patch.txt"

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 4c61e35..885ccff 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -102,6 +102,7 @@ extern int arm_early_load_addr_dep (rtx, rtx);
 extern int arm_no_early_alu_shift_dep (rtx, rtx);
 extern int arm_no_early_alu_shift_value_dep (rtx, rtx);
 extern int arm_no_early_mul_dep (rtx, rtx);
+extern int arm_mac_accumulator_is_result (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
 
 extern int tls_mentioned_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 13d745f..39f1eb3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24610,6 +24610,62 @@ arm_cxx_guard_type (void)
   return TARGET_AAPCS_BASED ? integer_type_node : long_long_integer_type_node;
 }
 
+/* Return non-zero iff the consumer (a multiply-accumulate or a
+   multiply-subtract instruction) has an accumulator dependency on the
+   result of the producer and no other dependency on that result.  It
+   does not check if the producer is a multiply-accumulate instruction.  */
+int
+arm_mac_accumulator_is_result (rtx producer, rtx consumer)
+{
+  rtx result;
+  rtx op0, op1, acc;
+
+  producer = PATTERN (producer);
+  consumer = PATTERN (consumer);
+
+  if (GET_CODE (producer) == COND_EXEC)
+    producer = COND_EXEC_CODE (producer);
+  if (GET_CODE (consumer) == COND_EXEC)
+    consumer = COND_EXEC_CODE (consumer);
+
+  if (GET_CODE (producer) != SET)
+    return 0;
+
+  result = XEXP (producer, 0);
+
+  if (GET_CODE (consumer) != SET)
+    return 0;
+
+  /* Check that the consumer is of the form
+     (set (...) (plus (mult ...) (...)))
+     or
+     (set (...) (minus (...) (mult ...))).  */
+  if (GET_CODE (XEXP (consumer, 1)) == PLUS)
+    {
+      if (GET_CODE (XEXP (XEXP (consumer, 1), 0)) != MULT)
+        return 0;
+
+      op0 = XEXP (XEXP (XEXP (consumer, 1), 0), 0);
+      op1 = XEXP (XEXP (XEXP (consumer, 1), 0), 1);
+      acc = XEXP (XEXP (consumer, 1), 1);
+    }
+  else if (GET_CODE (XEXP (consumer, 1)) == MINUS)
+    {
+      if (GET_CODE (XEXP (XEXP (consumer, 1), 1)) != MULT)
+        return 0;
+
+      op0 = XEXP (XEXP (XEXP (consumer, 1), 1), 0);
+      op1 = XEXP (XEXP (XEXP (consumer, 1), 1), 1);
+      acc = XEXP (XEXP (consumer, 1), 0);
+    }
+  else
+    return 0;
+
+  return (reg_overlap_mentioned_p (result, acc)
+          && !reg_overlap_mentioned_p (result, op0)
+          && !reg_overlap_mentioned_p (result, op1));
+}
+
 /* Return non-zero if the consumer (a multiply-accumulate instruction)
    has an accumulator dependency on the result of the producer (a
    multiplication instruction) and no other dependency on that result.  */
diff --git a/gcc/config/arm/cortex-a7.md b/gcc/config/arm/cortex-a7.md
index 930242d..2cef5fd 100644
--- a/gcc/config/arm/cortex-a7.md
+++ b/gcc/config/arm/cortex-a7.md
@@ -137,6 +137,12 @@
             (eq_attr "neon_type" "none")))
   "cortex_a7_both")
 
+;; Forward the result of a multiply operation to the accumulator
+;; of the following multiply and accumulate instruction.
+(define_bypass 1 "cortex_a7_mul"
+                 "cortex_a7_mul"
+                 "arm_mac_accumulator_is_result")
+
 ;; The latency depends on the operands, so we use an estimate here.
 (define_insn_reservation "cortex_a7_idiv" 5
   (and (eq_attr "tune" "cortexa7")
@@ -264,6 +271,10 @@
                                         neon_fp_vmla_qqq_scalar"))
   "cortex_a7_both+cortex_a7_fpmul_pipe")
 
+(define_bypass 4 "cortex_a7_fpmacs,cortex_a7_neon_mla"
+                 "cortex_a7_fpmacs,cortex_a7_neon_mla"
+                 "arm_mac_accumulator_is_result")
+
 ;; Non-multiply instructions can issue between two cycles of a
 ;; double-precision multiply. 
 
@@ -285,6 +296,10 @@
        (eq_attr "neon_type" "none")))
   "cortex_a7_ex1+cortex_a7_fpmul_pipe, cortex_a7_fpmul_pipe*4")
 
+(define_bypass 7 "cortex_a7_fpmacd"
+                 "cortex_a7_fpmacd,cortex_a7_fpfmad"
+                 "arm_mac_accumulator_is_result")
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;; Floating-point divide/square root instructions.
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
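To make the decision taken by arm_mac_accumulator_is_result concrete outside of GCC, here is a self-contained sketch that applies the same test to a hypothetical miniature expression tree. struct expr, mentions_reg and mac_accumulator_is_result are illustrative names, not GCC APIs; mentions_reg is a much simplified stand-in for reg_overlap_mentioned_p (registers only, no subregs or memory), and the COND_EXEC stripping done by the real function is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>   /* NULL, for building nodes.  */

/* Hypothetical miniature expression tree standing in for GCC RTL.
   Only the codes needed to describe (set DEST SRC) patterns built
   from registers, PLUS, MINUS and MULT are modelled.  */
enum code { REG, PLUS, MINUS, MULT, SET };

struct expr
{
  enum code code;
  int regno;                  /* Used only when code == REG.  */
  const struct expr *op[2];   /* Operands of PLUS, MINUS, MULT, SET.  */
};

/* Simplified stand-in for reg_overlap_mentioned_p: true if register
   REGNO occurs anywhere in X.  */
static bool
mentions_reg (int regno, const struct expr *x)
{
  if (x->code == REG)
    return x->regno == regno;
  return mentions_reg (regno, x->op[0]) || mentions_reg (regno, x->op[1]);
}

/* Same decision as arm_mac_accumulator_is_result, on the toy tree:
   CONSUMER must be (set D (plus (mult A B) ACC)) or
   (set D (minus ACC (mult A B))), and the producer's destination must
   feed ACC but neither A nor B.  */
static bool
mac_accumulator_is_result (const struct expr *producer,
                           const struct expr *consumer)
{
  const struct expr *src, *mul, *acc;
  int result;

  if (producer->code != SET || consumer->code != SET)
    return false;

  /* Toy model: only register destinations are supported.  */
  assert (producer->op[0]->code == REG);
  result = producer->op[0]->regno;
  src = consumer->op[1];

  if (src->code == PLUS && src->op[0]->code == MULT)
    {
      mul = src->op[0];
      acc = src->op[1];
    }
  else if (src->code == MINUS && src->op[1]->code == MULT)
    {
      mul = src->op[1];
      acc = src->op[0];
    }
  else
    return false;

  return mentions_reg (result, acc)
         && !mentions_reg (result, mul->op[0])
         && !mentions_reg (result, mul->op[1]);
}
```

With a producer that writes r0, the sketch accepts r4 = r1 * r2 + r0 and r4 = r0 - r1 * r2, and rejects r4 = r0 * r2 + r0 (r0 also feeds the multiply) and r4 = r1 + r0 (not a MAC), which mirrors why the bypass is restricted to accumulator-only dependencies.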