From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Greta Yorsh" <greta.yorsh@arm.com>
To: "GCC Patches" <gcc-patches@gcc.gnu.org>
Cc: "Richard Earnshaw" <Richard.Earnshaw@arm.com>,	"Ramana Radhakrishnan" <Ramana.Radhakrishnan@arm.com>,	<nickc@redhat.com>,	<paul@codesourcery.com>,	"Greta Yorsh" <Greta.Yorsh@arm.com>
References: <000401cdfb27$da2f1f30$8e8d5d90$@yorsh@arm.com>
In-Reply-To: <000401cdfb27$da2f1f30$8e8d5d90$@yorsh@arm.com>
Subject: [PATCH,ARM][3/5] New bypass between mac operations in cortex-a7 pipeline description
Date: Fri, 25 Jan 2013 18:25:00 -0000
Message-ID: <001201cdfb29$16052800$420f7800$@yorsh@arm.com>
MIME-Version: 1.0
Content-Type: multipart/mixed;	boundary="----=_NextPart_000_0013_01CDFB29.16052800"
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
X-SW-Source: 2013-01/txt/msg01252.txt.bz2

This is a multi-part message in MIME format.

------=_NextPart_000_0013_01CDFB29.16052800
Content-Type: text/plain; charset=WINDOWS-1252
Content-Transfer-Encoding: quoted-printable
Content-length: 953

Add bypasses to forward the result of one MAC operation to the accumulator
of another MAC operation.
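
As background, a define_bypass with a guard overrides the latency of a
producer-to-consumer dependence only when the guard returns non-zero.
Otherwise the latency of the producer's insn reservation applies. The C
sketch below models that rule in isolation. It is not GCC code:
dep_latency, acc_is_result and the one-register encoding of each
instruction are hypothetical simplifications. acc_is_result also omits the
check that the result is not used as a multiplicand as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical guard signature.  In GCC the guard receives the producer
   and consumer insns; here each insn is reduced to one register number.  */
typedef bool (*bypass_guard) (int producer_dest, int consumer_acc);

/* Latency charged for a producer -> consumer dependence under a guarded
   bypass: the bypass latency applies only when GUARD accepts the pair,
   otherwise the producer's default (reservation) latency is used.  */
static int
dep_latency (int default_latency, int bypass_latency, bypass_guard guard,
             int producer_dest, int consumer_acc)
{
  assert (guard != NULL);
  return guard (producer_dest, consumer_acc) ? bypass_latency
                                             : default_latency;
}

/* Toy guard: accept when the consumer accumulates into the register the
   producer writes.  Unlike the real guard, it does not reject a result
   that also feeds a multiplicand.  */
static bool
acc_is_result (int producer_dest, int consumer_acc)
{
  return producer_dest == consumer_acc;
}
```

For example, with a hypothetical default latency of 3 and the bypass
latency of 1 that this patch uses for cortex_a7_mul,
dep_latency (3, 1, acc_is_result, 0, 0) returns 1. In contrast,
dep_latency (3, 1, acc_is_result, 0, 5) returns 3 because the consumer
accumulates into r5 rather than r0.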

To this end, add a new function, arm_mac_accumulator_is_result, to be used
as a bypass guard. The existing guard, arm_mac_accumulator_is_mul_result,
requires the producer to be a multiply operation and the consumer to be a
multiply-accumulate operation. The new guard is more general. It allows the
consumer to be either a multiply-accumulate or a multiply-subtract
operation. It also allows the producer to be any single SET, although only
MAC operations are used as producers in the Cortex-A7 pipeline description.
In both guards, the consumer must depend on the producer's result only
through its accumulator operand.
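
To make the accepted shapes concrete, here is a toy C model of the check
that arm_mac_accumulator_is_result performs. It is a sketch under stated
assumptions. struct expr, mentions_reg and accumulator_is_result are
hypothetical stand-ins for GCC's rtx, reg_overlap_mentioned_p and the real
guard. COND_EXEC stripping is omitted, only REG destinations are modelled,
and registers are plain numbers that never partially overlap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for RTL (hypothetical; not GCC's rtx).  REG nodes carry a
   register number; the other codes use op[0] and op[1].  For SET, op[0] is
   the destination and op[1] the source.  */
enum code { REG, PLUS, MINUS, MULT, SET };

struct expr
{
  enum code code;
  int regno;                   /* Meaningful only when code == REG.  */
  const struct expr *op[2];    /* Operands; NULL for REG.  */
};

/* Stand-in for reg_overlap_mentioned_p: does register REGNO occur
   anywhere in X?  */
static bool
mentions_reg (int regno, const struct expr *x)
{
  if (x == NULL)
    return false;
  if (x->code == REG)
    return x->regno == regno;
  return mentions_reg (regno, x->op[0]) || mentions_reg (regno, x->op[1]);
}

/* Accept iff PRODUCER is (set (reg D) ...) and CONSUMER is
   (set ... (plus (mult OP0 OP1) ACC)) or
   (set ... (minus ACC (mult OP0 OP1))),
   with D mentioned in ACC but in neither OP0 nor OP1.  */
static bool
accumulator_is_result (const struct expr *producer,
                       const struct expr *consumer)
{
  const struct expr *src, *mult, *acc;
  int d;

  assert (producer != NULL && consumer != NULL);
  if (producer->code != SET || consumer->code != SET)
    return false;
  if (producer->op[0]->code != REG)
    return false;   /* The real guard also handles non-REG results.  */
  d = producer->op[0]->regno;

  src = consumer->op[1];
  if (src->code == PLUS && src->op[0]->code == MULT)
    {
      mult = src->op[0];   /* (plus (mult OP0 OP1) ACC) */
      acc = src->op[1];
    }
  else if (src->code == MINUS && src->op[1]->code == MULT)
    {
      mult = src->op[1];   /* (minus ACC (mult OP0 OP1)) */
      acc = src->op[0];
    }
  else
    return false;

  return mentions_reg (d, acc)
         && !mentions_reg (d, mult->op[0])
         && !mentions_reg (d, mult->op[1]);
}
```

With a producer r3 = r1*r2 + r3, the model accepts a consumer
r4 = r1*r2 + r3 or r4 = r3 - r1*r2. It rejects r4 = r3*r2 + r3 because r3
also feeds the multiplication, and it rejects a plain r4 = r1 + r3 because
there is no multiply.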

gcc/

2013-01-03  Greta Yorsh  <Greta.Yorsh@arm.com>

        * config/arm/arm-protos.h (arm_mac_accumulator_is_result): New
        declaration.
        * config/arm/arm.c (arm_mac_accumulator_is_result): New function.
        * config/arm/cortex-a7.md: New bypasses using
        arm_mac_accumulator_is_result.

------=_NextPart_000_0013_01CDFB29.16052800
Content-Type: text/plain; name=3-mac-forward-path.v2.patch.txt
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment;
	filename="3-mac-forward-path.v2.patch.txt"
Content-length: 4281

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 4c61e35..885ccff 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -102,6 +102,7 @@ extern int arm_early_load_addr_dep (rtx, rtx);
 extern int arm_no_early_alu_shift_dep (rtx, rtx);
 extern int arm_no_early_alu_shift_value_dep (rtx, rtx);
 extern int arm_no_early_mul_dep (rtx, rtx);
+extern int arm_mac_accumulator_is_result (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
 
 extern int tls_mentioned_p (rtx);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 13d745f..39f1eb3 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24610,6 +24610,62 @@ arm_cxx_guard_type (void)
   return TARGET_AAPCS_BASED ? integer_type_node : long_long_integer_type_node;
 }
 
+/* Return non-zero iff the consumer (a multiply-accumulate or a
+   multiply-subtract instruction) has an accumulator dependency on the
+   result of the producer and no other dependency on that result.  It
+   does not check if the producer is a multiply-accumulate instruction.  */
+int
+arm_mac_accumulator_is_result (rtx producer, rtx consumer)
+{
+  rtx result;
+  rtx op0, op1, acc;
+
+  producer = PATTERN (producer);
+  consumer = PATTERN (consumer);
+
+  if (GET_CODE (producer) == COND_EXEC)
+    producer = COND_EXEC_CODE (producer);
+  if (GET_CODE (consumer) == COND_EXEC)
+    consumer = COND_EXEC_CODE (consumer);
+
+  if (GET_CODE (producer) != SET)
+    return 0;
+
+  result = XEXP (producer, 0);
+
+  if (GET_CODE (consumer) != SET)
+    return 0;
+
+  /* Check that the consumer is of the form
+     (set (...) (plus (mult ...) (...)))
+     or
+     (set (...) (minus (...) (mult ...))).  */
+  if (GET_CODE (XEXP (consumer, 1)) == PLUS)
+    {
+      if (GET_CODE (XEXP (XEXP (consumer, 1), 0)) != MULT)
+        return 0;
+
+      op0 = XEXP (XEXP (XEXP (consumer, 1), 0), 0);
+      op1 = XEXP (XEXP (XEXP (consumer, 1), 0), 1);
+      acc = XEXP (XEXP (consumer, 1), 1);
+    }
+  else if (GET_CODE (XEXP (consumer, 1)) == MINUS)
+    {
+      if (GET_CODE (XEXP (XEXP (consumer, 1), 1)) != MULT)
+        return 0;
+
+      op0 = XEXP (XEXP (XEXP (consumer, 1), 1), 0);
+      op1 = XEXP (XEXP (XEXP (consumer, 1), 1), 1);
+      acc = XEXP (XEXP (consumer, 1), 0);
+    }
+  else
+    return 0;
+
+  return (reg_overlap_mentioned_p (result, acc)
+          && !reg_overlap_mentioned_p (result, op0)
+          && !reg_overlap_mentioned_p (result, op1));
+}
+
 /* Return non-zero if the consumer (a multiply-accumulate instruction)
    has an accumulator dependency on the result of the producer (a
    multiplication instruction) and no other dependency on that result.  */
diff --git a/gcc/config/arm/cortex-a7.md b/gcc/config/arm/cortex-a7.md
index 930242d..2cef5fd 100644
--- a/gcc/config/arm/cortex-a7.md
+++ b/gcc/config/arm/cortex-a7.md
@@ -137,6 +137,12 @@
             (eq_attr "neon_type" "none")))
   "cortex_a7_both")
 
+;; Forward the result of a multiply operation to the accumulator
+;; of a following multiply-accumulate or multiply-subtract instruction.
+(define_bypass 1 "cortex_a7_mul"
+                 "cortex_a7_mul"
+                 "arm_mac_accumulator_is_result")
+
 ;; The latency depends on the operands, so we use an estimate here.
 (define_insn_reservation "cortex_a7_idiv" 5
   (and (eq_attr "tune" "cortexa7")
@@ -264,6 +271,10 @@
                  neon_fp_vmla_qqq_scalar"))
   "cortex_a7_both+cortex_a7_fpmul_pipe")
 
+(define_bypass 4 "cortex_a7_fpmacs,cortex_a7_neon_mla"
+                 "cortex_a7_fpmacs,cortex_a7_neon_mla"
+                 "arm_mac_accumulator_is_result")
+
 ;; Non-multiply instructions can issue between two cycles of a
 ;; double-precision multiply. 
 
@@ -285,6 +296,10 @@
             (eq_attr "neon_type" "none")))
   "cortex_a7_ex1+cortex_a7_fpmul_pipe, cortex_a7_fpmul_pipe*4")
=20
+(define_bypass 7 "cortex_a7_fpmacd"
+                 "cortex_a7_fpmacd,cortex_a7_fpfmad"
+                 "arm_mac_accumulator_is_result")
+
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ;; Floating-point divide/square root instructions.
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

------=_NextPart_000_0013_01CDFB29.16052800--