public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Atom: Scheduler improvements for better imul placement
@ 2012-04-11  8:40 Igor Zamyatin
  2012-04-11 13:38 ` Andi Kleen
  0 siblings, 1 reply; 26+ messages in thread
From: Igor Zamyatin @ 2012-04-11  8:40 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 470 bytes --]

Hi All!

It is known that imul placement is rather critical for Atom processors
and changes try to improve imul scheduling for Atom.

This gives +5% performance on several tests from new OA 2.0 testsuite
from EEMBC.

Tested for i386 and x86-64, ok for trunk?


ChangeLog:

2012-04-10  Yuri Rumyantsev  <ysrumyan@gmail.com>

	* config/i386/i386.c (x86_sched_reorder): New function.
	Add new function x86_sched_reorder
	to take advantage of Atom pipelened IMUL execution.

[-- Attachment #2: imul_sched.patch --]
[-- Type: application/octet-stream, Size: 4088 bytes --]

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8974ddc..32d031f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -23721,6 +23721,115 @@ ix86_agi_dependent (rtx set_insn, rtx use_insn)
   return false;
 }
 
+/* Try to reorder ready list to take advantage of Atom pipelined IMUL
+   execution. It is applied if
+   (1) IMUL instrction is on the top of list;
+   (2) There no others IMUL instructions in ready list;
+   (3) There exists the only producer of independent IMUL instruction in
+       ready list;
+   (4) Put found producer on the top of ready list.
+   Returns issue rate. */
+
+static int
+ix86_sched_reorder(FILE *dump, int sched_verbose, rtx *ready, int *pn_ready,
+                   int clock_var ATTRIBUTE_UNUSED)
+{
+  static int issue_rate = -1;
+  int n_ready = *pn_ready;
+  int n_imul = 0;
+  rtx insn;
+  rtx insn1;
+  int i;
+  sd_iterator_def sd_it;
+  dep_t dep;
+  int index = -1;
+
+  /* set up issue rate */
+  if (issue_rate < 0)
+    issue_rate = ix86_issue_rate();
+
+  /* do reodering for Atom only */
+  if (ix86_tune != PROCESSOR_ATOM)
+    return issue_rate;
+  /* nothing to do if ready list contains only 1 instruction */
+  if (n_ready <= 1)
+    return issue_rate;
+
+  /* check that IMUL instruction is on the top of ready list. */
+  insn = PATTERN(ready[n_ready -1]);
+  if (GET_CODE (insn) == PARALLEL)
+    insn = XVECEXP (insn, 0, 0);
+  if (GET_CODE (insn) != SET)
+    return issue_rate;
+  if (!(GET_CODE (SET_SRC (insn)) == MULT
+        && GET_MODE (SET_SRC (insn)) == SImode))
+    return issue_rate;
+
+  /* count for number of IMUL instructions. */
+  for (i = 0; i < n_ready; i++)
+    {
+      insn = PATTERN (ready[i]);
+      if (GET_CODE (insn) == PARALLEL)
+        insn = XVECEXP (insn, 0, 0);
+
+      if (GET_CODE (insn) != SET)
+        continue;
+      if (!(GET_CODE (SET_SRC (insn)) == MULT
+            && GET_MODE (SET_SRC (insn)) == SImode))
+        n_imul++;
+    }
+    /* do nothing if we have > 1 IMUL instrusction in ready list. */
+  if (n_imul > 1)
+    return issue_rate;
+  /* search for producer of independent IMUL instruction. */
+  for (i = n_ready -2; i>= 0; i--)
+    {
+      insn = ready[i];
+      FOR_EACH_DEP (insn, SD_LIST_FORW, sd_it, dep)
+        {
+          rtx con;
+	  con = DEP_CON (dep);
+          insn1 = PATTERN (con);
+          if (GET_CODE (insn1) == PARALLEL)
+            insn1 = XVECEXP (insn1, 0, 0);
+          if (GET_CODE (insn1) == SET
+              && GET_CODE (SET_SRC (insn1)) == MULT
+              && GET_MODE (SET_SRC (insn1)) == SImode)
+            {
+              sd_iterator_def sd_it1;
+              dep_t dep1;
+              /* check if there is no other dependee for IMUL. */
+              index = i;
+              FOR_EACH_DEP (con, SD_LIST_BACK, sd_it1, dep1)
+                {
+                  rtx pro;
+                  pro = DEP_PRO (dep1);
+                  if (pro != insn)
+                    index = -1;
+	        }
+              if (index >= 0)
+                break;
+            }
+        }
+      if (index >= 0)
+        break;
+    }
+  if (index < 0)
+    return issue_rate; /* didn't find IMUL producer. */
+
+  if (sched_verbose)
+    fprintf(dump, ";;\tatom sched_reorder: swap %d and %d insns\n",
+            INSN_UID (ready[index]), INSN_UID (ready[n_ready - 1]));
+
+  /* put IMUL producer (ready[index]) at the top of ready list */
+  insn1= ready[index];
+  for (i = index; i < n_ready - 1; i++)
+    ready[i] = ready[i + 1];
+  ready[n_ready - 1] = insn1;
+
+  return issue_rate;
+}
+
 static int
 ix86_adjust_cost (rtx insn, rtx link, rtx dep_insn, int cost)
 {
@@ -38072,6 +38181,9 @@ ix86_enum_va_list (int idx, const char **pname, tree *ptree)
 #undef TARGET_SCHED_REASSOCIATION_WIDTH
 #define TARGET_SCHED_REASSOCIATION_WIDTH ix86_reassociation_width
 
+#undef TARGET_SCHED_REORDER
+#define TARGET_SCHED_REORDER ix86_sched_reorder
+
 /* The size of the dispatch window is the total number of bytes of
    object code allowed in a window.  */
 #define DISPATCH_WINDOW_SIZE 16

^ permalink raw reply	[flat|nested] 26+ messages in thread
* Re: [PATCH] Atom: Scheduler improvements for better imul placement
@ 2012-05-28 21:41 Uros Bizjak
       [not found] ` <0EFAB2BDD0F67E4FB6CCC8B9F87D756915E05C45@IRSMSX101.ger.corp.intel.com>
  0 siblings, 1 reply; 26+ messages in thread
From: Uros Bizjak @ 2012-05-28 21:41 UTC (permalink / raw)
  To: gcc-patches
  Cc: Andrey Belevantsev, Richard Guenther, Alexander Monakov, Andi Kleen

Hello!

> Ping?

Please at least add and URL to the patch, it took me some time to
found the latest version [1], I'm not even sure if it is the latest
version...

I assume that you cleared all issues with middle-end and scheduler
maintainers, it is not clear from the message.

+   (1) IMUL instrction is on the top of list;

Typo above.

+  static int issue_rate = -1;
+  int n_ready = *pn_ready;
+  rtx insn;
+  rtx insn1;
+  rtx insn2;

Please put three definitions above on the same line.

+  int i;
+  sd_iterator_def sd_it;
+  dep_t dep;
+  int index = -1;
+
+  /* set up issue rate */
+  if (issue_rate < 0)
+    issue_rate = ix86_issue_rate();

Please set issue_rate unconditionally here.  Also, please follow the
GNU style of comments (Full sentence with two spaces after the dot)
everywhere, e.g:

/* Set up issue rate.  */

+  if (!(GET_CODE (SET_SRC (insn)) == MULT
+      && GET_MODE (SET_SRC (insn)) == SImode))
+    return issue_rate;

Is it correct that only SImode multiplies are checked against SImode
multiplies? Can't we use DImode or HImode multiply (or other
long-latency insns) to put them into the shadow of the first multiply
insn?

As proposed in [2], there are many other fine-tuning approaches
proposed by the scheduler maintainer. OTOH, even the "big hammer"
approach in the proposed patch makes things better, so it is the step
in the right direction - and it is existing practice anyway.

Under this rationale, I think that the patch should be committed to
mainline. But please also consider proposed fine-tunings to refine the
scheduling accuracy.

So, OK for mainline, if there are no objections from other maintainers
in next two days.

[1] http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00964.html
[2] http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00806.html

Thanks,
Uros.

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2012-06-01 17:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-11  8:40 [PATCH] Atom: Scheduler improvements for better imul placement Igor Zamyatin
2012-04-11 13:38 ` Andi Kleen
2012-04-11 14:10   ` Richard Guenther
2012-04-12 10:20     ` Igor Zamyatin
2012-04-12 10:45       ` Richard Guenther
2012-04-12 12:00         ` Alexander Monakov
2012-04-12 12:24           ` Richard Guenther
2012-04-12 12:36             ` Igor Zamyatin
2012-04-12 12:38               ` Richard Guenther
2012-04-12 13:02                 ` Andrey Belevantsev
2012-04-12 13:55                   ` Richard Guenther
2012-04-12 14:03                     ` Andrey Belevantsev
2012-04-12 14:22                       ` Richard Guenther
2012-04-13  7:16                         ` Andrey Belevantsev
2012-04-13 10:18                   ` Igor Zamyatin
2012-04-13 11:00                     ` Igor Zamyatin
2012-04-13 12:21                     ` Andrey Belevantsev
2012-04-16 20:27                       ` Igor Zamyatin
2012-04-20 12:05                         ` Igor Zamyatin
2012-05-06  7:27                           ` Igor Zamyatin
2012-05-28 20:20                             ` Igor Zamyatin
2012-04-12 10:18   ` Igor Zamyatin
2012-05-28 21:41 Uros Bizjak
     [not found] ` <0EFAB2BDD0F67E4FB6CCC8B9F87D756915E05C45@IRSMSX101.ger.corp.intel.com>
2012-05-29 19:01   ` Igor Zamyatin
2012-06-01 13:46     ` H.J. Lu
2012-06-01 17:48     ` Eric Botcazou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).