public inbox for gcc-patches@gcc.gnu.org
* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-07-29 17:48 [DF] [performance] generate DF_REF_BASE REFs in REGNO order Dimitrios Apostolou
@ 2011-07-29 17:48 ` Paolo Bonzini
  2011-07-29 17:57   ` Kenneth Zadeck
  2011-07-29 18:01 ` Dimitrios Apostolou
  1 sibling, 1 reply; 11+ messages in thread
From: Paolo Bonzini @ 2011-07-29 17:48 UTC (permalink / raw)
  To: Dimitrios Apostolou
  Cc: gcc-patches, Steven Bosscher, seongbae.park, Kenneth Zadeck,
	Manolis Marazakis

On 07/29/2011 07:23 PM, Dimitrios Apostolou wrote:
>
> 2011-07-29  Dimitrios Apostolou <jimis@gmx.net>
>              Paolo Bonzini <bonzini@gnu.org>
>
>          (df_def_record_1): Assert a parallel must contain an EXPR_LIST at
>      this point.  Receive the LOC and move its extraction...
>          (df_defs_record): ... here. Rewrote logic with a switch statement
>      instead of multiple if-else.
>      (df_find_hard_reg_defs, df_find_hard_reg_defs_1): New functions
>      that duplicate the logic of df_defs_record() and df_def_record_1()
>      but without actually recording any DEFs, only marking them in
>      the defs HARD_REG_SET.
>      (df_get_call_refs): Call df_find_hard_reg_defs() to mark DEFs that
>      are the result of the call. Record DF_REF_BASE DEFs in REGNO
>      order. Use regs_invalidated_by_call HARD_REG_SET instead of
>      regs_invalidated_by_call_regset bitmap.
>      (df_insn_refs_collect): Record DF_REF_REGULAR DEFs after
>      df_get_call_refs().

Ok for mainline.  I will commit it for you after rebootstrapping (just 
to be safe).

> P.S. maraz: that's 4.3% improvement in instruction count, should you start worrying or is it too late?

Now I'm curious!

Paolo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [DF] [performance] generate DF_REF_BASE REFs in REGNO order
@ 2011-07-29 17:48 Dimitrios Apostolou
  2011-07-29 17:48 ` Paolo Bonzini
  2011-07-29 18:01 ` Dimitrios Apostolou
  0 siblings, 2 replies; 11+ messages in thread
From: Dimitrios Apostolou @ 2011-07-29 17:48 UTC (permalink / raw)
  To: gcc-patches
  Cc: Steven Bosscher, Paolo Bonzini, seongbae.park, Kenneth Zadeck,
	Manolis Marazakis

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2164 bytes --]

Hello list,

the attached patch improves performance by recording the DF_REF_BASE DEFs 
in df_get_call_refs() before the DF_REF_REGULAR DEFs are recorded in 
df_defs_record(). The BASE DEFs are also recorded in REGNO order. The 
improvement was measured as follows, compiling tcp_ipv4.c from the Linux 
kernel at -O0:

trunk  : 1438.4 M instr, 0.627s
patched: 1376.5 M instr, 0.604s
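
For reference, these figures work out to roughly a 4.3% reduction in 
instruction count and roughly a 3.7% reduction in run time. A minimal 
sketch of the arithmetic follows; relative_reduction is a hypothetical 
helper written for this note and is not part of GCC.

```c
#include <assert.h>

/* Relative reduction going from BEFORE to AFTER, as a fraction of
   BEFORE.  Hypothetical helper, not part of GCC.  */
static double
relative_reduction (double before, double after)
{
  assert (before > 0.0);
  return (before - after) / before;
}
```

With the numbers above, relative_reduction (1438.4, 1376.5) is about 
0.043 and relative_reduction (0.627, 0.604) is about 0.037.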

It also includes changes suggested by Paolo and discussed on the list 
(subject: "what can be in a group set"). Many thanks to him for the 
invaluable help while writing the patch.

For anyone interested, the two profiles, with fully annotated source 
before and after the change, are at the links below. The big difference 
is that df_sort_and_compress_refs() now calls qsort() only 33 times 
instead of thousands.
Further measurements, comments and ideas for further improvements are 
welcome.

http://gcc.gnu.org/wiki/OptimisingGCC?action=AttachFile&do=view&target=callgrind-tcp_ipv4-trunk-co-109439-prod.txt
http://gcc.gnu.org/wiki/OptimisingGCC?action=AttachFile&do=view&target=callgrind-tcp_ipv4-df2-co-prod.txt
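
One way to read the drop in qsort() calls: if a sorting routine first 
checks whether its input is already ordered, then refs that arrive in 
REGNO order never reach qsort(). The sketch below illustrates that idea 
only. It is not the code of df_sort_and_compress_refs(); 
sort_refs_if_needed, compare_regno and qsort_calls are hypothetical 
names, and plain register numbers stand in for df_ref entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Number of times the sketch actually fell back to qsort ().  */
static unsigned qsort_calls;

/* qsort () comparison callback ordering register numbers ascending.  */
static int
compare_regno (const void *a, const void *b)
{
  unsigned x = *(const unsigned *) a;
  unsigned y = *(const unsigned *) b;
  return (x > y) - (x < y);
}

/* Sort REGNOS only if a linear scan finds an out-of-order pair.  */
static void
sort_refs_if_needed (unsigned *regnos, size_t count)
{
  size_t i;

  assert (regnos != NULL || count == 0);
  for (i = 1; i < count; i++)
    if (regnos[i - 1] > regnos[i])
      break;
  if (i >= count)
    return;			/* Already ordered: no qsort () call.  */

  qsort_calls++;
  qsort (regnos, count, sizeof regnos[0], compare_regno);
}
```

Feeding this routine an ordered array leaves qsort_calls untouched, 
while an unordered one costs a full sort.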


Changelog:

2011-07-29  Dimitrios Apostolou  <jimis@gmx.net>
	    Paolo Bonzini  <bonzini@gnu.org>

	* df-scan.c (df_def_record_1): Assert a parallel must contain an
	EXPR_LIST at this point.  Receive the LOC and move its extraction...
	(df_defs_record): ... here.  Rewrote logic with a switch statement
	instead of multiple if-else.
	(df_find_hard_reg_defs, df_find_hard_reg_defs_1): New functions
	that duplicate the logic of df_defs_record() and df_def_record_1()
	but without actually recording any DEFs, only marking them in
	the defs HARD_REG_SET.
	(df_get_call_refs): Call df_find_hard_reg_defs() to mark DEFs that
	are the result of the call.  Record DF_REF_BASE DEFs in REGNO
	order.  Use regs_invalidated_by_call HARD_REG_SET instead of
	regs_invalidated_by_call_regset bitmap.
	(df_insn_refs_collect): Record DF_REF_REGULAR DEFs after
	df_get_call_refs().
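
To illustrate the df_get_call_refs entry, here is a minimal sketch of a 
fixed-size register set walked in ascending register order. It is a toy 
stand-in and not GCC's HARD_REG_SET: toy_reg_set, TOY_NUM_HARD_REGS and 
the toy_* functions are hypothetical names invented for this note.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

#define TOY_NUM_HARD_REGS 64	/* Stand-in for FIRST_PSEUDO_REGISTER.  */
#define TOY_WORD_BITS (CHAR_BIT * sizeof (unsigned long))
#define TOY_WORDS ((TOY_NUM_HARD_REGS + TOY_WORD_BITS - 1) / TOY_WORD_BITS)

/* One bit per hard register, stored in a small fixed array.  */
typedef struct { unsigned long w[TOY_WORDS]; } toy_reg_set;

static void
toy_clear (toy_reg_set *s)
{
  memset (s, 0, sizeof *s);
}

static void
toy_set_bit (toy_reg_set *s, unsigned r)
{
  assert (r < TOY_NUM_HARD_REGS);
  s->w[r / TOY_WORD_BITS] |= 1UL << (r % TOY_WORD_BITS);
}

static int
toy_test_bit (const toy_reg_set *s, unsigned r)
{
  assert (r < TOY_NUM_HARD_REGS);
  return ((s->w[r / TOY_WORD_BITS] >> (r % TOY_WORD_BITS)) & 1UL) != 0;
}

/* Store into OUT, in ascending register order, every register that is
   in INVALIDATED but not in DEFS.  Return how many were stored.  */
static size_t
toy_collect_clobbers (const toy_reg_set *invalidated,
		      const toy_reg_set *defs, unsigned *out)
{
  size_t n = 0;
  unsigned r;

  for (r = 0; r < TOY_NUM_HARD_REGS; r++)
    if (toy_test_bit (invalidated, r) && !toy_test_bit (defs, r))
      out[n++] = r;
  return n;
}
```

Because the loop runs over register numbers rather than over the order 
in which bits were set, the output is ordered by construction.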


Thanks,
Dimitris


P.S. maraz: that's a 4.3% improvement in instruction count, should you start 
worrying or is it too late? ;-)

[-- Attachment #2: Type: TEXT/PLAIN, Size: 11214 bytes --]

=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c	2011-02-02 20:08:06 +0000
+++ gcc/df-scan.c	2011-07-29 16:01:50 +0000
@@ -111,7 +111,7 @@ static void df_ref_record (enum df_ref_c
 			   rtx, rtx *,
 			   basic_block, struct df_insn_info *,
 			   enum df_ref_type, int ref_flags);
-static void df_def_record_1 (struct df_collection_rec *, rtx,
+static void df_def_record_1 (struct df_collection_rec *, rtx *,
 			     basic_block, struct df_insn_info *,
 			     int ref_flags);
 static void df_defs_record (struct df_collection_rec *, rtx,
@@ -2916,40 +2916,27 @@ df_read_modify_subreg_p (rtx x)
 }
 
 
-/* Process all the registers defined in the rtx, X.
+/* Process all the registers defined in the rtx pointed by LOC.
    Autoincrement/decrement definitions will be picked up by
    df_uses_record.  */
 
 static void
 df_def_record_1 (struct df_collection_rec *collection_rec,
-                 rtx x, basic_block bb, struct df_insn_info *insn_info,
+                 rtx *loc, basic_block bb, struct df_insn_info *insn_info,
 		 int flags)
 {
-  rtx *loc;
-  rtx dst;
-
- /* We may recursively call ourselves on EXPR_LIST when dealing with PARALLEL
-     construct.  */
-  if (GET_CODE (x) == EXPR_LIST || GET_CODE (x) == CLOBBER)
-    loc = &XEXP (x, 0);
-  else
-    loc = &SET_DEST (x);
-  dst = *loc;
+  rtx dst = *loc;
 
   /* It is legal to have a set destination be a parallel. */
   if (GET_CODE (dst) == PARALLEL)
     {
       int i;
-
       for (i = XVECLEN (dst, 0) - 1; i >= 0; i--)
 	{
 	  rtx temp = XVECEXP (dst, 0, i);
-	  if (GET_CODE (temp) == EXPR_LIST || GET_CODE (temp) == CLOBBER
-	      || GET_CODE (temp) == SET)
-	    df_def_record_1 (collection_rec,
-                             temp, bb, insn_info,
-			     GET_CODE (temp) == CLOBBER
-			     ? flags | DF_REF_MUST_CLOBBER : flags);
+	  gcc_assert (GET_CODE (temp) == EXPR_LIST);
+	  df_def_record_1 (collection_rec, &XEXP (temp, 0),
+			   bb, insn_info, flags);
 	}
       return;
     }
@@ -3003,26 +2990,98 @@ df_defs_record (struct df_collection_rec
 		int flags)
 {
   RTX_CODE code = GET_CODE (x);
+  int i;
 
-  if (code == SET || code == CLOBBER)
-    {
-      /* Mark the single def within the pattern.  */
-      int clobber_flags = flags;
-      clobber_flags |= (code == CLOBBER) ? DF_REF_MUST_CLOBBER : 0;
-      df_def_record_1 (collection_rec, x, bb, insn_info, clobber_flags);
-    }
-  else if (code == COND_EXEC)
+  switch (code)
     {
+    case SET:
+      df_def_record_1 (collection_rec, &SET_DEST (x), bb, insn_info, flags);
+      break;
+
+    case CLOBBER:
+      flags |= DF_REF_MUST_CLOBBER;
+      df_def_record_1 (collection_rec, &XEXP (x, 0), bb, insn_info, flags);
+      break;
+
+    case COND_EXEC:
       df_defs_record (collection_rec, COND_EXEC_CODE (x),
 		      bb, insn_info, DF_REF_CONDITIONAL);
+      break;
+
+    case PARALLEL:
+      for (i = XVECLEN (x, 0) - 1; i >= 0; i--)
+	df_defs_record (collection_rec, XVECEXP (x, 0, i),
+			bb, insn_info, flags);
+      break;
+    default:
+      /* No DEFs to record in other cases */
+      break;
     }
-  else if (code == PARALLEL)
+}
+
+/* Set the bits in *defs of registers defined in the pattern rtx */
+
+static void
+df_find_hard_reg_defs_1 (rtx *loc, basic_block bb,
+			 int flags, HARD_REG_SET *defs)
+{
+  rtx dst = *loc;
+
+  /* It is legal to have a set destination be a parallel. */
+  if (GET_CODE (dst) == PARALLEL)
     {
       int i;
+      for (i = XVECLEN (dst, 0) - 1; i >= 0; i--)
+	{
+	  rtx temp = XVECEXP (dst, 0, i);
+	  gcc_assert (GET_CODE (temp) == EXPR_LIST);
+	  df_find_hard_reg_defs_1 (&XEXP (temp, 0), bb, flags, defs);
+	}
+      return;
+    }
+
+  if (GET_CODE (dst) == STRICT_LOW_PART)
+      dst = XEXP (dst, 0);
+
+  if (GET_CODE (dst) == ZERO_EXTRACT)
+      dst = XEXP (dst, 0);
 
-      /* Mark the multiple defs within the pattern.  */
+  /* At this point if we do not have a reg or a subreg, just return.  */
+  if (REG_P (dst))
+    SET_HARD_REG_BIT (*defs, REGNO (dst));
+  else if (GET_CODE (dst) == SUBREG && REG_P (SUBREG_REG (dst)))
+    SET_HARD_REG_BIT (*defs, REGNO (SUBREG_REG (dst)));
+}
+
+static void
+df_find_hard_reg_defs (rtx x, basic_block bb, 
+		       int flags, HARD_REG_SET *defs)
+{
+  RTX_CODE code = GET_CODE (x);
+  int i;
+
+  switch (code)
+    {
+    case SET:
+      df_find_hard_reg_defs_1 (&SET_DEST (x), bb, flags, defs);
+      break;
+
+    case CLOBBER:
+      flags |= DF_REF_MUST_CLOBBER;
+      df_find_hard_reg_defs_1 (&XEXP (x, 0), bb, flags, defs);
+      break;
+
+    case COND_EXEC:
+      df_find_hard_reg_defs (COND_EXEC_CODE (x), bb, DF_REF_CONDITIONAL, defs);
+      break;
+
+    case PARALLEL:
       for (i = XVECLEN (x, 0) - 1; i >= 0; i--)
-	df_defs_record (collection_rec, XVECEXP (x, 0, i), bb, insn_info, flags);
+	df_find_hard_reg_defs (XVECEXP (x, 0, i), bb, flags, defs);
+      break;
+    default:
+      /* No DEFs to record in other cases */
+      break;
     }
 }
 
@@ -3308,7 +3367,7 @@ df_get_conditional_uses (struct df_colle
 }
 
 
-/* Get call's extra defs and uses. */
+/* Get call's extra defs and uses (track caller-saved registers). */
 
 static void
 df_get_call_refs (struct df_collection_rec * collection_rec,
@@ -3317,20 +3376,50 @@ df_get_call_refs (struct df_collection_r
                   int flags)
 {
   rtx note;
-  bitmap_iterator bi;
-  unsigned int ui;
   bool is_sibling_call;
   unsigned int i;
-  df_ref def;
-  bitmap_head defs_generated;
+  HARD_REG_SET defs_generated;
 
-  bitmap_initialize (&defs_generated, &df_bitmap_obstack);
+  CLEAR_HARD_REG_SET (defs_generated);
+  df_find_hard_reg_defs (PATTERN (insn_info->insn), bb, 
+			 0, &defs_generated);
 
-  /* Do not generate clobbers for registers that are the result of the
-     call.  This causes ordering problems in the chain building code
-     depending on which def is seen first.  */
-  FOR_EACH_VEC_ELT (df_ref, collection_rec->def_vec, i, def)
-    bitmap_set_bit (&defs_generated, DF_REF_REGNO (def));
+  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    {
+      if (i == STACK_POINTER_REGNUM)
+	/* The stack ptr is used (honorarily) by a CALL insn.  */
+	df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+		       NULL, bb, insn_info, DF_REF_REG_USE,
+		       DF_REF_CALL_STACK_USAGE | flags);
+      else if (global_regs[i])
+	{
+	  /* Calls to const functions cannot access any global registers and
+	     calls to pure functions cannot set them.  All other calls may
+	     reference any of the global registers, so they are recorded as
+	     used. */
+	  if (!RTL_CONST_CALL_P (insn_info->insn))
+	    {
+	      df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			     NULL, bb, insn_info, DF_REF_REG_USE, flags);
+	      if (!RTL_PURE_CALL_P (insn_info->insn))
+		df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			       NULL, bb, insn_info, DF_REF_REG_DEF, flags);
+	    }
+	}
+      else
+	if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i)
+	    /* no clobbers for regs that are the result of the call */
+	    && !TEST_HARD_REG_BIT (defs_generated, i)
+	    && (!is_sibling_call
+		|| !bitmap_bit_p (df->exit_block_uses, i)
+		|| refers_to_regno_p (i, i+1,
+				      crtl->return_rtx, NULL)))
+	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			 NULL, bb, insn_info, DF_REF_REG_DEF,
+			 DF_REF_MAY_CLOBBER | flags);
+    }
 
   /* Record the registers used to pass arguments, and explicitly
      noted as clobbered.  */
@@ -3345,7 +3434,7 @@ df_get_call_refs (struct df_collection_r
 	  if (REG_P (XEXP (XEXP (note, 0), 0)))
 	    {
 	      unsigned int regno = REGNO (XEXP (XEXP (note, 0), 0));
-	      if (!bitmap_bit_p (&defs_generated, regno))
+	      if (!TEST_HARD_REG_BIT (defs_generated, regno))
 		df_defs_record (collection_rec, XEXP (note, 0), bb,
 				insn_info, flags);
 	    }
@@ -3355,40 +3444,6 @@ df_get_call_refs (struct df_collection_r
 	}
     }
 
-  /* The stack ptr is used (honorarily) by a CALL insn.  */
-  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[STACK_POINTER_REGNUM],
-		 NULL, bb, insn_info, DF_REF_REG_USE,
-		 DF_REF_CALL_STACK_USAGE | flags);
-
-  /* Calls to const functions cannot access any global registers and calls to
-     pure functions cannot set them.  All other calls may reference any of the
-     global registers, so they are recorded as used.  */
-  if (!RTL_CONST_CALL_P (insn_info->insn))
-    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-      if (global_regs[i])
-	{
-	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			 NULL, bb, insn_info, DF_REF_REG_USE, flags);
-	  if (!RTL_PURE_CALL_P (insn_info->insn))
-	    df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			   NULL, bb, insn_info, DF_REF_REG_DEF, flags);
-	}
-
-  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
-  EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, ui, bi)
-    {
-      if (!global_regs[ui]
-	  && (!bitmap_bit_p (&defs_generated, ui))
-	  && (!is_sibling_call
-	      || !bitmap_bit_p (df->exit_block_uses, ui)
-	      || refers_to_regno_p (ui, ui+1,
-				    crtl->return_rtx, NULL)))
-        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[ui],
-		       NULL, bb, insn_info, DF_REF_REG_DEF,
-		       DF_REF_MAY_CLOBBER | flags);
-    }
-
-  bitmap_clear (&defs_generated);
   return;
 }
 
@@ -3398,7 +3453,7 @@ df_get_call_refs (struct df_collection_r
    and reg chains. */
 
 static void
-df_insn_refs_collect (struct df_collection_rec* collection_rec,
+df_insn_refs_collect (struct df_collection_rec *collection_rec,
 		      basic_block bb, struct df_insn_info *insn_info)
 {
   rtx note;
@@ -3410,9 +3465,6 @@ df_insn_refs_collect (struct df_collecti
   VEC_truncate (df_ref, collection_rec->eq_use_vec, 0);
   VEC_truncate (df_mw_hardreg_ptr, collection_rec->mw_vec, 0);
 
-  /* Record register defs.  */
-  df_defs_record (collection_rec, PATTERN (insn_info->insn), bb, insn_info, 0);
-
   /* Process REG_EQUIV/REG_EQUAL notes.  */
   for (note = REG_NOTES (insn_info->insn); note;
        note = XEXP (note, 1))
@@ -3444,12 +3496,17 @@ df_insn_refs_collect (struct df_collecti
     }
 
   if (CALL_P (insn_info->insn))
+    /* Record DF_REF_BASE register defs for CALL_INSNs. */
     df_get_call_refs (collection_rec, bb, insn_info,
 		      (is_cond_exec) ? DF_REF_CONDITIONAL : 0);
 
+  /* Record DF_REF_REGULAR defs and uses.  */
+  df_defs_record (collection_rec, PATTERN (insn_info->insn),
+		  bb, insn_info, 0);
+
   /* Record the register uses.  */
-  df_uses_record (collection_rec,
-		  &PATTERN (insn_info->insn), DF_REF_REG_USE, bb, insn_info, 0);
+  df_uses_record (collection_rec, &PATTERN (insn_info->insn),
+		  DF_REF_REG_USE, bb, insn_info, 0);
 
   /* DF_REF_CONDITIONAL needs corresponding USES. */
   if (is_cond_exec)

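The switch-based walk that df_find_hard_reg_defs() performs over a 
pattern can be sketched on a toy expression tree. This is an 
illustration under simplifying assumptions, not GCC's RTL: struct 
toy_rtx, enum toy_code and toy_find_defs are hypothetical names, only 
registers 0..31 are supported, and destinations are plain register 
numbers.

```c
#include <assert.h>
#include <stddef.h>

enum toy_code { TOY_SET, TOY_CLOBBER, TOY_COND_EXEC, TOY_PARALLEL, TOY_USE };

struct toy_rtx
{
  enum toy_code code;
  unsigned dest_regno;			/* Used by TOY_SET and TOY_CLOBBER.  */
  const struct toy_rtx *const *elems;	/* Children of TOY_PARALLEL; the body
					   of TOY_COND_EXEC is elems[0].  */
  size_t n_elems;
};

/* Return a mask with one bit set per register defined somewhere in X.
   Nothing is recorded; the registers are only marked.  */
static unsigned long
toy_find_defs (const struct toy_rtx *x)
{
  unsigned long defs = 0;
  size_t i;

  assert (x != NULL);
  switch (x->code)
    {
    case TOY_SET:
    case TOY_CLOBBER:
      assert (x->dest_regno < 32);
      defs |= 1UL << x->dest_regno;
      break;

    case TOY_COND_EXEC:
      assert (x->n_elems == 1);
      defs |= toy_find_defs (x->elems[0]);
      break;

    case TOY_PARALLEL:
      for (i = x->n_elems; i-- > 0; )
	defs |= toy_find_defs (x->elems[i]);
      break;

    default:
      /* No DEFs in other cases.  */
      break;
    }
  return defs;
}
```

A conditional wrapper around a parallel of a set, a clobber and a use 
yields the bits of the set and the clobber only.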


* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-07-29 17:48 ` Paolo Bonzini
@ 2011-07-29 17:57   ` Kenneth Zadeck
  0 siblings, 0 replies; 11+ messages in thread
From: Kenneth Zadeck @ 2011-07-29 17:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Dimitrios Apostolou, gcc-patches, Steven Bosscher, seongbae.park,
	Manolis Marazakis

were these tested on any platform aside from x86?

On 07/29/2011 01:26 PM, Paolo Bonzini wrote:
> On 07/29/2011 07:23 PM, Dimitrios Apostolou wrote:
>>
>> 2011-07-29  Dimitrios Apostolou <jimis@gmx.net>
>>              Paolo Bonzini <bonzini@gnu.org>
>>
>>          (df_def_record_1): Assert a parallel must contain an 
>> EXPR_LIST at
>>      this point.  Receive the LOC and move its extraction...
>>          (df_defs_record): ... here. Rewrote logic with a switch 
>> statement
>>      instead of multiple if-else.
>>      (df_find_hard_reg_defs, df_find_hard_reg_defs_1): New functions
>>      that duplicate the logic of df_defs_record() and df_def_record_1()
>>      but without actually recording any DEFs, only marking them in
>>      the defs HARD_REG_SET.
>>      (df_get_call_refs): Call df_find_hard_reg_defs() to mark DEFs that
>>      are the result of the call. Record DF_REF_BASE DEFs in REGNO
>>      order. Use regs_invalidated_by_call HARD_REG_SET instead of
>>      regs_invalidated_by_call_regset bitmap.
>>      (df_insn_refs_collect): Record DF_REF_REGULAR DEFs after
>>      df_get_call_refs().
>
> Ok for mainline.  I will commit it for you after rebootstrapping (just 
> to be safe).
>
>> P.S. maraz: that's 4.3% improvement in instruction count, should you 
>> start worrying or is it too late?
>
> Now I'm curious!
>
> Paolo


* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-07-29 17:48 [DF] [performance] generate DF_REF_BASE REFs in REGNO order Dimitrios Apostolou
  2011-07-29 17:48 ` Paolo Bonzini
@ 2011-07-29 18:01 ` Dimitrios Apostolou
  2011-07-29 19:34   ` Kenneth Zadeck
  1 sibling, 1 reply; 11+ messages in thread
From: Dimitrios Apostolou @ 2011-07-29 18:01 UTC (permalink / raw)
  To: gcc-patches
  Cc: Steven Bosscher, Paolo Bonzini, seongbae.park, Kenneth Zadeck,
	Manolis Marazakis


Completely forgot it: Tested on i386, no regressions.


Dimitrios


* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-07-29 18:01 ` Dimitrios Apostolou
@ 2011-07-29 19:34   ` Kenneth Zadeck
  2011-07-29 19:45     ` Dimitrios Apostolou
  0 siblings, 1 reply; 11+ messages in thread
From: Kenneth Zadeck @ 2011-07-29 19:34 UTC (permalink / raw)
  To: Dimitrios Apostolou
  Cc: gcc-patches, Steven Bosscher, Paolo Bonzini, seongbae.park,
	Manolis Marazakis

i really think that patches of this magnitude having to do with the rtl 
level should be tested on more than one platform.

kenny

On 07/29/2011 01:39 PM, Dimitrios Apostolou wrote:
>
> Completely forgot it: Tested on i386, no regressions.
>
>
> Dimitrios


* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-07-29 19:34   ` Kenneth Zadeck
@ 2011-07-29 19:45     ` Dimitrios Apostolou
  2011-07-29 22:46       ` Steven Bosscher
  0 siblings, 1 reply; 11+ messages in thread
From: Dimitrios Apostolou @ 2011-07-29 19:45 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: gcc-patches, Steven Bosscher, Paolo Bonzini, seongbae.park,
	Manolis Marazakis

On Fri, 29 Jul 2011, Kenneth Zadeck wrote:

> i really think that patches of this magnitude having to do with the rtl level 
> should be tested on more than one platform.

I'd really appreciate further testing on alternate platforms from whoever 
does it casually; for me it would take too much time to set up my testing 
platform on the GCC compile farm, and deadlines are approaching.


Thanks,
Dimitris


* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-07-29 19:45     ` Dimitrios Apostolou
@ 2011-07-29 22:46       ` Steven Bosscher
  2011-07-29 23:16         ` Kenneth Zadeck
       [not found]         ` <CABu31nP8MQyHHZ4fQO3vTeovppwezGg6Ey4UFmrYq0KeH5xgww@mail.gmail.com>
  0 siblings, 2 replies; 11+ messages in thread
From: Steven Bosscher @ 2011-07-29 22:46 UTC (permalink / raw)
  To: Dimitrios Apostolou
  Cc: Kenneth Zadeck, gcc-patches, Paolo Bonzini, seongbae.park,
	Manolis Marazakis

On Fri, Jul 29, 2011 at 7:57 PM, Dimitrios Apostolou wrote:
> On Fri, 29 Jul 2011, Kenneth Zadeck wrote:
>
>> i really think that patches of this magnitude having to do with the rtl level
>> should be tested on more than one platform.
>
> I'd really appreciate further testing on alternate platforms from whoever
> does it casually; for me it would take too much time to set up my testing
> platform on the GCC compile farm, and deadlines are approaching.

"I love deadlines. I love the whooshing sound they make as they go by."
--Douglas Adams

I'll see if I can test the patch on the compile farm this weekend,
just to be sure.

Ciao!
Steven


* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-07-29 22:46       ` Steven Bosscher
@ 2011-07-29 23:16         ` Kenneth Zadeck
       [not found]         ` <CABu31nP8MQyHHZ4fQO3vTeovppwezGg6Ey4UFmrYq0KeH5xgww@mail.gmail.com>
  1 sibling, 0 replies; 11+ messages in thread
From: Kenneth Zadeck @ 2011-07-29 23:16 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Dimitrios Apostolou, gcc-patches, Paolo Bonzini, seongbae.park,
	Manolis Marazakis

you are the best!!!!

kenny

On 07/29/2011 05:48 PM, Steven Bosscher wrote:
> On Fri, Jul 29, 2011 at 7:57 PM, Dimitrios Apostolou wrote:
>> On Fri, 29 Jul 2011, Kenneth Zadeck wrote:
>>
>>> i really think that patches of this magnitude having to do with the rtl level
>>> should be tested on more than one platform.
>> I'd really appreciate further testing on alternate platforms from whoever
>> does it casually; for me it would take too much time to set up my testing
>> platform on the GCC compile farm, and deadlines are approaching.
> "I love deadlines. I love the whooshing sound they make as they go by."
> --Douglas Adams
>
> I'll see if I can test the patch on the compile farm this weekend,
> just to be sure.
>
> Ciao!
> Steven


* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
       [not found]           ` <CABu31nN=pSqOSfu=fo3dTxdFq0KJR_ZH6itEqQsgecvf4tJ+iA@mail.gmail.com>
@ 2011-08-22 14:12             ` Dimitrios Apostolou
  2011-08-22 15:59               ` Dimitrios Apostolou
  0 siblings, 1 reply; 11+ messages in thread
From: Dimitrios Apostolou @ 2011-08-22 14:12 UTC (permalink / raw)
  To: Steven Bosscher
  Cc: Dimitrios Apostolou, Kenneth Zadeck, gcc-patches, Paolo Bonzini,
	seongbae.park, Manolis Marazakis

Hi Steven,

On Mon, 1 Aug 2011, Steven Bosscher wrote:
> On Sun, Jul 31, 2011 at 11:59 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>> On Fri, Jul 29, 2011 at 11:48 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>>> I'll see if I can test the patch on the compile farm this weekend,
>>> just to be sure.
>>
>> Bootstrap on
>> ia64-unknown-linux-gnu is in stage2 but it is taking forever (on
>> gcc60)...
>
> Just to be clear, it is taking forever without the patch too. I did
> time -O2 non-bootstrap builds but there was no difference worth
> mentioning.

Did the testsuite finish with no regressions on IA64?

I'd test on a SPARCstation but unfortunately the machine crashed before
finishing and I can't regain access to it. I'll hopefully test it on some
other platform by next week; I'm curious to actually measure runtime
there.


Thanks for helping!
Dimitris


* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-08-22 14:12             ` Dimitrios Apostolou
@ 2011-08-22 15:59               ` Dimitrios Apostolou
  2011-10-08 22:59                 ` Dimitrios Apostolou
  0 siblings, 1 reply; 11+ messages in thread
From: Dimitrios Apostolou @ 2011-08-22 15:59 UTC (permalink / raw)
  To: Dimitrios Apostolou
  Cc: Steven Bosscher, Kenneth Zadeck, gcc-patches, Paolo Bonzini,
	seongbae.park, Manolis Marazakis

[-- Attachment #1: Type: TEXT/PLAIN, Size: 1536 bytes --]

On Mon, 22 Aug 2011, Dimitrios Apostolou wrote:
> Hi Steven,
>
> On Mon, 1 Aug 2011, Steven Bosscher wrote:
>> On Sun, Jul 31, 2011 at 11:59 PM, Steven Bosscher <stevenb.gcc@gmail.com> 
>> wrote:
>>> On Fri, Jul 29, 2011 at 11:48 PM, Steven Bosscher <stevenb.gcc@gmail.com> 
>>> wrote:
>>>> I'll see if I can test the patch on the compile farm this weekend,
>>>> just to be sure.
>>> 
>>> Bootstrap on
>>> ia64-unknown-linux-gnu is in stage2 but it is taking forever (on
>>> gcc60)...
>> 
>> Just to be clear, it is taking forever without the patch too. I did
>> time -O2 non-bootstrap builds but there was no difference worth
>> mentioning.
>
> Did the testsuite finish with no regressions on IA64?
>
> I'd test on a SPARCstation but unfortunately the machine crashed before
> finishing and I can't regain access to it. I'll hopefully test it on some
> other platform by next week; I'm curious to actually measure runtime
> there.

For the record, I'm posting the final version of this patch here, in case 
it gets applied. It adds minor stylistic fixes, plus a small change in 
the alloc_pool block sizes: block_size goes from 400 to 512, and 
mw_reg_pool now uses block_size / 16. Any further testing I do will be 
posted in this thread.

The previously posted Changelog applies, with the following addition:

	(df_scan_alloc): Round up the allocation pools' block size.  Reduce
	the mw_reg_pool block size, it was unnecessarily large.

Paolo, did I assume correctly that the mw_reg_pool is significantly 
smaller than the rest? That was the case on i386, and I assumed it would 
be similar on other architectures as well.
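
A rough sketch of why a smaller block suits a sparsely used pool, 
assuming the last argument of create_alloc_pool() is the number of 
objects allocated per block. The helpers toy_blocks_needed and 
toy_unused_slots are hypothetical, and the object count of 10 used in 
the note after the code is made up for illustration, not a measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocks a pool must allocate to hold N_OBJECTS when each
   block has room for OBJS_PER_BLOCK objects.  */
static size_t
toy_blocks_needed (size_t n_objects, size_t objs_per_block)
{
  assert (objs_per_block > 0);
  return (n_objects + objs_per_block - 1) / objs_per_block;
}

/* Slots that were allocated but never used.  */
static size_t
toy_unused_slots (size_t n_objects, size_t objs_per_block)
{
  return toy_blocks_needed (n_objects, objs_per_block) * objs_per_block
	 - n_objects;
}
```

For a pool that ends up holding 10 objects, a block of 512 leaves 502 
slots unused, while a block of 512 / 16 = 32 leaves 22.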


Thanks,
Dimitris

[-- Attachment #2: Type: TEXT/plain, Size: 11952 bytes --]

=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c	2011-02-02 20:08:06 +0000
+++ gcc/df-scan.c	2011-08-22 15:17:18 +0000
@@ -111,7 +111,7 @@ static void df_ref_record (enum df_ref_c
 			   rtx, rtx *,
 			   basic_block, struct df_insn_info *,
 			   enum df_ref_type, int ref_flags);
-static void df_def_record_1 (struct df_collection_rec *, rtx,
+static void df_def_record_1 (struct df_collection_rec *, rtx *,
 			     basic_block, struct df_insn_info *,
 			     int ref_flags);
 static void df_defs_record (struct df_collection_rec *, rtx,
@@ -318,7 +318,7 @@ df_scan_alloc (bitmap all_blocks ATTRIBU
 {
   struct df_scan_problem_data *problem_data;
   unsigned int insn_num = get_max_uid () + 1;
-  unsigned int block_size = 400;
+  unsigned int block_size = 512;
   basic_block bb;
 
   /* Given the number of pools, this is really faster than tearing
@@ -347,7 +347,7 @@ df_scan_alloc (bitmap all_blocks ATTRIBU
 			 sizeof (struct df_reg_info), block_size);
   problem_data->mw_reg_pool
     = create_alloc_pool ("df_scan mw_reg",
-			 sizeof (struct df_mw_hardreg), block_size);
+			 sizeof (struct df_mw_hardreg), block_size / 16);
 
   bitmap_obstack_initialize (&problem_data->reg_bitmaps);
   bitmap_obstack_initialize (&problem_data->insn_bitmaps);
@@ -2916,40 +2916,27 @@ df_read_modify_subreg_p (rtx x)
 }
 
 
-/* Process all the registers defined in the rtx, X.
+/* Process all the registers defined in the rtx pointed by LOC.
    Autoincrement/decrement definitions will be picked up by
    df_uses_record.  */
 
 static void
 df_def_record_1 (struct df_collection_rec *collection_rec,
-                 rtx x, basic_block bb, struct df_insn_info *insn_info,
+                 rtx *loc, basic_block bb, struct df_insn_info *insn_info,
 		 int flags)
 {
-  rtx *loc;
-  rtx dst;
-
- /* We may recursively call ourselves on EXPR_LIST when dealing with PARALLEL
-     construct.  */
-  if (GET_CODE (x) == EXPR_LIST || GET_CODE (x) == CLOBBER)
-    loc = &XEXP (x, 0);
-  else
-    loc = &SET_DEST (x);
-  dst = *loc;
+  rtx dst = *loc;
 
   /* It is legal to have a set destination be a parallel. */
   if (GET_CODE (dst) == PARALLEL)
     {
       int i;
-
       for (i = XVECLEN (dst, 0) - 1; i >= 0; i--)
 	{
 	  rtx temp = XVECEXP (dst, 0, i);
-	  if (GET_CODE (temp) == EXPR_LIST || GET_CODE (temp) == CLOBBER
-	      || GET_CODE (temp) == SET)
-	    df_def_record_1 (collection_rec,
-                             temp, bb, insn_info,
-			     GET_CODE (temp) == CLOBBER
-			     ? flags | DF_REF_MUST_CLOBBER : flags);
+	  gcc_assert (GET_CODE (temp) == EXPR_LIST);
+	  df_def_record_1 (collection_rec, &XEXP (temp, 0),
+			   bb, insn_info, flags);
 	}
       return;
     }
@@ -3003,26 +2990,98 @@ df_defs_record (struct df_collection_rec
 		int flags)
 {
   RTX_CODE code = GET_CODE (x);
+  int i;
 
-  if (code == SET || code == CLOBBER)
-    {
-      /* Mark the single def within the pattern.  */
-      int clobber_flags = flags;
-      clobber_flags |= (code == CLOBBER) ? DF_REF_MUST_CLOBBER : 0;
-      df_def_record_1 (collection_rec, x, bb, insn_info, clobber_flags);
-    }
-  else if (code == COND_EXEC)
+  switch (code)
     {
+    case SET:
+      df_def_record_1 (collection_rec, &SET_DEST (x), bb, insn_info, flags);
+      break;
+
+    case CLOBBER:
+      flags |= DF_REF_MUST_CLOBBER;
+      df_def_record_1 (collection_rec, &XEXP (x, 0), bb, insn_info, flags);
+      break;
+
+    case COND_EXEC:
       df_defs_record (collection_rec, COND_EXEC_CODE (x),
 		      bb, insn_info, DF_REF_CONDITIONAL);
+      break;
+
+    case PARALLEL:
+      for (i = XVECLEN (x, 0) - 1; i >= 0; i--)
+	df_defs_record (collection_rec, XVECEXP (x, 0, i),
+			bb, insn_info, flags);
+      break;
+    default:
+      /* No DEFs to record in other cases */
+      break;
     }
-  else if (code == PARALLEL)
+}
+
+/* Set the bits in *defs of registers defined in the pattern rtx */
+
+static void
+df_find_hard_reg_defs_1 (rtx *loc, basic_block bb,
+			 int flags, HARD_REG_SET *defs)
+{
+  rtx dst = *loc;
+
+  /* It is legal to have a set destination be a parallel. */
+  if (GET_CODE (dst) == PARALLEL)
     {
       int i;
+      for (i = XVECLEN (dst, 0) - 1; i >= 0; i--)
+	{
+	  rtx temp = XVECEXP (dst, 0, i);
+	  gcc_assert (GET_CODE (temp) == EXPR_LIST);
+	  df_find_hard_reg_defs_1 (&XEXP (temp, 0), bb, flags, defs);
+	}
+      return;
+    }
+
+  if (GET_CODE (dst) == STRICT_LOW_PART)
+      dst = XEXP (dst, 0);
+
+  if (GET_CODE (dst) == ZERO_EXTRACT)
+      dst = XEXP (dst, 0);
 
-      /* Mark the multiple defs within the pattern.  */
+  /* At this point if we do not have a reg or a subreg, just return.  */
+  if (REG_P (dst))
+    SET_HARD_REG_BIT (*defs, REGNO (dst));
+  else if (GET_CODE (dst) == SUBREG && REG_P (SUBREG_REG (dst)))
+    SET_HARD_REG_BIT (*defs, REGNO (SUBREG_REG (dst)));
+}
+
+static void
+df_find_hard_reg_defs (rtx x, basic_block bb, 
+		       int flags, HARD_REG_SET *defs)
+{
+  RTX_CODE code = GET_CODE (x);
+  int i;
+
+  switch (code)
+    {
+    case SET:
+      df_find_hard_reg_defs_1 (&SET_DEST (x), bb, flags, defs);
+      break;
+
+    case CLOBBER:
+      flags |= DF_REF_MUST_CLOBBER;
+      df_find_hard_reg_defs_1 (&XEXP (x, 0), bb, flags, defs);
+      break;
+
+    case COND_EXEC:
+      df_find_hard_reg_defs (COND_EXEC_CODE (x), bb, DF_REF_CONDITIONAL, defs);
+      break;
+
+    case PARALLEL:
       for (i = XVECLEN (x, 0) - 1; i >= 0; i--)
-	df_defs_record (collection_rec, XVECEXP (x, 0, i), bb, insn_info, flags);
+	df_find_hard_reg_defs (XVECEXP (x, 0, i), bb, flags, defs);
+      break;
+    default:
+      /* No DEFs to record in other cases */
+      break;
     }
 }
 
@@ -3308,7 +3367,7 @@ df_get_conditional_uses (struct df_colle
 }
 
 
-/* Get call's extra defs and uses. */
+/* Get call's extra defs and uses (track caller-saved registers). */
 
 static void
 df_get_call_refs (struct df_collection_rec * collection_rec,
@@ -3317,20 +3376,50 @@ df_get_call_refs (struct df_collection_r
                   int flags)
 {
   rtx note;
-  bitmap_iterator bi;
-  unsigned int ui;
   bool is_sibling_call;
   unsigned int i;
-  df_ref def;
-  bitmap_head defs_generated;
+  HARD_REG_SET defs_generated;
 
-  bitmap_initialize (&defs_generated, &df_bitmap_obstack);
+  CLEAR_HARD_REG_SET (defs_generated);
+  df_find_hard_reg_defs (PATTERN (insn_info->insn), bb, 
+			 0, &defs_generated);
 
-  /* Do not generate clobbers for registers that are the result of the
-     call.  This causes ordering problems in the chain building code
-     depending on which def is seen first.  */
-  FOR_EACH_VEC_ELT (df_ref, collection_rec->def_vec, i, def)
-    bitmap_set_bit (&defs_generated, DF_REF_REGNO (def));
+  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    {
+      if (i == STACK_POINTER_REGNUM)
+	/* The stack ptr is used (honorarily) by a CALL insn.  */
+	df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+		       NULL, bb, insn_info, DF_REF_REG_USE,
+		       DF_REF_CALL_STACK_USAGE | flags);
+      else if (global_regs[i])
+	{
+	  /* Calls to const functions cannot access any global registers and
+	     calls to pure functions cannot set them.  All other calls may
+	     reference any of the global registers, so they are recorded as
+	     used. */
+	  if (!RTL_CONST_CALL_P (insn_info->insn))
+	    {
+	      df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			     NULL, bb, insn_info, DF_REF_REG_USE, flags);
+	      if (!RTL_PURE_CALL_P (insn_info->insn))
+		df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			       NULL, bb, insn_info, DF_REF_REG_DEF, flags);
+	    }
+	}
+      else
+	if (TEST_HARD_REG_BIT (regs_invalidated_by_call, i)
+	    /* no clobbers for regs that are the result of the call */
+	    && !TEST_HARD_REG_BIT (defs_generated, i)
+	    && (!is_sibling_call
+		|| !bitmap_bit_p (df->exit_block_uses, i)
+		|| refers_to_regno_p (i, i+1,
+				      crtl->return_rtx, NULL)))
+	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			 NULL, bb, insn_info, DF_REF_REG_DEF,
+			 DF_REF_MAY_CLOBBER | flags);
+    }
 
   /* Record the registers used to pass arguments, and explicitly
      noted as clobbered.  */
@@ -3345,7 +3434,7 @@ df_get_call_refs (struct df_collection_r
 	  if (REG_P (XEXP (XEXP (note, 0), 0)))
 	    {
 	      unsigned int regno = REGNO (XEXP (XEXP (note, 0), 0));
-	      if (!bitmap_bit_p (&defs_generated, regno))
+	      if (!TEST_HARD_REG_BIT (defs_generated, regno))
 		df_defs_record (collection_rec, XEXP (note, 0), bb,
 				insn_info, flags);
 	    }
@@ -3355,40 +3444,6 @@ df_get_call_refs (struct df_collection_r
 	}
     }
 
-  /* The stack ptr is used (honorarily) by a CALL insn.  */
-  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[STACK_POINTER_REGNUM],
-		 NULL, bb, insn_info, DF_REF_REG_USE,
-		 DF_REF_CALL_STACK_USAGE | flags);
-
-  /* Calls to const functions cannot access any global registers and calls to
-     pure functions cannot set them.  All other calls may reference any of the
-     global registers, so they are recorded as used.  */
-  if (!RTL_CONST_CALL_P (insn_info->insn))
-    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-      if (global_regs[i])
-	{
-	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			 NULL, bb, insn_info, DF_REF_REG_USE, flags);
-	  if (!RTL_PURE_CALL_P (insn_info->insn))
-	    df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			   NULL, bb, insn_info, DF_REF_REG_DEF, flags);
-	}
-
-  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
-  EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, ui, bi)
-    {
-      if (!global_regs[ui]
-	  && (!bitmap_bit_p (&defs_generated, ui))
-	  && (!is_sibling_call
-	      || !bitmap_bit_p (df->exit_block_uses, ui)
-	      || refers_to_regno_p (ui, ui+1,
-				    crtl->return_rtx, NULL)))
-        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[ui],
-		       NULL, bb, insn_info, DF_REF_REG_DEF,
-		       DF_REF_MAY_CLOBBER | flags);
-    }
-
-  bitmap_clear (&defs_generated);
   return;
 }
 
@@ -3398,7 +3453,7 @@ df_get_call_refs (struct df_collection_r
    and reg chains. */
 
 static void
-df_insn_refs_collect (struct df_collection_rec* collection_rec,
+df_insn_refs_collect (struct df_collection_rec *collection_rec,
 		      basic_block bb, struct df_insn_info *insn_info)
 {
   rtx note;
@@ -3410,9 +3465,6 @@ df_insn_refs_collect (struct df_collecti
   VEC_truncate (df_ref, collection_rec->eq_use_vec, 0);
   VEC_truncate (df_mw_hardreg_ptr, collection_rec->mw_vec, 0);
 
-  /* Record register defs.  */
-  df_defs_record (collection_rec, PATTERN (insn_info->insn), bb, insn_info, 0);
-
   /* Process REG_EQUIV/REG_EQUAL notes.  */
   for (note = REG_NOTES (insn_info->insn); note;
        note = XEXP (note, 1))
@@ -3444,12 +3496,17 @@ df_insn_refs_collect (struct df_collecti
     }
 
   if (CALL_P (insn_info->insn))
+    /* Record DF_REF_BASE register defs for CALL_INSNs. */
     df_get_call_refs (collection_rec, bb, insn_info,
 		      (is_cond_exec) ? DF_REF_CONDITIONAL : 0);
 
+  /* Record DF_REF_REGULAR defs and uses.  */
+  df_defs_record (collection_rec, PATTERN (insn_info->insn),
+		  bb, insn_info, 0);
+
   /* Record the register uses.  */
-  df_uses_record (collection_rec,
-		  &PATTERN (insn_info->insn), DF_REF_REG_USE, bb, insn_info, 0);
+  df_uses_record (collection_rec, &PATTERN (insn_info->insn),
+		  DF_REF_REG_USE, bb, insn_info, 0);
 
   /* DF_REF_CONDITIONAL needs corresponding USES. */
   if (is_cond_exec)



* Re: [DF] [performance] generate DF_REF_BASE REFs in REGNO order
  2011-08-22 15:59               ` Dimitrios Apostolou
@ 2011-10-08 22:59                 ` Dimitrios Apostolou
  0 siblings, 0 replies; 11+ messages in thread
From: Dimitrios Apostolou @ 2011-10-08 22:59 UTC (permalink / raw)
  To: Dimitrios Apostolou
  Cc: Steven Bosscher, Kenneth Zadeck, gcc-patches, Paolo Bonzini,
	seongbae.park, Jakub Jelinek, Richard Guenther,
	Manolis Marazakis

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2709 bytes --]

Hello all,

I received my GSoC t-shirt yesterday, which reminded me I have a promise to 
keep... After realising that it can take forever to find enough free time 
to work on GCC, I decided to work a couple of hours whenever I can and 
post updates to my patches as time permits. Hopefully some of them will 
make it into 4.7?

On Mon, 22 Aug 2011, Dimitrios Apostolou wrote:
>
> For the record I'm posting here the final version of this patch, in case it 
> gets applied. It adds minor stylistic fixes, plus a small change in 
> alloc_pool sizes. Any further testing I do will be posted under this thread.
>
> The previously posted Changelog applies, with the following addition:
>
> 	(df_scan_alloc): Rounded up allocation pools size, reduced the
> 	mw_reg_pool size, it was unnecessarily large.
>
> Paolo, did I assume correctly that the mw_reg_pool is significantly smaller 
> than the rest? That was the case on i386, I assumed it would be similar in 
> other arch as well.
>

The attached patch (df2b.diff, exactly the same as the one in the parent 
email) applies cleanly to the latest gcc snapshot. In addition to the 
previous testing (i386, x86_64), I've just finished testing on 
sparc-linux-gnu at the GCC compile farm, with no regressions. Finally, I 
think Steven's tests on IA64 went OK. Wasn't testing the only thing 
holding this patch back?

On sparc, the user time for compiling df-scan.c seems to have dropped from 
34s to 33s for a debug build 
(--enable-checking=assert,misc,runtime,rtl,df), but the measurements are 
too flaky since the node is busy.



The complete changelog is the following:

2011-07-29  Dimitrios Apostolou  <jimis@gmx.net>
             Paolo Bonzini  <bonzini@gnu.org>

         (df_def_record_1): Assert a parallel must contain an EXPR_LIST at
         this point.  Receive the LOC and move its extraction...
         (df_defs_record): ... here. Rewrote logic with a switch statement
         instead of multiple if-else.
         (df_find_hard_reg_defs, df_find_hard_reg_defs_1): New functions
         that duplicate the logic of df_defs_record() and df_def_record_1()
         but without actually recording any DEFs, only marking them in
         the defs HARD_REG_SET.
         (df_get_call_refs): Call df_find_hard_reg_defs() to mark DEFs that
         are the result of the call. Record DF_REF_BASE DEFs in REGNO
         order. Use regs_invalidated_by_call HARD_REG_SET instead of
         regs_invalidated_by_call_regset bitmap.
         (df_insn_refs_collect): Record DF_REF_REGULAR DEFs after
         df_get_call_refs().
         (df_scan_alloc): Rounded up allocation pools size, reduced the
         mw_reg_pool size, it was unnecessarily large.
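
As a rough illustration of the df_get_call_refs entry above, the following
standalone C sketch shows why a single ascending loop over a fixed-size
register set yields entries that are already in register-number order. It
does not use GCC's headers: toy_reg_set, N_HARD_REGS and the toy_* functions
are hypothetical stand-ins invented for this example, not the real
HARD_REG_SET macros.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for FIRST_PSEUDO_REGISTER; real targets differ.  */
#define N_HARD_REGS 64

#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define N_WORDS ((N_HARD_REGS + WORD_BITS - 1) / WORD_BITS)

/* Toy fixed-size register set (an assumption, not GCC's HARD_REG_SET).  */
typedef struct { unsigned long w[N_WORDS]; } toy_reg_set;

static void
toy_clear (toy_reg_set *s)
{
  for (size_t i = 0; i < N_WORDS; i++)
    s->w[i] = 0;
}

static void
toy_set_bit (toy_reg_set *s, unsigned r)
{
  assert (r < N_HARD_REGS);	/* Only hard register numbers fit.  */
  s->w[r / WORD_BITS] |= 1UL << (r % WORD_BITS);
}

static int
toy_test_bit (const toy_reg_set *s, unsigned r)
{
  assert (r < N_HARD_REGS);
  return (int) ((s->w[r / WORD_BITS] >> (r % WORD_BITS)) & 1UL);
}

/* Visit register numbers once, in ascending order, and store in OUT
   every register that is invalidated by the call but is not already
   defined as a result of the call.  Returns the number of entries
   written; OUT must have room for N_HARD_REGS entries.  */
static size_t
toy_collect_clobbers (const toy_reg_set *invalidated,
		      const toy_reg_set *defs_generated,
		      unsigned *out)
{
  size_t n = 0;
  for (unsigned r = 0; r < N_HARD_REGS; r++)
    if (toy_test_bit (invalidated, r) && !toy_test_bit (defs_generated, r))
      out[n++] = r;
  return n;
}
```

A caller would fill both sets, call toy_collect_clobbers, and read the first
n entries of out. Because the loop visits register numbers in ascending
order, this sketch needs no separate sorting step afterwards.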


Thanks,
Dimitris

[-- Attachment #2: Type: TEXT/PLAIN, Size: 11952 bytes --]

=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c	2011-02-02 20:08:06 +0000
+++ gcc/df-scan.c	2011-08-22 15:17:18 +0000
@@ -111,7 +111,7 @@ static void df_ref_record (enum df_ref_c
 			   rtx, rtx *,
 			   basic_block, struct df_insn_info *,
 			   enum df_ref_type, int ref_flags);
-static void df_def_record_1 (struct df_collection_rec *, rtx,
+static void df_def_record_1 (struct df_collection_rec *, rtx *,
 			     basic_block, struct df_insn_info *,
 			     int ref_flags);
 static void df_defs_record (struct df_collection_rec *, rtx,
@@ -318,7 +318,7 @@ df_scan_alloc (bitmap all_blocks ATTRIBU
 {
   struct df_scan_problem_data *problem_data;
   unsigned int insn_num = get_max_uid () + 1;
-  unsigned int block_size = 400;
+  unsigned int block_size = 512;
   basic_block bb;
 
   /* Given the number of pools, this is really faster than tearing
@@ -347,7 +347,7 @@ df_scan_alloc (bitmap all_blocks ATTRIBU
 			 sizeof (struct df_reg_info), block_size);
   problem_data->mw_reg_pool
     = create_alloc_pool ("df_scan mw_reg",
-			 sizeof (struct df_mw_hardreg), block_size);
+			 sizeof (struct df_mw_hardreg), block_size / 16);
 
   bitmap_obstack_initialize (&problem_data->reg_bitmaps);
   bitmap_obstack_initialize (&problem_data->insn_bitmaps);
@@ -2916,40 +2916,27 @@ df_read_modify_subreg_p (rtx x)
 }
 
 
-/* Process all the registers defined in the rtx, X.
+/* Process all the registers defined in the rtx pointed by LOC.
    Autoincrement/decrement definitions will be picked up by
    df_uses_record.  */
 
 static void
 df_def_record_1 (struct df_collection_rec *collection_rec,
-                 rtx x, basic_block bb, struct df_insn_info *insn_info,
+                 rtx *loc, basic_block bb, struct df_insn_info *insn_info,
 		 int flags)
 {
-  rtx *loc;
-  rtx dst;
-
- /* We may recursively call ourselves on EXPR_LIST when dealing with PARALLEL
-     construct.  */
-  if (GET_CODE (x) == EXPR_LIST || GET_CODE (x) == CLOBBER)
-    loc = &XEXP (x, 0);
-  else
-    loc = &SET_DEST (x);
-  dst = *loc;
+  rtx dst = *loc;
 
   /* It is legal to have a set destination be a parallel. */
   if (GET_CODE (dst) == PARALLEL)
     {
       int i;
-
       for (i = XVECLEN (dst, 0) - 1; i >= 0; i--)
 	{
 	  rtx temp = XVECEXP (dst, 0, i);
-	  if (GET_CODE (temp) == EXPR_LIST || GET_CODE (temp) == CLOBBER
-	      || GET_CODE (temp) == SET)
-	    df_def_record_1 (collection_rec,
-                             temp, bb, insn_info,
-			     GET_CODE (temp) == CLOBBER
-			     ? flags | DF_REF_MUST_CLOBBER : flags);
+	  gcc_assert (GET_CODE (temp) == EXPR_LIST);
+	  df_def_record_1 (collection_rec, &XEXP (temp, 0),
+			   bb, insn_info, flags);
 	}
       return;
     }
@@ -3003,26 +2990,98 @@ df_defs_record (struct df_collection_rec
 		int flags)
 {
   RTX_CODE code = GET_CODE (x);
+  int i;
 
-  if (code == SET || code == CLOBBER)
-    {
-      /* Mark the single def within the pattern.  */
-      int clobber_flags = flags;
-      clobber_flags |= (code == CLOBBER) ? DF_REF_MUST_CLOBBER : 0;
-      df_def_record_1 (collection_rec, x, bb, insn_info, clobber_flags);
-    }
-  else if (code == COND_EXEC)
+  switch (code)
     {
+    case SET:
+      df_def_record_1 (collection_rec, &SET_DEST (x), bb, insn_info, flags);
+      break;
+
+    case CLOBBER:
+      flags |= DF_REF_MUST_CLOBBER;
+      df_def_record_1 (collection_rec, &XEXP (x, 0), bb, insn_info, flags);
+      break;
+
+    case COND_EXEC:
       df_defs_record (collection_rec, COND_EXEC_CODE (x),
 		      bb, insn_info, DF_REF_CONDITIONAL);
+      break;
+
+    case PARALLEL:
+      for (i = XVECLEN (x, 0) - 1; i >= 0; i--)
+	df_defs_record (collection_rec, XVECEXP (x, 0, i),
+			bb, insn_info, flags);
+      break;
+    default:
+      /* No DEFs to record in other cases */
+      break;
     }
-  else if (code == PARALLEL)
+}
+
+/* Set the bits in *defs of registers defined in the pattern rtx */
+
+static void
+df_find_hard_reg_defs_1 (rtx *loc, basic_block bb,
+			 int flags, HARD_REG_SET *defs)
+{
+  rtx dst = *loc;
+
+  /* It is legal to have a set destination be a parallel. */
+  if (GET_CODE (dst) == PARALLEL)
     {
       int i;
+      for (i = XVECLEN (dst, 0) - 1; i >= 0; i--)
+	{
+	  rtx temp = XVECEXP (dst, 0, i);
+	  gcc_assert (GET_CODE (temp) == EXPR_LIST);
+	  df_find_hard_reg_defs_1 (&XEXP (temp, 0), bb, flags, defs);
+	}
+      return;
+    }
+
+  if (GET_CODE (dst) == STRICT_LOW_PART)
+      dst = XEXP (dst, 0);
+
+  if (GET_CODE (dst) == ZERO_EXTRACT)
+      dst = XEXP (dst, 0);
 
-      /* Mark the multiple defs within the pattern.  */
+  /* At this point if we do not have a reg or a subreg, just return.  */
+  if (REG_P (dst))
+    SET_HARD_REG_BIT (*defs, REGNO (dst));
+  else if (GET_CODE (dst) == SUBREG && REG_P (SUBREG_REG (dst)))
+    SET_HARD_REG_BIT (*defs, REGNO (SUBREG_REG (dst)));
+}
+
+static void
+df_find_hard_reg_defs (rtx x, basic_block bb, 
+		       int flags, HARD_REG_SET *defs)
+{
+  RTX_CODE code = GET_CODE (x);
+  int i;
+
+  switch (code)
+    {
+    case SET:
+      df_find_hard_reg_defs_1 (&SET_DEST (x), bb, flags, defs);
+      break;
+
+    case CLOBBER:
+      flags |= DF_REF_MUST_CLOBBER;
+      df_find_hard_reg_defs_1 (&XEXP (x, 0), bb, flags, defs);
+      break;
+
+    case COND_EXEC:
+      df_find_hard_reg_defs (COND_EXEC_CODE (x), bb, DF_REF_CONDITIONAL, defs);
+      break;
+
+    case PARALLEL:
       for (i = XVECLEN (x, 0) - 1; i >= 0; i--)
-	df_defs_record (collection_rec, XVECEXP (x, 0, i), bb, insn_info, flags);
+	df_find_hard_reg_defs (XVECEXP (x, 0, i), bb, flags, defs);
+      break;
+    default:
+      /* No DEFs to record in other cases */
+      break;
     }
 }
 
@@ -3308,7 +3367,7 @@ df_get_conditional_uses (struct df_colle
 }
 
 
-/* Get call's extra defs and uses. */
+/* Get call's extra defs and uses (track caller-saved registers). */
 
 static void
 df_get_call_refs (struct df_collection_rec * collection_rec,
@@ -3317,20 +3376,50 @@ df_get_call_refs (struct df_collection_r
                   int flags)
 {
   rtx note;
-  bitmap_iterator bi;
-  unsigned int ui;
   bool is_sibling_call;
   unsigned int i;
-  df_ref def;
-  bitmap_head defs_generated;
+  HARD_REG_SET defs_generated;
 
-  bitmap_initialize (&defs_generated, &df_bitmap_obstack);
+  CLEAR_HARD_REG_SET (defs_generated);
+  df_find_hard_reg_defs (PATTERN (insn_info->insn), bb, 
+			 0, &defs_generated);
 
-  /* Do not generate clobbers for registers that are the result of the
-     call.  This causes ordering problems in the chain building code
-     depending on which def is seen first.  */
-  FOR_EACH_VEC_ELT (df_ref, collection_rec->def_vec, i, def)
-    bitmap_set_bit (&defs_generated, DF_REF_REGNO (def));
+  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    {
+      if (i == STACK_POINTER_REGNUM)
+	/* The stack ptr is used (honorarily) by a CALL insn.  */
+	df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+		       NULL, bb, insn_info, DF_REF_REG_USE,
+		       DF_REF_CALL_STACK_USAGE | flags);
+      else if (global_regs[i])
+	{
+	  /* Calls to const functions cannot access any global registers and
+	     calls to pure functions cannot set them.  All other calls may
+	     reference any of the global registers, so they are recorded as
+	     used. */
+	  if (!RTL_CONST_CALL_P (insn_info->insn))
+	    {
+	      df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			     NULL, bb, insn_info, DF_REF_REG_USE, flags);
+	      if (!RTL_PURE_CALL_P (insn_info->insn))
+		df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			       NULL, bb, insn_info, DF_REF_REG_DEF, flags);
+	    }
+	}
+      else
+	if (TEST_HARD_REG_BIT (regs_invalidated_by_call, i)
+	    /* no clobbers for regs that are the result of the call */
+	    && !TEST_HARD_REG_BIT (defs_generated, i)
+	    && (!is_sibling_call
+		|| !bitmap_bit_p (df->exit_block_uses, i)
+		|| refers_to_regno_p (i, i+1,
+				      crtl->return_rtx, NULL)))
+	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			 NULL, bb, insn_info, DF_REF_REG_DEF,
+			 DF_REF_MAY_CLOBBER | flags);
+    }
 
   /* Record the registers used to pass arguments, and explicitly
      noted as clobbered.  */
@@ -3345,7 +3434,7 @@ df_get_call_refs (struct df_collection_r
 	  if (REG_P (XEXP (XEXP (note, 0), 0)))
 	    {
 	      unsigned int regno = REGNO (XEXP (XEXP (note, 0), 0));
-	      if (!bitmap_bit_p (&defs_generated, regno))
+	      if (!TEST_HARD_REG_BIT (defs_generated, regno))
 		df_defs_record (collection_rec, XEXP (note, 0), bb,
 				insn_info, flags);
 	    }
@@ -3355,40 +3444,6 @@ df_get_call_refs (struct df_collection_r
 	}
     }
 
-  /* The stack ptr is used (honorarily) by a CALL insn.  */
-  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[STACK_POINTER_REGNUM],
-		 NULL, bb, insn_info, DF_REF_REG_USE,
-		 DF_REF_CALL_STACK_USAGE | flags);
-
-  /* Calls to const functions cannot access any global registers and calls to
-     pure functions cannot set them.  All other calls may reference any of the
-     global registers, so they are recorded as used.  */
-  if (!RTL_CONST_CALL_P (insn_info->insn))
-    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-      if (global_regs[i])
-	{
-	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			 NULL, bb, insn_info, DF_REF_REG_USE, flags);
-	  if (!RTL_PURE_CALL_P (insn_info->insn))
-	    df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			   NULL, bb, insn_info, DF_REF_REG_DEF, flags);
-	}
-
-  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
-  EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, ui, bi)
-    {
-      if (!global_regs[ui]
-	  && (!bitmap_bit_p (&defs_generated, ui))
-	  && (!is_sibling_call
-	      || !bitmap_bit_p (df->exit_block_uses, ui)
-	      || refers_to_regno_p (ui, ui+1,
-				    crtl->return_rtx, NULL)))
-        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[ui],
-		       NULL, bb, insn_info, DF_REF_REG_DEF,
-		       DF_REF_MAY_CLOBBER | flags);
-    }
-
-  bitmap_clear (&defs_generated);
   return;
 }
 
@@ -3398,7 +3453,7 @@ df_get_call_refs (struct df_collection_r
    and reg chains. */
 
 static void
-df_insn_refs_collect (struct df_collection_rec* collection_rec,
+df_insn_refs_collect (struct df_collection_rec *collection_rec,
 		      basic_block bb, struct df_insn_info *insn_info)
 {
   rtx note;
@@ -3410,9 +3465,6 @@ df_insn_refs_collect (struct df_collecti
   VEC_truncate (df_ref, collection_rec->eq_use_vec, 0);
   VEC_truncate (df_mw_hardreg_ptr, collection_rec->mw_vec, 0);
 
-  /* Record register defs.  */
-  df_defs_record (collection_rec, PATTERN (insn_info->insn), bb, insn_info, 0);
-
   /* Process REG_EQUIV/REG_EQUAL notes.  */
   for (note = REG_NOTES (insn_info->insn); note;
        note = XEXP (note, 1))
@@ -3444,12 +3496,17 @@ df_insn_refs_collect (struct df_collecti
     }
 
   if (CALL_P (insn_info->insn))
+    /* Record DF_REF_BASE register defs for CALL_INSNs. */
     df_get_call_refs (collection_rec, bb, insn_info,
 		      (is_cond_exec) ? DF_REF_CONDITIONAL : 0);
 
+  /* Record DF_REF_REGULAR defs and uses.  */
+  df_defs_record (collection_rec, PATTERN (insn_info->insn),
+		  bb, insn_info, 0);
+
   /* Record the register uses.  */
-  df_uses_record (collection_rec,
-		  &PATTERN (insn_info->insn), DF_REF_REG_USE, bb, insn_info, 0);
+  df_uses_record (collection_rec, &PATTERN (insn_info->insn),
+		  DF_REF_REG_USE, bb, insn_info, 0);
 
   /* DF_REF_CONDITIONAL needs corresponding USES. */
   if (is_cond_exec)
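
To illustrate the marking-only walk that df_find_hard_reg_defs performs in
the patch above, here is a small standalone C sketch. The toy_rtx type, the
toy_code values and toy_find_defs are hypothetical simplifications made up
for this example. Real RTL has many more codes, and this sketch ignores
COND_EXEC, SUBREG, STRICT_LOW_PART and ZERO_EXTRACT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced set of expression codes.  */
enum toy_code { TOY_SET, TOY_CLOBBER, TOY_PARALLEL, TOY_USE };

struct toy_rtx
{
  enum toy_code code;
  unsigned dest_regno;			/* Used by TOY_SET and TOY_CLOBBER.  */
  const struct toy_rtx *const *elems;	/* Children of a TOY_PARALLEL.  */
  size_t n_elems;
};

/* Set in *DEFS the bit of every register defined by X.  Nothing is
   allocated or recorded; only the 64-bit mask is updated.  */
static void
toy_find_defs (const struct toy_rtx *x, unsigned long long *defs)
{
  switch (x->code)
    {
    case TOY_SET:
    case TOY_CLOBBER:
      assert (x->dest_regno < 64);
      *defs |= 1ULL << x->dest_regno;
      break;

    case TOY_PARALLEL:
      /* Walk the vector from the last element to the first.  */
      for (size_t i = x->n_elems; i-- > 0;)
	toy_find_defs (x->elems[i], defs);
      break;

    default:
      /* No definitions in other cases.  */
      break;
    }
}
```

The design point this is meant to show is that the walk has the same shape
as a recording walk, but its only side effect is setting bits, so it can run
before any refs are created.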




Thread overview: 11+ messages
2011-07-29 17:48 [DF] [performance] generate DF_REF_BASE REFs in REGNO order Dimitrios Apostolou
2011-07-29 17:48 ` Paolo Bonzini
2011-07-29 17:57   ` Kenneth Zadeck
2011-07-29 18:01 ` Dimitrios Apostolou
2011-07-29 19:34   ` Kenneth Zadeck
2011-07-29 19:45     ` Dimitrios Apostolou
2011-07-29 22:46       ` Steven Bosscher
2011-07-29 23:16         ` Kenneth Zadeck
     [not found]         ` <CABu31nP8MQyHHZ4fQO3vTeovppwezGg6Ey4UFmrYq0KeH5xgww@mail.gmail.com>
     [not found]           ` <CABu31nN=pSqOSfu=fo3dTxdFq0KJR_ZH6itEqQsgecvf4tJ+iA@mail.gmail.com>
2011-08-22 14:12             ` Dimitrios Apostolou
2011-08-22 15:59               ` Dimitrios Apostolou
2011-10-08 22:59                 ` Dimitrios Apostolou
