public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
@ 2011-07-08  3:33 Dimitrios Apostolou
  2011-07-08  3:52 ` Dimitrios Apostolou
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08  3:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: Paolo Bonzini, Steven Bosscher

[-- Attachment #1: Type: TEXT/PLAIN, Size: 786 bytes --]

Hello list,

The attached patch does two things for df_get_call_refs():
* First it uses HARD_REG_SETs for defs_generated and 
regs_invalidated_by_call, instead of bitmaps. Replacing in total more 
than 400K calls (for my testcase) to bitmap_bit_p() with the much faster 
TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to 
1.5M.
* Second it produces the REFs in REGNO order, which is important to keep 
the collection_rec sorted most times, and avoid expensive calls to 
qsort(). Thanks to Paolo Bonzini for idea and mentoring.

The second part makes a big difference if accompanied with another patch 
in df_insn_refs_collect(). I'll post a followup patch, that is 
unfortunately unstable for some of my tests, so I'd appreciate any 
comments.


Thanks,
Dimitris

[-- Attachment #2: Type: TEXT/plain, Size: 4342 bytes --]

=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c	2011-02-02 20:08:06 +0000
+++ gcc/df-scan.c	2011-07-08 01:28:55 +0000
@@ -3317,20 +3317,56 @@
                   int flags)
 {
   rtx note;
-  bitmap_iterator bi;
-  unsigned int ui;
   bool is_sibling_call;
   unsigned int i;
   df_ref def;
-  bitmap_head defs_generated;
+  HARD_REG_SET defs_generated;
 
-  bitmap_initialize (&defs_generated, &df_bitmap_obstack);
+  CLEAR_HARD_REG_SET(defs_generated);
 
   /* Do not generate clobbers for registers that are the result of the
      call.  This causes ordering problems in the chain building code
      depending on which def is seen first.  */
   FOR_EACH_VEC_ELT (df_ref, collection_rec->def_vec, i, def)
-    bitmap_set_bit (&defs_generated, DF_REF_REGNO (def));
+    SET_HARD_REG_BIT (defs_generated, DF_REF_REGNO (def));
+
+  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
+
+  for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+    {
+      if (i == STACK_POINTER_REGNUM)
+	/* The stack ptr is used (honorarily) by a CALL insn.  */
+	df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+		       NULL, bb, insn_info, DF_REF_REG_USE,
+		       DF_REF_CALL_STACK_USAGE | flags);
+      else if (global_regs[i])
+	{
+	  /* Calls to const functions cannot access any global registers and
+	     calls to pure functions cannot set them.  All other calls may
+	     reference any of the global registers, so they are recorded as
+	     used. */
+	  if (!RTL_CONST_CALL_P (insn_info->insn))
+	    {
+	      df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			     NULL, bb, insn_info, DF_REF_REG_USE, flags);
+	      if (!RTL_PURE_CALL_P (insn_info->insn))
+		df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			       NULL, bb, insn_info, DF_REF_REG_DEF, flags);
+	    }
+	}
+      /* TODO HARD_REG_SET set intersection! */
+      else			/* !global_regs[i] */
+	/* track Caller-Saved registers */
+	if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i)
+	    && !TEST_HARD_REG_BIT (defs_generated, i)
+	    && (!is_sibling_call
+		|| !bitmap_bit_p (df->exit_block_uses, i)
+		|| refers_to_regno_p (i, i+1,
+				      crtl->return_rtx, NULL)))
+	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+			 NULL, bb, insn_info, DF_REF_REG_DEF,
+			 DF_REF_MAY_CLOBBER | flags);
+    }
 
   /* Record the registers used to pass arguments, and explicitly
      noted as clobbered.  */
@@ -3345,7 +3381,7 @@
 	  if (REG_P (XEXP (XEXP (note, 0), 0)))
 	    {
 	      unsigned int regno = REGNO (XEXP (XEXP (note, 0), 0));
-	      if (!bitmap_bit_p (&defs_generated, regno))
+	      if (!TEST_HARD_REG_BIT (defs_generated, regno))
 		df_defs_record (collection_rec, XEXP (note, 0), bb,
 				insn_info, flags);
 	    }
@@ -3355,40 +3391,6 @@
 	}
     }
 
-  /* The stack ptr is used (honorarily) by a CALL insn.  */
-  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[STACK_POINTER_REGNUM],
-		 NULL, bb, insn_info, DF_REF_REG_USE,
-		 DF_REF_CALL_STACK_USAGE | flags);
-
-  /* Calls to const functions cannot access any global registers and calls to
-     pure functions cannot set them.  All other calls may reference any of the
-     global registers, so they are recorded as used.  */
-  if (!RTL_CONST_CALL_P (insn_info->insn))
-    for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
-      if (global_regs[i])
-	{
-	  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			 NULL, bb, insn_info, DF_REF_REG_USE, flags);
-	  if (!RTL_PURE_CALL_P (insn_info->insn))
-	    df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
-			   NULL, bb, insn_info, DF_REF_REG_DEF, flags);
-	}
-
-  is_sibling_call = SIBLING_CALL_P (insn_info->insn);
-  EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, ui, bi)
-    {
-      if (!global_regs[ui]
-	  && (!bitmap_bit_p (&defs_generated, ui))
-	  && (!is_sibling_call
-	      || !bitmap_bit_p (df->exit_block_uses, ui)
-	      || refers_to_regno_p (ui, ui+1,
-				    crtl->return_rtx, NULL)))
-        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[ui],
-		       NULL, bb, insn_info, DF_REF_REG_DEF,
-		       DF_REF_MAY_CLOBBER | flags);
-    }
-
-  bitmap_clear (&defs_generated);
   return;
 }
 


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
@ 2011-07-08  3:52 ` Dimitrios Apostolou
  2011-07-08  6:36 ` Dimitrios Apostolou
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08  3:52 UTC (permalink / raw)
  To: gcc-patches; +Cc: Paolo Bonzini, Steven Bosscher

To document the gains from the bitmaps, here is (part of) the annotated 
source from callgrind profiler, showing instruction count. Before:


  1,154,400      if (bitmap_bit_p(regs_invalidated_by_call_regset, i)
  8,080,800  => bitmap.c:bitmap_bit_p (192400x)
  1,021,200          && !bitmap_bit_p (&defs_generated, i)
  5,106,000  => bitmap.c:bitmap_bit_p (170200x)
    340,400          && (!is_sibling_call
          .              || !bitmap_bit_p (df->exit_block_uses, i)
          .              || refers_to_regno_p (i, i+1,
          .                                    crtl->return_rtx, NULL)))
  2,053,500        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i
35,279,934  => df-scan.c:df_ref_record (170200x)
          .                       NULL, bb, insn_info, DF_REF_REG_DEF,
          .                       DF_REF_MAY_CLOBBER | flags);
          .      }


After:


  1,346,800      if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i)
    510,600          && !TEST_HARD_REG_BIT (defs_generated, i)
    340,400          && (!is_sibling_call
          .              || !bitmap_bit_p (df->exit_block_uses, i)
          .              || refers_to_regno_p (i, i+1,
          .                                    crtl->return_rtx, NULL)))
  2,057,200        df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i
35,279,934  => df-scan.c:df_ref_record (170200x)
          .                       NULL, bb, insn_info, DF_REF_REG_DEF,
          .                       DF_REF_MAY_CLOBBER | flags);
          .      }



Dimitris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
  2011-07-08  3:52 ` Dimitrios Apostolou
@ 2011-07-08  6:36 ` Dimitrios Apostolou
  2011-07-08  7:12   ` Paolo Bonzini
  2011-07-08  6:37 ` Steven Bosscher
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08  6:36 UTC (permalink / raw)
  To: gcc-patches; +Cc: Paolo Bonzini, Steven Bosscher

[-- Attachment #1: Type: TEXT/PLAIN, Size: 441 bytes --]

And here is the patch that breaks things. By moving df_defs_record() 
*after* df_get_call_refs() most times collection_rec remains sorted, and 
about 50M instructions are avoided in qsort() 
calls of df_canonize_collection_rec().

Unfortunately this does not work. Sometimes cc1 crashes, for example 
because regstack is empty in subst_stack_regs_in_debug_insn().

Any ideas for ensuring proper ordering of collection_rec?


Thanks,
Dimitris

[-- Attachment #2: Type: TEXT/plain, Size: 2585 bytes --]

=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c	2011-07-08 01:28:55 +0000
+++ gcc/df-scan.c	2011-07-08 03:38:38 +0000
@@ -3412,8 +3412,9 @@
   VEC_truncate (df_ref, collection_rec->eq_use_vec, 0);
   VEC_truncate (df_mw_hardreg_ptr, collection_rec->mw_vec, 0);
 
-  /* Record register defs.  */
-  df_defs_record (collection_rec, PATTERN (insn_info->insn), bb, insn_info, 0);
+  if (CALL_P (insn_info->insn))
+    df_get_call_refs (collection_rec, bb, insn_info,
+		      (is_cond_exec) ? DF_REF_CONDITIONAL : 0);
 
   /* Process REG_EQUIV/REG_EQUAL notes.  */
   for (note = REG_NOTES (insn_info->insn); note;
@@ -3421,33 +3422,33 @@
     {
       switch (REG_NOTE_KIND (note))
         {
+	  /* first write DF_REF_BASE */
+        case REG_NON_LOCAL_GOTO:
+          /* The frame ptr is used by a non-local goto.  */
+          df_ref_record (DF_REF_BASE, collection_rec,
+                         regno_reg_rtx[FRAME_POINTER_REGNUM],
+                         NULL, bb, insn_info,
+                         DF_REF_REG_USE, 0);
+#if !HARD_FRAME_POINTER_IS_FRAME_POINTER
+          df_ref_record (DF_REF_BASE, collection_rec,
+                         regno_reg_rtx[HARD_FRAME_POINTER_REGNUM],
+                         NULL, bb, insn_info,
+                         DF_REF_REG_USE, 0);
+#endif
+          break;
         case REG_EQUIV:
         case REG_EQUAL:
           df_uses_record (collection_rec,
                           &XEXP (note, 0), DF_REF_REG_USE,
                           bb, insn_info, DF_REF_IN_NOTE);
           break;
-        case REG_NON_LOCAL_GOTO:
-          /* The frame ptr is used by a non-local goto.  */
-          df_ref_record (DF_REF_BASE, collection_rec,
-                         regno_reg_rtx[FRAME_POINTER_REGNUM],
-                         NULL, bb, insn_info,
-                         DF_REF_REG_USE, 0);
-#if !HARD_FRAME_POINTER_IS_FRAME_POINTER
-          df_ref_record (DF_REF_BASE, collection_rec,
-                         regno_reg_rtx[HARD_FRAME_POINTER_REGNUM],
-                         NULL, bb, insn_info,
-                         DF_REF_REG_USE, 0);
-#endif
-          break;
         default:
           break;
         }
     }
 
-  if (CALL_P (insn_info->insn))
-    df_get_call_refs (collection_rec, bb, insn_info,
-		      (is_cond_exec) ? DF_REF_CONDITIONAL : 0);
+  /* Record register defs.  */
+  df_defs_record (collection_rec, PATTERN (insn_info->insn), bb, insn_info, 0);
 
   /* Record the register uses.  */
   df_uses_record (collection_rec,


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
  2011-07-08  3:52 ` Dimitrios Apostolou
  2011-07-08  6:36 ` Dimitrios Apostolou
@ 2011-07-08  6:37 ` Steven Bosscher
  2011-07-08  7:04   ` Dimitrios Apostolou
  2011-07-08  6:59 ` Jakub Jelinek
  2011-07-08  8:42 ` Richard Guenther
  4 siblings, 1 reply; 13+ messages in thread
From: Steven Bosscher @ 2011-07-08  6:37 UTC (permalink / raw)
  To: Dimitrios Apostolou; +Cc: gcc-patches, Paolo Bonzini

On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
> The attached patch does two things for df_get_call_refs():

How did you test this patch?

Normally, a patch submission comes with text like, "Bootstrapped &
tested on ..., no regressions.". Also, you chould write a ChangeLog
entry, best included in your mail somewhere at the end ;-)

Ciao!
Steven

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
                   ` (2 preceding siblings ...)
  2011-07-08  6:37 ` Steven Bosscher
@ 2011-07-08  6:59 ` Jakub Jelinek
  2011-07-08  9:09   ` Dimitrios Apostolou
  2011-07-08  8:42 ` Richard Guenther
  4 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2011-07-08  6:59 UTC (permalink / raw)
  To: Dimitrios Apostolou; +Cc: gcc-patches, Paolo Bonzini, Steven Bosscher

On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
> The attached patch does two things for df_get_call_refs():
> * First it uses HARD_REG_SETs for defs_generated and
> regs_invalidated_by_call, instead of bitmaps. Replacing in total
> more than 400K calls (for my testcase) to bitmap_bit_p() with the
> much faster TEST_HARD_REG_BIT, reduces the total instruction count
> from about 13M to 1.5M.

Have you verified that collection_rec->def_vec never contains pseudo
register references?  Otherwise you couldn't use
HARD_REG_SET... gcc_checking_assert might be useful.

Also, a nit, space should be added before (.
  CLEAR_HARD_REG_SET(defs_generated);

	Jakub

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  6:37 ` Steven Bosscher
@ 2011-07-08  7:04   ` Dimitrios Apostolou
  0 siblings, 0 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08  7:04 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc-patches, Paolo Bonzini

On Fri, 8 Jul 2011, Steven Bosscher wrote:
> On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> The attached patch does two things for df_get_call_refs():
>
> How did you test this patch?
>
> Normally, a patch submission comes with text like, "Bootstrapped &
> tested on ..., no regressions.". Also, you chould write a ChangeLog
> entry, best included in your mail somewhere at the end ;-)

Hi Steven, thanks for the instructions. I've not run the mandatory tests 
you have told me about, only done some minor testing due to lack of time. 
I'm not yet posting patches for inclusion, but more as an RFC. Should such 
patches be sent to gcc instead of gcc-patches?


Thanks,
Dimitris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  6:36 ` Dimitrios Apostolou
@ 2011-07-08  7:12   ` Paolo Bonzini
  2011-07-09  1:29     ` Dimitrios Apostolou
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2011-07-08  7:12 UTC (permalink / raw)
  To: Dimitrios Apostolou; +Cc: gcc-patches, Steven Bosscher

On 07/08/2011 05:51 AM, Dimitrios Apostolou wrote:
> +	  /* first write DF_REF_BASE */

This is not necessary.  These uses are written to use_vec, while the 
uses from REG_EQUIV and REG_EQUAL are written to eq_use_vec (see 
df_ref_create_structure).

Also, anyway this wouldn't work because you would have to split the loop 
in two.  I'll attribute that to the time of day when you were writing 
the message. :)

> +        case REG_NON_LOCAL_GOTO:
> +          /* The frame ptr is used by a non-local goto.  */
> +          df_ref_record (DF_REF_BASE, collection_rec,
> +                         regno_reg_rtx[FRAME_POINTER_REGNUM],
> +                         NULL, bb, insn_info,
> +                         DF_REF_REG_USE, 0);
> +#if !HARD_FRAME_POINTER_IS_FRAME_POINTER
> +          df_ref_record (DF_REF_BASE, collection_rec,
> +                         regno_reg_rtx[HARD_FRAME_POINTER_REGNUM],
> +                         NULL, bb, insn_info,
> +                         DF_REF_REG_USE, 0);
> +#endif
> +          break;

Also note that you have to check which of FRAME_POINTER_REGNUM and 
HARD_FRAME_POINTER_REGNUM comes first here, if you want to ensure the 
DF_REF_BASE refs are created sorted.  But it's likely better to _not_ 
create them sorted and just replace qsort with an insertion sort, as 
discussed offlist.  It will cost a single swap in a pretty rare case.

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
                   ` (3 preceding siblings ...)
  2011-07-08  6:59 ` Jakub Jelinek
@ 2011-07-08  8:42 ` Richard Guenther
  2011-07-08 10:20   ` Dimitrios Apostolou
  4 siblings, 1 reply; 13+ messages in thread
From: Richard Guenther @ 2011-07-08  8:42 UTC (permalink / raw)
  To: Dimitrios Apostolou; +Cc: gcc-patches, Paolo Bonzini, Steven Bosscher

On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
> Hello list,
>
> The attached patch does two things for df_get_call_refs():
> * First it uses HARD_REG_SETs for defs_generated and
> regs_invalidated_by_call, instead of bitmaps. Replacing in total more than
> 400K calls (for my testcase) to bitmap_bit_p() with the much faster
> TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to
> 1.5M.
> * Second it produces the REFs in REGNO order, which is important to keep the
> collection_rec sorted most times, and avoid expensive calls to qsort().
> Thanks to Paolo Bonzini for idea and mentoring.
>
> The second part makes a big difference if accompanied with another patch in
> df_insn_refs_collect(). I'll post a followup patch, that is unfortunately
> unstable for some of my tests, so I'd appreciate any comments.

Did you check the impact on memory usage?  I suppose on targets
with not many hard registers it should even improve, but do we expect
memory usage to be worse in any case?

Thanks,
Richard.

>
> Thanks,
> Dimitris
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  6:59 ` Jakub Jelinek
@ 2011-07-08  9:09   ` Dimitrios Apostolou
  2011-07-08 14:22     ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08  9:09 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: gcc-patches, Paolo Bonzini, Steven Bosscher

On Fri, 8 Jul 2011, Jakub Jelinek wrote:
> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
>> The attached patch does two things for df_get_call_refs():
>> * First it uses HARD_REG_SETs for defs_generated and
>> regs_invalidated_by_call, instead of bitmaps. Replacing in total
>> more than 400K calls (for my testcase) to bitmap_bit_p() with the
>> much faster TEST_HARD_REG_BIT, reduces the total instruction count
>> from about 13M to 1.5M.
>
> Have you verified that collection_rec->def_vec never contains pseudo
> register references?  Otherwise you couldn't use
> HARD_REG_SET... gcc_checking_assert might be useful.
>

Hi Jakub, Steve pointed me to the following from GCC Internals Manual:

call_insn insns have the same extra fields as insn insns, accessed in the 
same way and in addition contain a field CALL_INSN_FUNCTION_USAGE, which 
contains a list (chain of expr_list expressions) containing use and 
clobber expressions that denote hard registers and MEMs used or clobbered 
by the called function.


So doesn't that mean that for CALL insns it should contain only HARD_REG 
DEFs? I will ofcourse use an assert to be sure.


Thanks,
Dimitris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  8:42 ` Richard Guenther
@ 2011-07-08 10:20   ` Dimitrios Apostolou
  0 siblings, 0 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08 10:20 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, Paolo Bonzini, Steven Bosscher

On Fri, 8 Jul 2011, Richard Guenther wrote:
> On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> Hello list,
>>
>> The attached patch does two things for df_get_call_refs():
>> * First it uses HARD_REG_SETs for defs_generated and
>> regs_invalidated_by_call, instead of bitmaps. Replacing in total more than
>> 400K calls (for my testcase) to bitmap_bit_p() with the much faster
>> TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to
>> 1.5M.
>> * Second it produces the REFs in REGNO order, which is important to keep the
>> collection_rec sorted most times, and avoid expensive calls to qsort().
>> Thanks to Paolo Bonzini for idea and mentoring.
>>
>> The second part makes a big difference if accompanied with another patch in
>> df_insn_refs_collect(). I'll post a followup patch, that is unfortunately
>> unstable for some of my tests, so I'd appreciate any comments.
>
> Did you check the impact on memory usage?  I suppose on targets
> with not many hard registers it should even improve, but do we expect
> memory usage to be worse in any case?

Hi Richard, I didn't check memory usage, is that important? Since the 
struct bitmap is fairly bulky, it should take an arch with lots of hard 
regs (which one has the most?).

But still a few bytes tradeoff wouldn't be acceptable for a much faster 
type? And IMHO it makes the code better to understand, since once you see 
HARD_REG_SET you know you can't expect else. FWIW I'm now in the process 
of converting all other bitmap uses for hard regs, to HARD_REG_SETs, at 
least within DF. I'm not sure whether performance gains will be visible, 
however, not much code is as hot as df_get_call_refs().


Thanks,
Dimitris

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  9:09   ` Dimitrios Apostolou
@ 2011-07-08 14:22     ` Paolo Bonzini
  2011-07-08 14:25       ` Paolo Bonzini
  0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2011-07-08 14:22 UTC (permalink / raw)
  To: Dimitrios Apostolou; +Cc: Jakub Jelinek, gcc-patches, Steven Bosscher

On 07/08/2011 11:05 AM, Dimitrios Apostolou wrote:
> On Fri, 8 Jul 2011, Jakub Jelinek wrote:
>> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
>>> The attached patch does two things for df_get_call_refs():
>>> * First it uses HARD_REG_SETs for defs_generated and
>>> regs_invalidated_by_call, instead of bitmaps. Replacing in total
>>> more than 400K calls (for my testcase) to bitmap_bit_p() with the
>>> much faster TEST_HARD_REG_BIT, reduces the total instruction count
>>> from about 13M to 1.5M.
>>
>> Have you verified that collection_rec->def_vec never contains pseudo
>> register references? Otherwise you couldn't use
>> HARD_REG_SET... gcc_checking_assert might be useful.
>>
>
> Hi Jakub, Steve pointed me to the following from GCC Internals Manual:
>
> call_insn insns have the same extra fields as insn insns, accessed in
> the same way and in addition contain a field CALL_INSN_FUNCTION_USAGE,
> which contains a list (chain of expr_list expressions) containing use
> and clobber expressions that denote hard registers and MEMs used or
> clobbered by the called function.
>
> So doesn't that mean that for CALL insns it should contain only HARD_REG
> DEFs? I will ofcourse use an assert to be sure.

That part is only for CALL_INSN_FUNCTION_USAGE, which is what 
df_get_call_refs handles.  However, if you rewrite the handling of 
defs_generated as required by your second patch, you'll then be sure 
that you will only have hard registers.

BTW, what testcase are you using?  I suggest that you try building 
stage1 with CFLAGS=--save-temps, and get some of the largest 
preprocessed .i files from there (combine and fold-const for example). 
You can then time them very easily from the old and new build 
directories, with "./cc1 /path/to/file.i -O2".

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08 14:22     ` Paolo Bonzini
@ 2011-07-08 14:25       ` Paolo Bonzini
  0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2011-07-08 14:25 UTC (permalink / raw)
  To: gcc-patches; +Cc: Jakub Jelinek, gcc-patches, Steven Bosscher

On 07/08/2011 11:05 AM, Dimitrios Apostolou wrote:
> On Fri, 8 Jul 2011, Jakub Jelinek wrote:
>> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
>>> The attached patch does two things for df_get_call_refs():
>>> * First it uses HARD_REG_SETs for defs_generated and
>>> regs_invalidated_by_call, instead of bitmaps. Replacing in total
>>> more than 400K calls (for my testcase) to bitmap_bit_p() with the
>>> much faster TEST_HARD_REG_BIT, reduces the total instruction count
>>> from about 13M to 1.5M.
>>
>> Have you verified that collection_rec->def_vec never contains pseudo
>> register references? Otherwise you couldn't use
>> HARD_REG_SET... gcc_checking_assert might be useful.
>>
>
> Hi Jakub, Steve pointed me to the following from GCC Internals Manual:
>
> call_insn insns have the same extra fields as insn insns, accessed in
> the same way and in addition contain a field CALL_INSN_FUNCTION_USAGE,
> which contains a list (chain of expr_list expressions) containing use
> and clobber expressions that denote hard registers and MEMs used or
> clobbered by the called function.
>
> So doesn't that mean that for CALL insns it should contain only HARD_REG
> DEFs? I will ofcourse use an assert to be sure.

That part is only for CALL_INSN_FUNCTION_USAGE, which is what 
df_get_call_refs handles.  However, if you rewrite the handling of 
defs_generated as required by your second patch, you'll then be sure 
that you will only have hard registers.

BTW, what testcase are you using?  I suggest that you try building 
stage1 with CFLAGS=--save-temps, and get some of the largest 
preprocessed .i files from there (combine and fold-const for example). 
You can then time them very easily from the old and new build 
directories, with "./cc1 /path/to/file.i -O2".

Paolo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
  2011-07-08  7:12   ` Paolo Bonzini
@ 2011-07-09  1:29     ` Dimitrios Apostolou
  0 siblings, 0 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-09  1:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc-patches, Steven Bosscher

On Fri, 8 Jul 2011, Paolo Bonzini wrote:
> On 07/08/2011 05:51 AM, Dimitrios Apostolou wrote:
>> +	  /* first write DF_REF_BASE */
>
> This is not necessary.  These uses are written to use_vec, while the uses 
> from REG_EQUIV and REG_EQUAL are written to eq_use_vec (see 
> df_ref_create_structure).

Thanks for pointing this out, I missed it, it's complex to follow 
what is written where. Perhaps there is meaning in changing the interface 
of df_ref_create_structure() to accept the particular vector.

>
> Also, anyway this wouldn't work because you would have to split the loop in 
> two.  I'll attribute that to the time of day when you were writing the 
> message. :)

Even though I'm having my own disputes with Morpheus, sleep deprivation 
wasn't tha culprit for this. :-) My intention was to keep DF_REF_BASEs as 
close to each other, because a close-to-be-sorted array will be sorted 
faster with the sorting function I'll write... :-)

But anyway as you've explained it's irrelevant since they are going 
to different vectors.


Thanks,
Dimitris

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2011-07-09  1:03 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-08  3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
2011-07-08  3:52 ` Dimitrios Apostolou
2011-07-08  6:36 ` Dimitrios Apostolou
2011-07-08  7:12   ` Paolo Bonzini
2011-07-09  1:29     ` Dimitrios Apostolou
2011-07-08  6:37 ` Steven Bosscher
2011-07-08  7:04   ` Dimitrios Apostolou
2011-07-08  6:59 ` Jakub Jelinek
2011-07-08  9:09   ` Dimitrios Apostolou
2011-07-08 14:22     ` Paolo Bonzini
2011-07-08 14:25       ` Paolo Bonzini
2011-07-08  8:42 ` Richard Guenther
2011-07-08 10:20   ` Dimitrios Apostolou

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).