* [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
@ 2011-07-08 3:33 Dimitrios Apostolou
2011-07-08 3:52 ` Dimitrios Apostolou
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08 3:33 UTC (permalink / raw)
To: gcc-patches; +Cc: Paolo Bonzini, Steven Bosscher
[-- Attachment #1: Type: TEXT/PLAIN, Size: 786 bytes --]
Hello list,
The attached patch does two things for df_get_call_refs():
* First it uses HARD_REG_SETs for defs_generated and
regs_invalidated_by_call, instead of bitmaps. Replacing in total more
than 400K calls (for my testcase) to bitmap_bit_p() with the much faster
TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to
1.5M.
* Second it produces the REFs in REGNO order, which is important to keep
the collection_rec sorted most times, and avoid expensive calls to
qsort(). Thanks to Paolo Bonzini for idea and mentoring.
The second part makes a big difference if accompanied with another patch
in df_insn_refs_collect(). I'll post a followup patch, that is
unfortunately unstable for some of my tests, so I'd appreciate any
comments.
Thanks,
Dimitris
[-- Attachment #2: Type: TEXT/plain, Size: 4342 bytes --]
=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c 2011-02-02 20:08:06 +0000
+++ gcc/df-scan.c 2011-07-08 01:28:55 +0000
@@ -3317,20 +3317,56 @@
int flags)
{
rtx note;
- bitmap_iterator bi;
- unsigned int ui;
bool is_sibling_call;
unsigned int i;
df_ref def;
- bitmap_head defs_generated;
+ HARD_REG_SET defs_generated;
- bitmap_initialize (&defs_generated, &df_bitmap_obstack);
+ CLEAR_HARD_REG_SET(defs_generated);
/* Do not generate clobbers for registers that are the result of the
call. This causes ordering problems in the chain building code
depending on which def is seen first. */
FOR_EACH_VEC_ELT (df_ref, collection_rec->def_vec, i, def)
- bitmap_set_bit (&defs_generated, DF_REF_REGNO (def));
+ SET_HARD_REG_BIT (defs_generated, DF_REF_REGNO (def));
+
+ is_sibling_call = SIBLING_CALL_P (insn_info->insn);
+
+ for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
+ {
+ if (i == STACK_POINTER_REGNUM)
+ /* The stack ptr is used (honorarily) by a CALL insn. */
+ df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+ NULL, bb, insn_info, DF_REF_REG_USE,
+ DF_REF_CALL_STACK_USAGE | flags);
+ else if (global_regs[i])
+ {
+ /* Calls to const functions cannot access any global registers and
+ calls to pure functions cannot set them. All other calls may
+ reference any of the global registers, so they are recorded as
+ used. */
+ if (!RTL_CONST_CALL_P (insn_info->insn))
+ {
+ df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+ NULL, bb, insn_info, DF_REF_REG_USE, flags);
+ if (!RTL_PURE_CALL_P (insn_info->insn))
+ df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+ NULL, bb, insn_info, DF_REF_REG_DEF, flags);
+ }
+ }
+ /* TODO HARD_REG_SET set intersection! */
+ else /* !global_regs[i] */
+ /* track Caller-Saved registers */
+ if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i)
+ && !TEST_HARD_REG_BIT (defs_generated, i)
+ && (!is_sibling_call
+ || !bitmap_bit_p (df->exit_block_uses, i)
+ || refers_to_regno_p (i, i+1,
+ crtl->return_rtx, NULL)))
+ df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
+ NULL, bb, insn_info, DF_REF_REG_DEF,
+ DF_REF_MAY_CLOBBER | flags);
+ }
/* Record the registers used to pass arguments, and explicitly
noted as clobbered. */
@@ -3345,7 +3381,7 @@
if (REG_P (XEXP (XEXP (note, 0), 0)))
{
unsigned int regno = REGNO (XEXP (XEXP (note, 0), 0));
- if (!bitmap_bit_p (&defs_generated, regno))
+ if (!TEST_HARD_REG_BIT (defs_generated, regno))
df_defs_record (collection_rec, XEXP (note, 0), bb,
insn_info, flags);
}
@@ -3355,40 +3391,6 @@
}
}
- /* The stack ptr is used (honorarily) by a CALL insn. */
- df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[STACK_POINTER_REGNUM],
- NULL, bb, insn_info, DF_REF_REG_USE,
- DF_REF_CALL_STACK_USAGE | flags);
-
- /* Calls to const functions cannot access any global registers and calls to
- pure functions cannot set them. All other calls may reference any of the
- global registers, so they are recorded as used. */
- if (!RTL_CONST_CALL_P (insn_info->insn))
- for (i = 0; i < FIRST_PSEUDO_REGISTER; i++)
- if (global_regs[i])
- {
- df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
- NULL, bb, insn_info, DF_REF_REG_USE, flags);
- if (!RTL_PURE_CALL_P (insn_info->insn))
- df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
- NULL, bb, insn_info, DF_REF_REG_DEF, flags);
- }
-
- is_sibling_call = SIBLING_CALL_P (insn_info->insn);
- EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, ui, bi)
- {
- if (!global_regs[ui]
- && (!bitmap_bit_p (&defs_generated, ui))
- && (!is_sibling_call
- || !bitmap_bit_p (df->exit_block_uses, ui)
- || refers_to_regno_p (ui, ui+1,
- crtl->return_rtx, NULL)))
- df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[ui],
- NULL, bb, insn_info, DF_REF_REG_DEF,
- DF_REF_MAY_CLOBBER | flags);
- }
-
- bitmap_clear (&defs_generated);
return;
}
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
@ 2011-07-08 3:52 ` Dimitrios Apostolou
2011-07-08 6:36 ` Dimitrios Apostolou
` (3 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08 3:52 UTC (permalink / raw)
To: gcc-patches; +Cc: Paolo Bonzini, Steven Bosscher
To document the gains from the bitmaps, here is (part of) the annotated
source from callgrind profiler, showing instruction count. Before:
1,154,400 if (bitmap_bit_p(regs_invalidated_by_call_regset, i)
8,080,800 => bitmap.c:bitmap_bit_p (192400x)
1,021,200 && !bitmap_bit_p (&defs_generated, i)
5,106,000 => bitmap.c:bitmap_bit_p (170200x)
340,400 && (!is_sibling_call
. || !bitmap_bit_p (df->exit_block_uses, i)
. || refers_to_regno_p (i, i+1,
. crtl->return_rtx, NULL)))
2,053,500 df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i
35,279,934 => df-scan.c:df_ref_record (170200x)
. NULL, bb, insn_info, DF_REF_REG_DEF,
. DF_REF_MAY_CLOBBER | flags);
. }
After:
1,346,800 if (TEST_HARD_REG_BIT(regs_invalidated_by_call, i)
510,600 && !TEST_HARD_REG_BIT (defs_generated, i)
340,400 && (!is_sibling_call
. || !bitmap_bit_p (df->exit_block_uses, i)
. || refers_to_regno_p (i, i+1,
. crtl->return_rtx, NULL)))
2,057,200 df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i
35,279,934 => df-scan.c:df_ref_record (170200x)
. NULL, bb, insn_info, DF_REF_REG_DEF,
. DF_REF_MAY_CLOBBER | flags);
. }
Dimitris
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
2011-07-08 3:52 ` Dimitrios Apostolou
@ 2011-07-08 6:36 ` Dimitrios Apostolou
2011-07-08 7:12 ` Paolo Bonzini
2011-07-08 6:37 ` Steven Bosscher
` (2 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08 6:36 UTC (permalink / raw)
To: gcc-patches; +Cc: Paolo Bonzini, Steven Bosscher
[-- Attachment #1: Type: TEXT/PLAIN, Size: 441 bytes --]
And here is the patch that breaks things. By moving df_defs_record()
*after* df_get_call_refs() most times collection_rec remains sorted, and
about 50M instructions are avoided in qsort()
calls of df_canonize_collection_rec().
Unfortunately this does not work. Sometimes cc1 crashes, for example
because regstack is empty in subst_stack_regs_in_debug_insn().
Any ideas for ensuring proper ordering of collection_rec?
Thanks,
Dimitris
[-- Attachment #2: Type: TEXT/plain, Size: 2585 bytes --]
=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c 2011-07-08 01:28:55 +0000
+++ gcc/df-scan.c 2011-07-08 03:38:38 +0000
@@ -3412,8 +3412,9 @@
VEC_truncate (df_ref, collection_rec->eq_use_vec, 0);
VEC_truncate (df_mw_hardreg_ptr, collection_rec->mw_vec, 0);
- /* Record register defs. */
- df_defs_record (collection_rec, PATTERN (insn_info->insn), bb, insn_info, 0);
+ if (CALL_P (insn_info->insn))
+ df_get_call_refs (collection_rec, bb, insn_info,
+ (is_cond_exec) ? DF_REF_CONDITIONAL : 0);
/* Process REG_EQUIV/REG_EQUAL notes. */
for (note = REG_NOTES (insn_info->insn); note;
@@ -3421,33 +3422,33 @@
{
switch (REG_NOTE_KIND (note))
{
+ /* first write DF_REF_BASE */
+ case REG_NON_LOCAL_GOTO:
+ /* The frame ptr is used by a non-local goto. */
+ df_ref_record (DF_REF_BASE, collection_rec,
+ regno_reg_rtx[FRAME_POINTER_REGNUM],
+ NULL, bb, insn_info,
+ DF_REF_REG_USE, 0);
+#if !HARD_FRAME_POINTER_IS_FRAME_POINTER
+ df_ref_record (DF_REF_BASE, collection_rec,
+ regno_reg_rtx[HARD_FRAME_POINTER_REGNUM],
+ NULL, bb, insn_info,
+ DF_REF_REG_USE, 0);
+#endif
+ break;
case REG_EQUIV:
case REG_EQUAL:
df_uses_record (collection_rec,
&XEXP (note, 0), DF_REF_REG_USE,
bb, insn_info, DF_REF_IN_NOTE);
break;
- case REG_NON_LOCAL_GOTO:
- /* The frame ptr is used by a non-local goto. */
- df_ref_record (DF_REF_BASE, collection_rec,
- regno_reg_rtx[FRAME_POINTER_REGNUM],
- NULL, bb, insn_info,
- DF_REF_REG_USE, 0);
-#if !HARD_FRAME_POINTER_IS_FRAME_POINTER
- df_ref_record (DF_REF_BASE, collection_rec,
- regno_reg_rtx[HARD_FRAME_POINTER_REGNUM],
- NULL, bb, insn_info,
- DF_REF_REG_USE, 0);
-#endif
- break;
default:
break;
}
}
- if (CALL_P (insn_info->insn))
- df_get_call_refs (collection_rec, bb, insn_info,
- (is_cond_exec) ? DF_REF_CONDITIONAL : 0);
+ /* Record register defs. */
+ df_defs_record (collection_rec, PATTERN (insn_info->insn), bb, insn_info, 0);
/* Record the register uses. */
df_uses_record (collection_rec,
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
2011-07-08 3:52 ` Dimitrios Apostolou
2011-07-08 6:36 ` Dimitrios Apostolou
@ 2011-07-08 6:37 ` Steven Bosscher
2011-07-08 7:04 ` Dimitrios Apostolou
2011-07-08 6:59 ` Jakub Jelinek
2011-07-08 8:42 ` Richard Guenther
4 siblings, 1 reply; 13+ messages in thread
From: Steven Bosscher @ 2011-07-08 6:37 UTC (permalink / raw)
To: Dimitrios Apostolou; +Cc: gcc-patches, Paolo Bonzini
On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
> The attached patch does two things for df_get_call_refs():
How did you test this patch?
Normally, a patch submission comes with text like, "Bootstrapped &
tested on ..., no regressions.". Also, you chould write a ChangeLog
entry, best included in your mail somewhere at the end ;-)
Ciao!
Steven
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
` (2 preceding siblings ...)
2011-07-08 6:37 ` Steven Bosscher
@ 2011-07-08 6:59 ` Jakub Jelinek
2011-07-08 9:09 ` Dimitrios Apostolou
2011-07-08 8:42 ` Richard Guenther
4 siblings, 1 reply; 13+ messages in thread
From: Jakub Jelinek @ 2011-07-08 6:59 UTC (permalink / raw)
To: Dimitrios Apostolou; +Cc: gcc-patches, Paolo Bonzini, Steven Bosscher
On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
> The attached patch does two things for df_get_call_refs():
> * First it uses HARD_REG_SETs for defs_generated and
> regs_invalidated_by_call, instead of bitmaps. Replacing in total
> more than 400K calls (for my testcase) to bitmap_bit_p() with the
> much faster TEST_HARD_REG_BIT, reduces the total instruction count
> from about 13M to 1.5M.
Have you verified that collection_rec->def_vec never contains pseudo
register references? Otherwise you couldn't use
HARD_REG_SET... gcc_checking_assert might be useful.
Also, a nit, space should be added before (.
CLEAR_HARD_REG_SET(defs_generated);
Jakub
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 6:37 ` Steven Bosscher
@ 2011-07-08 7:04 ` Dimitrios Apostolou
0 siblings, 0 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08 7:04 UTC (permalink / raw)
To: Steven Bosscher; +Cc: gcc-patches, Paolo Bonzini
On Fri, 8 Jul 2011, Steven Bosscher wrote:
> On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> The attached patch does two things for df_get_call_refs():
>
> How did you test this patch?
>
> Normally, a patch submission comes with text like, "Bootstrapped &
> tested on ..., no regressions.". Also, you chould write a ChangeLog
> entry, best included in your mail somewhere at the end ;-)
Hi Steven, thanks for the instructions. I've not run the mandatory tests
you have told me about, only done some minor testing due to lack of time.
I'm not yet posting patches for inclusion, but more as an RFC. Should such
patches be sent to gcc instead of gcc-patches?
Thanks,
Dimitris
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 6:36 ` Dimitrios Apostolou
@ 2011-07-08 7:12 ` Paolo Bonzini
2011-07-09 1:29 ` Dimitrios Apostolou
0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2011-07-08 7:12 UTC (permalink / raw)
To: Dimitrios Apostolou; +Cc: gcc-patches, Steven Bosscher
On 07/08/2011 05:51 AM, Dimitrios Apostolou wrote:
> + /* first write DF_REF_BASE */
This is not necessary. These uses are written to use_vec, while the
uses from REG_EQUIV and REG_EQUAL are written to eq_use_vec (see
df_ref_create_structure).
Also, anyway this wouldn't work because you would have to split the loop
in two. I'll attribute that to the time of day when you were writing
the message. :)
> + case REG_NON_LOCAL_GOTO:
> + /* The frame ptr is used by a non-local goto. */
> + df_ref_record (DF_REF_BASE, collection_rec,
> + regno_reg_rtx[FRAME_POINTER_REGNUM],
> + NULL, bb, insn_info,
> + DF_REF_REG_USE, 0);
> +#if !HARD_FRAME_POINTER_IS_FRAME_POINTER
> + df_ref_record (DF_REF_BASE, collection_rec,
> + regno_reg_rtx[HARD_FRAME_POINTER_REGNUM],
> + NULL, bb, insn_info,
> + DF_REF_REG_USE, 0);
> +#endif
> + break;
Also note that you have to check which of FRAME_POINTER_REGNUM and
HARD_FRAME_POINTER_REGNUM comes first here, if you want to ensure the
DF_REF_BASE refs are created sorted. But it's likely better to _not_
create them sorted and just replace qsort with an insertion sort, as
discussed offlist. It will cost a single swap in a pretty rare case.
Paolo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
` (3 preceding siblings ...)
2011-07-08 6:59 ` Jakub Jelinek
@ 2011-07-08 8:42 ` Richard Guenther
2011-07-08 10:20 ` Dimitrios Apostolou
4 siblings, 1 reply; 13+ messages in thread
From: Richard Guenther @ 2011-07-08 8:42 UTC (permalink / raw)
To: Dimitrios Apostolou; +Cc: gcc-patches, Paolo Bonzini, Steven Bosscher
On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
> Hello list,
>
> The attached patch does two things for df_get_call_refs():
> * First it uses HARD_REG_SETs for defs_generated and
> regs_invalidated_by_call, instead of bitmaps. Replacing in total more than
> 400K calls (for my testcase) to bitmap_bit_p() with the much faster
> TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to
> 1.5M.
> * Second it produces the REFs in REGNO order, which is important to keep the
> collection_rec sorted most times, and avoid expensive calls to qsort().
> Thanks to Paolo Bonzini for idea and mentoring.
>
> The second part makes a big difference if accompanied with another patch in
> df_insn_refs_collect(). I'll post a followup patch, that is unfortunately
> unstable for some of my tests, so I'd appreciate any comments.
Did you check the impact on memory usage? I suppose on targets
with not many hard registers it should even improve, but do we expect
memory usage to be worse in any case?
Thanks,
Richard.
>
> Thanks,
> Dimitris
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 6:59 ` Jakub Jelinek
@ 2011-07-08 9:09 ` Dimitrios Apostolou
2011-07-08 14:22 ` Paolo Bonzini
0 siblings, 1 reply; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08 9:09 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: gcc-patches, Paolo Bonzini, Steven Bosscher
On Fri, 8 Jul 2011, Jakub Jelinek wrote:
> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
>> The attached patch does two things for df_get_call_refs():
>> * First it uses HARD_REG_SETs for defs_generated and
>> regs_invalidated_by_call, instead of bitmaps. Replacing in total
>> more than 400K calls (for my testcase) to bitmap_bit_p() with the
>> much faster TEST_HARD_REG_BIT, reduces the total instruction count
>> from about 13M to 1.5M.
>
> Have you verified that collection_rec->def_vec never contains pseudo
> register references? Otherwise you couldn't use
> HARD_REG_SET... gcc_checking_assert might be useful.
>
Hi Jakub, Steve pointed me to the following from GCC Internals Manual:
call_insn insns have the same extra fields as insn insns, accessed in the
same way and in addition contain a field CALL_INSN_FUNCTION_USAGE, which
contains a list (chain of expr_list expressions) containing use and
clobber expressions that denote hard registers and MEMs used or clobbered
by the called function.
So doesn't that mean that for CALL insns it should contain only HARD_REG
DEFs? I will ofcourse use an assert to be sure.
Thanks,
Dimitris
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 8:42 ` Richard Guenther
@ 2011-07-08 10:20 ` Dimitrios Apostolou
0 siblings, 0 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-08 10:20 UTC (permalink / raw)
To: Richard Guenther; +Cc: gcc-patches, Paolo Bonzini, Steven Bosscher
On Fri, 8 Jul 2011, Richard Guenther wrote:
> On Fri, Jul 8, 2011 at 5:20 AM, Dimitrios Apostolou <jimis@gmx.net> wrote:
>> Hello list,
>>
>> The attached patch does two things for df_get_call_refs():
>> * First it uses HARD_REG_SETs for defs_generated and
>> regs_invalidated_by_call, instead of bitmaps. Replacing in total more than
>> 400K calls (for my testcase) to bitmap_bit_p() with the much faster
>> TEST_HARD_REG_BIT, reduces the total instruction count from about 13M to
>> 1.5M.
>> * Second it produces the REFs in REGNO order, which is important to keep the
>> collection_rec sorted most times, and avoid expensive calls to qsort().
>> Thanks to Paolo Bonzini for idea and mentoring.
>>
>> The second part makes a big difference if accompanied with another patch in
>> df_insn_refs_collect(). I'll post a followup patch, that is unfortunately
>> unstable for some of my tests, so I'd appreciate any comments.
>
> Did you check the impact on memory usage? I suppose on targets
> with not many hard registers it should even improve, but do we expect
> memory usage to be worse in any case?
Hi Richard, I didn't check memory usage, is that important? Since the
struct bitmap is fairly bulky, it should take an arch with lots of hard
regs (which one has the most?).
But still a few bytes tradeoff wouldn't be acceptable for a much faster
type? And IMHO it makes the code better to understand, since once you see
HARD_REG_SET you know you can't expect else. FWIW I'm now in the process
of converting all other bitmap uses for hard regs, to HARD_REG_SETs, at
least within DF. I'm not sure whether performance gains will be visible,
however, not much code is as hot as df_get_call_refs().
Thanks,
Dimitris
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 9:09 ` Dimitrios Apostolou
@ 2011-07-08 14:22 ` Paolo Bonzini
2011-07-08 14:25 ` Paolo Bonzini
0 siblings, 1 reply; 13+ messages in thread
From: Paolo Bonzini @ 2011-07-08 14:22 UTC (permalink / raw)
To: Dimitrios Apostolou; +Cc: Jakub Jelinek, gcc-patches, Steven Bosscher
On 07/08/2011 11:05 AM, Dimitrios Apostolou wrote:
> On Fri, 8 Jul 2011, Jakub Jelinek wrote:
>> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
>>> The attached patch does two things for df_get_call_refs():
>>> * First it uses HARD_REG_SETs for defs_generated and
>>> regs_invalidated_by_call, instead of bitmaps. Replacing in total
>>> more than 400K calls (for my testcase) to bitmap_bit_p() with the
>>> much faster TEST_HARD_REG_BIT, reduces the total instruction count
>>> from about 13M to 1.5M.
>>
>> Have you verified that collection_rec->def_vec never contains pseudo
>> register references? Otherwise you couldn't use
>> HARD_REG_SET... gcc_checking_assert might be useful.
>>
>
> Hi Jakub, Steve pointed me to the following from GCC Internals Manual:
>
> call_insn insns have the same extra fields as insn insns, accessed in
> the same way and in addition contain a field CALL_INSN_FUNCTION_USAGE,
> which contains a list (chain of expr_list expressions) containing use
> and clobber expressions that denote hard registers and MEMs used or
> clobbered by the called function.
>
> So doesn't that mean that for CALL insns it should contain only HARD_REG
> DEFs? I will ofcourse use an assert to be sure.
That part is only for CALL_INSN_FUNCTION_USAGE, which is what
df_get_call_refs handles. However, if you rewrite the handling of
defs_generated as required by your second patch, you'll then be sure
that you will only have hard registers.
BTW, what testcase are you using? I suggest that you try building
stage1 with CFLAGS=--save-temps, and get some of the largest
preprocessed .i files from there (combine and fold-const for example).
You can then time them very easily from the old and new build
directories, with "./cc1 /path/to/file.i -O2".
Paolo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 14:22 ` Paolo Bonzini
@ 2011-07-08 14:25 ` Paolo Bonzini
0 siblings, 0 replies; 13+ messages in thread
From: Paolo Bonzini @ 2011-07-08 14:25 UTC (permalink / raw)
To: gcc-patches; +Cc: Jakub Jelinek, gcc-patches, Steven Bosscher
On 07/08/2011 11:05 AM, Dimitrios Apostolou wrote:
> On Fri, 8 Jul 2011, Jakub Jelinek wrote:
>> On Fri, Jul 08, 2011 at 06:20:04AM +0300, Dimitrios Apostolou wrote:
>>> The attached patch does two things for df_get_call_refs():
>>> * First it uses HARD_REG_SETs for defs_generated and
>>> regs_invalidated_by_call, instead of bitmaps. Replacing in total
>>> more than 400K calls (for my testcase) to bitmap_bit_p() with the
>>> much faster TEST_HARD_REG_BIT, reduces the total instruction count
>>> from about 13M to 1.5M.
>>
>> Have you verified that collection_rec->def_vec never contains pseudo
>> register references? Otherwise you couldn't use
>> HARD_REG_SET... gcc_checking_assert might be useful.
>>
>
> Hi Jakub, Steve pointed me to the following from GCC Internals Manual:
>
> call_insn insns have the same extra fields as insn insns, accessed in
> the same way and in addition contain a field CALL_INSN_FUNCTION_USAGE,
> which contains a list (chain of expr_list expressions) containing use
> and clobber expressions that denote hard registers and MEMs used or
> clobbered by the called function.
>
> So doesn't that mean that for CALL insns it should contain only HARD_REG
> DEFs? I will ofcourse use an assert to be sure.
That part is only for CALL_INSN_FUNCTION_USAGE, which is what
df_get_call_refs handles. However, if you rewrite the handling of
defs_generated as required by your second patch, you'll then be sure
that you will only have hard registers.
BTW, what testcase are you using? I suggest that you try building
stage1 with CFLAGS=--save-temps, and get some of the largest
preprocessed .i files from there (combine and fold-const for example).
You can then time them very easily from the old and new build
directories, with "./cc1 /path/to/file.i -O2".
Paolo
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps
2011-07-08 7:12 ` Paolo Bonzini
@ 2011-07-09 1:29 ` Dimitrios Apostolou
0 siblings, 0 replies; 13+ messages in thread
From: Dimitrios Apostolou @ 2011-07-09 1:29 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: gcc-patches, Steven Bosscher
On Fri, 8 Jul 2011, Paolo Bonzini wrote:
> On 07/08/2011 05:51 AM, Dimitrios Apostolou wrote:
>> + /* first write DF_REF_BASE */
>
> This is not necessary. These uses are written to use_vec, while the uses
> from REG_EQUIV and REG_EQUAL are written to eq_use_vec (see
> df_ref_create_structure).
Thanks for pointing this out, I missed it, it's complex to follow
what is written where. Perhaps there is meaning in changing the interface
of df_ref_create_structure() to accept the particular vector.
>
> Also, anyway this wouldn't work because you would have to split the loop in
> two. I'll attribute that to the time of day when you were writing the
> message. :)
Even though I'm having my own disputes with Morpheus, sleep deprivation
wasn't tha culprit for this. :-) My intention was to keep DF_REF_BASEs as
close to each other, because a close-to-be-sorted array will be sorted
faster with the sorting function I'll write... :-)
But anyway as you've explained it's irrelevant since they are going
to different vectors.
Thanks,
Dimitris
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2011-07-09 1:03 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-08 3:33 [df-scan.c] Optimise DF_REFs ordering in collection_rec, use HARD_REG_SETs instead of bitmaps Dimitrios Apostolou
2011-07-08 3:52 ` Dimitrios Apostolou
2011-07-08 6:36 ` Dimitrios Apostolou
2011-07-08 7:12 ` Paolo Bonzini
2011-07-09 1:29 ` Dimitrios Apostolou
2011-07-08 6:37 ` Steven Bosscher
2011-07-08 7:04 ` Dimitrios Apostolou
2011-07-08 6:59 ` Jakub Jelinek
2011-07-08 9:09 ` Dimitrios Apostolou
2011-07-08 14:22 ` Paolo Bonzini
2011-07-08 14:25 ` Paolo Bonzini
2011-07-08 8:42 ` Richard Guenther
2011-07-08 10:20 ` Dimitrios Apostolou
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).