public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [patch PR middle-end/39326 - compile time and memory explosion in combine
@ 2013-03-09 22:24 Steven Bosscher
  2013-03-11  9:23 ` Richard Biener
  0 siblings, 1 reply; 4+ messages in thread
From: Steven Bosscher @ 2013-03-09 22:24 UTC (permalink / raw)
  To: GCC Patches; +Cc: Richard Biener, Jakub Jelinek

[-- Attachment #1: Type: text/plain, Size: 1185 bytes --]

Hello,

The attached patch fixes one of the (at least) three scalability
problems reported in PR middle-end/39326. This problem is that combine
takes forever and increases the memory footprint from ~650MB to >7GB.
The cause is DSE creating a lot of new registers in replace_read, and
those registers are not copy-prop'd out between dse1 and combine. The
result is many overlapping live ranges and single-set-single-use
registers that combine is made to work on.

The fix is to just not create so many new registers in DSE in the
first place. It is wasteful and unnecessary if an existing register
can be re-used.

With this patch, for the test case of the PR the combine time in
combine goes down from ~350s to ~4.5s, and the memory footprint
explosion is avoided. For my set of cc1-i files this also helps reduce
compile time a modest amount, especially for the larger files of
course.

Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu.
OK for trunk?

Ciao!
Steven


        PR middle-end/39326
        * dse.c (replace_read): If the stored value is a pseudo-register
        that is set only once, re-use it to replace the load instead of
        creating a new register.

[-- Attachment #2: PR39326_RTLDSE.diff --]
[-- Type: application/octet-stream, Size: 4681 bytes --]

	PR middle-end/39326
	* dse.c (replace_read): If the stored value is a pseudo-register
	that is set only once, re-use it to replace the load instead of
	creating a new register.

Index: dse.c
===================================================================
--- dse.c	(revision 196576)
+++ dse.c	(working copy)
@@ -2043,10 +2043,20 @@ replace_read (store_info_t store_info, insn_info_t
 	fprintf (dump_file, " -- could not extract bits of stored value\n");
       return false;
     }
-  /* Force the value into a new register so that it won't be clobbered
-     between the store and the load.  */
-  read_reg = copy_to_mode_reg (read_mode, read_reg);
-  insns = get_insns ();
+
+  /* If READ_REG is a pseudo that is only set once, re-use the register.
+     Otherwise, force the value into a new register so that it won't be
+     clobbered between the store and the load.  */
+  if (! REG_P (read_reg) || HARD_REGISTER_P (read_reg)
+      || REGNO (read_reg) >= DF_REG_SIZE (df)
+      || DF_REG_DEF_COUNT (REGNO (read_reg)) != 1)
+    {
+      read_reg = copy_to_mode_reg (read_mode, read_reg);
+      insns = get_insns ();
+    }
+  else
+    insns = NULL_RTX;
+
   end_sequence ();
 
   if (insns != NULL_RTX)
@@ -2079,41 +2089,45 @@ replace_read (store_info_t store_info, insn_info_t
 
   if (validate_change (read_insn->insn, loc, read_reg, 0))
     {
-      deferred_change_t deferred_change =
-	(deferred_change_t) pool_alloc (deferred_change_pool);
-
-      /* Insert this right before the store insn where it will be safe
-	 from later insns that might change it before the read.  */
-      emit_insn_before (insns, store_insn->insn);
-
-      /* And now for the kludge part: cselib croaks if you just
-	 return at this point.  There are two reasons for this:
-
-	 1) Cselib has an idea of how many pseudos there are and
-	 that does not include the new ones we just added.
-
-	 2) Cselib does not know about the move insn we added
-	 above the store_info, and there is no way to tell it
-	 about it, because it has "moved on".
-
-	 Problem (1) is fixable with a certain amount of engineering.
-	 Problem (2) is requires starting the bb from scratch.  This
-	 could be expensive.
-
-	 So we are just going to have to lie.  The move/extraction
-	 insns are not really an issue, cselib did not see them.  But
-	 the use of the new pseudo read_insn is a real problem because
-	 cselib has not scanned this insn.  The way that we solve this
-	 problem is that we are just going to put the mem back for now
-	 and when we are finished with the block, we undo this.  We
-	 keep a table of mems to get rid of.  At the end of the basic
-	 block we can put them back.  */
-
-      *loc = read_info->mem;
-      deferred_change->next = deferred_change_list;
-      deferred_change_list = deferred_change;
-      deferred_change->loc = loc;
-      deferred_change->reg = read_reg;
+      /* Did we emit new insns and/or created new pseudos?  */
+      if (insns != NULL_RTX)
+	{
+	  deferred_change_t deferred_change =
+	    (deferred_change_t) pool_alloc (deferred_change_pool);
+
+	  /* Insert this right before the store insn where it will be safe
+	     from later insns that might change it before the read.  */
+	  emit_insn_before (insns, store_insn->insn);
+
+	  /* And now for the kludge part: cselib croaks if you just
+	     return at this point.  There are two reasons for this:
+
+	     1) Cselib has an idea of how many pseudos there are and
+	     that does not include the new ones we just added.
+
+	     2) Cselib does not know about the move insn we added
+	     above the store_info, and there is no way to tell it
+	     about it, because it has "moved on".
+
+	     Problem (1) is fixable with a certain amount of engineering.
+	     Problem (2) is requires starting the bb from scratch.  This
+	     could be expensive.
+
+	     So we are just going to have to lie.  The move/extraction
+	     insns are not really an issue, cselib did not see them.  But
+	     the use of the new pseudo read_insn is a real problem because
+	     cselib has not scanned this insn.  The way that we solve this
+	     problem is that we are just going to put the mem back for now
+	     and when we are finished with the block, we undo this.  We
+	     keep a table of mems to get rid of.  At the end of the basic
+	     block we can put them back.  */
+
+	  *loc = read_info->mem;
+	  deferred_change->next = deferred_change_list;
+	  deferred_change_list = deferred_change;
+	  deferred_change->loc = loc;
+	  deferred_change->reg = read_reg;
+	}
 
       /* Get rid of the read_info, from the point of view of the
 	 rest of dse, play like this read never happened.  */

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch PR middle-end/39326 - compile time and memory explosion in combine
  2013-03-09 22:24 [patch PR middle-end/39326 - compile time and memory explosion in combine Steven Bosscher
@ 2013-03-11  9:23 ` Richard Biener
  2013-03-11 18:05   ` Jeff Law
  2013-03-11 18:27   ` Steven Bosscher
  0 siblings, 2 replies; 4+ messages in thread
From: Richard Biener @ 2013-03-11  9:23 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: GCC Patches, Jakub Jelinek

On Sat, Mar 9, 2013 at 11:23 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> Hello,
>
> The attached patch fixes one of the (at least) three scalability
> problems reported in PR middle-end/39326. This problem is that combine
> takes forever and increases the memory footprint from ~650MB to >7GB.
> The cause is DSE creating a lot of new registers in replace_read, and
> those registers are not copy-prop'd out between dse1 and combine. The
> result is many overlapping live ranges and single-set-single-use
> registers that combine is made to work on.
>
> The fix is to just not create so many new registers in DSE in the
> first place. It is wasteful and unnecessary if an existing register
> can be re-used.
>
> With this patch, for the test case of the PR the combine time in
> combine goes down from ~350s to ~4.5s, and the memory footprint
> explosion is avoided. For my set of cc1-i files this also helps reduce
> compile time a modest amount, especially for the larger files of
> course.
>
> Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu.
> OK for trunk?

Not sure on the patch details - but in general I wonder why _DSE_
performs full redundancy elimination at all ... which replace_read
seems to be.

Anyway, for this one I'd say we wait for stage1 and consider backporting
for 4.8.1 given we want to do a 4.8 RC1 soon.

Thanks,
Richard.

> Ciao!
> Steven
>
>
>         PR middle-end/39326
>         * dse.c (replace_read): If the stored value is a pseudo-register
>         that is set only once, re-use it to replace the load instead of
>         creating a new register.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch PR middle-end/39326 - compile time and memory explosion in combine
  2013-03-11  9:23 ` Richard Biener
@ 2013-03-11 18:05   ` Jeff Law
  2013-03-11 18:27   ` Steven Bosscher
  1 sibling, 0 replies; 4+ messages in thread
From: Jeff Law @ 2013-03-11 18:05 UTC (permalink / raw)
  To: Richard Biener; +Cc: Steven Bosscher, GCC Patches, Jakub Jelinek

On 03/11/2013 03:23 AM, Richard Biener wrote:
> On Sat, Mar 9, 2013 at 11:23 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>> Hello,
>>
>> The attached patch fixes one of the (at least) three scalability
>> problems reported in PR middle-end/39326. This problem is that combine
>> takes forever and increases the memory footprint from ~650MB to >7GB.
>> The cause is DSE creating a lot of new registers in replace_read, and
>> those registers are not copy-prop'd out between dse1 and combine. The
>> result is many overlapping live ranges and single-set-single-use
>> registers that combine is made to work on.
>>
>> The fix is to just not create so many new registers in DSE in the
>> first place. It is wasteful and unnecessary if an existing register
>> can be re-used.
>>
>> With this patch, for the test case of the PR the combine time in
>> combine goes down from ~350s to ~4.5s, and the memory footprint
>> explosion is avoided. For my set of cc1-i files this also helps reduce
>> compile time a modest amount, especially for the larger files of
>> course.
>>
>> Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu.
>> OK for trunk?
>
> Not sure on the patch details - but in general I wonder why _DSE_
> performs full redundancy elimination at all ... which replace_read
> seems to be.
>
> Anyway, for this one I'd say we wait for stage1 and consider backporting
> for 4.8.1 given we want to do a 4.8 RC1 soon.
Agreed on the defer until 4.8.1.

jeff

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [patch PR middle-end/39326 - compile time and memory explosion in combine
  2013-03-11  9:23 ` Richard Biener
  2013-03-11 18:05   ` Jeff Law
@ 2013-03-11 18:27   ` Steven Bosscher
  1 sibling, 0 replies; 4+ messages in thread
From: Steven Bosscher @ 2013-03-11 18:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Jakub Jelinek

On Mon, Mar 11, 2013 at 10:23 AM, Richard Biener wrote:
> On Sat, Mar 9, 2013 at 11:23 PM, Steven Bosscher wrote:
>> The attached patch fixes one of the (at least) three scalability
>> problems reported in PR middle-end/39326. This problem is that combine
>> takes forever and increases the memory footprint from ~650MB to >7GB.
>> The cause is DSE creating a lot of new registers in replace_read, and
>> those registers are not copy-prop'd out between dse1 and combine. The
>> result is many overlapping live ranges and single-set-single-use
>> registers that combine is made to work on.
>>
>> The fix is to just not create so many new registers in DSE in the
>> first place. It is wasteful and unnecessary if an existing register
>> can be re-used.
>>
>> With this patch, for the test case of the PR the combine time in
>> combine goes down from ~350s to ~4.5s, and the memory footprint
>> explosion is avoided. For my set of cc1-i files this also helps reduce
>> compile time a modest amount, especially for the larger files of
>> course.
>>
>> Bootstrapped&tested on {x86_64,powerpc64}-unknown-linux-gnu.
>> OK for trunk?
>
> Not sure on the patch details - but in general I wonder why _DSE_
> performs full redundancy elimination at all ... which replace_read
> seems to be.

Yes. No idea why, it's always been like that.

> Anyway, for this one I'd say we wait for stage1 and consider backporting
> for 4.8.1 given we want to do a 4.8 RC1 soon.

OK. I've found a buglet in the patch, too, so I'll post an updated
patch for trunk GCC 4.9.

Ciao!
Steven

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2013-03-11 18:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-03-09 22:24 [patch PR middle-end/39326 - compile time and memory explosion in combine Steven Bosscher
2013-03-11  9:23 ` Richard Biener
2013-03-11 18:05   ` Jeff Law
2013-03-11 18:27   ` Steven Bosscher

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).