public inbox for gcc-patches@gcc.gnu.org
* [patch] Use simple LRA algorithm at -O0
@ 2019-12-17 18:04 Eric Botcazou
  2019-12-18 14:01 ` Vladimir Makarov
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Botcazou @ 2019-12-17 18:04 UTC (permalink / raw)
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 551 bytes --]

Hi,

LRA has been getting measurably slower since GCC 8, at least on x86, and things
have worsened further since GCC 9.  While this might be legitimate when optimization is
enabled, it's a pure waste of cycles at -O0 so the attached patch switches LRA
over to using the simple algorithm when optimization is disabled.  The effect
on code size is tiny (typically 0.2% on x86).

Tested on x86_64-suse-linux, OK for the mainline?


2019-12-17  Eric Botcazou  <ebotcazou@adacore.com>

	* ira.c (ira): Use simple LRA algorithm when not optimizing.

-- 
Eric Botcazou

[-- Attachment #2: p.diff --]
[-- Type: text/x-patch, Size: 1809 bytes --]

Index: ira.c
===================================================================
--- ira.c	(revision 279442)
+++ ira.c	(working copy)
@@ -5192,8 +5192,6 @@ ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
-  unsigned int i;
-  int num_used_regs = 0;
 
   clear_bb_flags ();
 
@@ -5207,18 +5205,28 @@ ira (FILE *f)
   /* Perform target specific PIC register initialization.  */
   targetm.init_pic_reg ();
 
-  ira_conflicts_p = optimize > 0;
+  if (optimize)
+    {
+      ira_conflicts_p = true;
 
-  /* Determine the number of pseudos actually requiring coloring.  */
-  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-     pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-     use simplified and faster algorithms in LRA.  */
-  lra_simple_p
-    = (ira_use_lra_p
-       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+      /* Determine the number of pseudos actually requiring coloring.  */
+      unsigned int num_used_regs = 0;
+      for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+	if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+	  num_used_regs++;
+
+      /* If there are too many pseudos and/or basic blocks (e.g. 10K
+	 pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+	 use simplified and faster algorithms in LRA.  */
+      lra_simple_p
+	= ira_use_lra_p
+	  && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
+    }
+  else
+    {
+      ira_conflicts_p = false;
+      lra_simple_p = ira_use_lra_p;
+    }
 
   if (lra_simple_p)
     {

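To make the size threshold in the patch concrete, here is a minimal stand-alone sketch in C. It is not GCC code: the function names `too_big_for_full_lra` and `model_lra_simple_p` are hypothetical, and plain integers stand in for the DF use/def counts and for `last_basic_block_for_fn (cfun)`. It only models the decision the patched hunk makes.

```c
#include <stdbool.h>

/* Hypothetical model of the size test in the patch: the function is
   considered too big when pseudos * blocks reaches roughly 2^26.
   num_blocks must be nonzero.  */
static bool
too_big_for_full_lra (unsigned int num_used_regs, unsigned int num_blocks)
{
  return num_used_regs >= (1U << 26) / num_blocks;
}

/* Hypothetical model of how lra_simple_p is chosen after the patch:
   at -O0 the simple algorithms are used whenever LRA is in use;
   otherwise only when the function exceeds the size threshold.  */
static bool
model_lra_simple_p (bool optimizing, bool use_lra,
                    unsigned int num_used_regs, unsigned int num_blocks)
{
  if (!optimizing)
    return use_lra;
  return use_lra && too_big_for_full_lra (num_used_regs, num_blocks);
}
```

With these assumptions, the two cases named in the source comment (10K pseudos with 10K blocks, 100K pseudos with 1K blocks) both exceed the threshold, while a small function selects the simple algorithms only when optimization is disabled.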

* Re: [patch] Use simple LRA algorithm at -O0
  2019-12-17 18:04 [patch] Use simple LRA algorithm at -O0 Eric Botcazou
@ 2019-12-18 14:01 ` Vladimir Makarov
  2019-12-19 12:13   ` Eric Botcazou
  0 siblings, 1 reply; 4+ messages in thread
From: Vladimir Makarov @ 2019-12-18 14:01 UTC (permalink / raw)
  To: Eric Botcazou, gcc-patches


On 2019-12-17 1:02 p.m., Eric Botcazou wrote:
> Hi,
>
> LRA has been getting measurably slower since GCC 8, at least on x86, and things
> have worsened further since GCC 9.  While this might be legitimate when optimization is
> enabled, it's a pure waste of cycles at -O0 so the attached patch switches LRA
> over to using the simple algorithm when optimization is disabled.  The effect
> on code size is tiny (typically 0.2% on x86).
>
> Tested on x86_64-suse-linux, OK for the mainline?
>
Eric, thank you for reporting this issue and providing the patch.
The simple LRA algorithms switch off hard register splitting, so there might
be a slightly bigger chance of the "can't find reload register" error occurring
(e.g. when -O0 -fschedule-insns is used).  But this error is still not
solved in the general case, and in my experience the chance of this error is
even bigger for optimized modes than for -O0 with the simple LRA algorithms.

That said, I believe the patch is OK for the trunk.

> 2019-12-17  Eric Botcazou  <ebotcazou@adacore.com>
>
> 	* ira.c (ira): Use simple LRA algorithm when not optimizing.
>


* Re: [patch] Use simple LRA algorithm at -O0
  2019-12-18 14:01 ` Vladimir Makarov
@ 2019-12-19 12:13   ` Eric Botcazou
  2019-12-19 23:05     ` Vladimir Makarov
  0 siblings, 1 reply; 4+ messages in thread
From: Eric Botcazou @ 2019-12-19 12:13 UTC (permalink / raw)
  To: Vladimir Makarov; +Cc: gcc-patches

> The simple LRA algorithms switch off hard register splitting, so there might
> be a slightly bigger chance of the "can't find reload register" error occurring
> (e.g. when -O0 -fschedule-insns is used).  But this error is still not
> solved in the general case, and in my experience the chance of this error is
> even bigger for optimized modes than for -O0 with the simple LRA algorithms.

I see, thanks for the explanation.  So this could occur for register variables
or something along these lines?

> That said, I believe the patch is OK for the trunk.

OK, let's see how it fares.  We have been using it with a GCC 9 compiler for 
some time, without any problem so far.

-- 
Eric Botcazou


* Re: [patch] Use simple LRA algorithm at -O0
  2019-12-19 12:13   ` Eric Botcazou
@ 2019-12-19 23:05     ` Vladimir Makarov
  0 siblings, 0 replies; 4+ messages in thread
From: Vladimir Makarov @ 2019-12-19 23:05 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches

On 12/19/19 6:29 AM, Eric Botcazou wrote:
>> The simple LRA algorithms switch off hard register splitting, so there might
>> be a slightly bigger chance of the "can't find reload register" error occurring
>> (e.g. when -O0 -fschedule-insns is used).  But this error is still not
>> solved in the general case, and in my experience the chance of this error is
>> even bigger for optimized modes than for -O0 with the simple LRA algorithms.
> I see, thanks for the explanation.  So this could occur for register variables
> or something along these lines?

It might occur when the liveness of hard registers explicitly present
in RTL is extended.  A typical example is a move of a hard register (e.g.
x86-64 dx used as a function call argument) across an insn that always
requires this hard register (e.g. an x86-64 div insn using the ax/dx hard
registers).  There are also more complicated cases.  The reload pass never
tried to solve this problem.  LRA tries to solve it, but in the general case
the problem remains unsolved.  Therefore the 1st insn scheduler is switched
off by default on some targets.  Still, GCC users can switch it on
and run into the problem with or without the patch.

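The div example above can be pictured with a small C function. This is only an illustrative sketch of the kind of source that yields such RTL on x86-64, not a reproducer for the error: the names `take3` and `quotient_and_third` are hypothetical, and whether a problem actually shows up depends on the options and GCC version used.

```c
/* Hypothetical callee.  Under the x86-64 System V ABI the first three
   integer arguments are passed in rdi, rsi and rdx.  In a real case
   this would be an external function rather than a static one.  */
static long
take3 (long a, long b, long c)
{
  (void) b;
  return a * 100 + c;
}

/* The division needs rax and rdx (idiv takes its dividend in rdx:rax
   and writes the quotient to rax and the remainder to rdx), while the
   call needs c in rdx.  If the move of c into rdx is scheduled before
   the division, rdx is live across an insn that itself requires rdx.  */
long
quotient_and_third (long n, long d, long c)
{
  return take3 (n / d, 0, c);
}
```

For instance, `quotient_and_third (7, 2, 5)` computes the quotient 3 and passes 5 as the third argument, so this sketch returns 305.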
>> That said, I believe the patch is OK for the trunk.
> OK, let's see how it fares.  We have been using it with a GCC 9 compiler for
> some time, without any problem so far.
>
As I wrote, for typical GCC use the patch will not create any problems.
But GCC users (or people running automatically generated tests with
artificial option sets) can still run into the problem, as they could
before the patch.


