* [patch] Use simple LRA algorithm at -O0
@ 2019-12-17 18:04 Eric Botcazou
2019-12-18 14:01 ` Vladimir Makarov
From: Eric Botcazou @ 2019-12-17 18:04 UTC
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 551 bytes --]
Hi,
LRA has been getting measurably slower since GCC 8, at least on x86, and things
have been worsening since GCC 9. While this might be legitimate when optimization is
enabled, it's a pure waste of cycles at -O0 so the attached patch switches LRA
over to using the simple algorithm when optimization is disabled. The effect
on code size is tiny (typically 0.2% on x86).
Tested on x86_64-suse-linux, OK for the mainline?
2019-12-17 Eric Botcazou <ebotcazou@adacore.com>
* ira.c (ira): Use simple LRA algorithm when not optimizing.
--
Eric Botcazou
[-- Attachment #2: p.diff --]
[-- Type: text/x-patch, Size: 1809 bytes --]
Index: ira.c
===================================================================
--- ira.c (revision 279442)
+++ ira.c (working copy)
@@ -5192,8 +5192,6 @@ ira (FILE *f)
   int ira_max_point_before_emit;
   bool saved_flag_caller_saves = flag_caller_saves;
   enum ira_region saved_flag_ira_region = flag_ira_region;
-  unsigned int i;
-  int num_used_regs = 0;
   clear_bb_flags ();
@@ -5207,18 +5205,28 @@ ira (FILE *f)
   /* Perform target specific PIC register initialization. */
   targetm.init_pic_reg ();
-  ira_conflicts_p = optimize > 0;
+  if (optimize)
+    {
+      ira_conflicts_p = true;
-  /* Determine the number of pseudos actually requiring coloring. */
-  for (i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
-    num_used_regs += !!(DF_REG_USE_COUNT (i) + DF_REG_DEF_COUNT (i));
-
-  /* If there are too many pseudos and/or basic blocks (e.g. 10K
-     pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
-     use simplified and faster algorithms in LRA. */
-  lra_simple_p
-    = (ira_use_lra_p
-       && num_used_regs >= (1 << 26) / last_basic_block_for_fn (cfun));
+      /* Determine the number of pseudos actually requiring coloring. */
+      unsigned int num_used_regs = 0;
+      for (unsigned int i = FIRST_PSEUDO_REGISTER; i < DF_REG_SIZE (df); i++)
+        if (DF_REG_DEF_COUNT (i) || DF_REG_USE_COUNT (i))
+          num_used_regs++;
+
+      /* If there are too many pseudos and/or basic blocks (e.g. 10K
+         pseudos and 10K blocks or 100K pseudos and 1K blocks), we will
+         use simplified and faster algorithms in LRA. */
+      lra_simple_p
+        = ira_use_lra_p
+          && num_used_regs >= (1U << 26) / last_basic_block_for_fn (cfun);
+    }
+  else
+    {
+      ira_conflicts_p = false;
+      lra_simple_p = ira_use_lra_p;
+    }
   if (lra_simple_p)
     {
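
To make the new control flow concrete, the decision taken by the patched
code can be modelled outside GCC. The following is a minimal stand-alone
sketch, not code from ira.c: the function name use_simple_lra and its
parameters are hypothetical stand-ins for optimize, ira_use_lra_p,
num_used_regs and last_basic_block_for_fn (cfun).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-alone model of the decision made by the patch.
   The name and parameters are illustrative and do not exist in GCC.  */
static bool
use_simple_lra (int optimize, bool use_lra, unsigned int num_used_regs,
                unsigned int num_basic_blocks)
{
  assert (num_basic_blocks > 0);  /* Guard the division below.  */

  /* With the patch, -O0 selects the simple algorithms whenever LRA
     is in use, without counting pseudos at all.  */
  if (!optimize)
    return use_lra;

  /* When optimizing, switch only once pseudos times blocks reaches
     roughly 2^26.  */
  return use_lra && num_used_regs >= (1U << 26) / num_basic_blocks;
}
```

Since 1U << 26 is 67108864, use_simple_lra (2, true, 10000, 10000) and
use_simple_lra (2, true, 100000, 1000) both return true, which matches the
two size examples given in the comment, while any call with optimize == 0
simply returns use_lra.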
* Re: [patch] Use simple LRA algorithm at -O0
2019-12-17 18:04 [patch] Use simple LRA algorithm at -O0 Eric Botcazou
@ 2019-12-18 14:01 ` Vladimir Makarov
2019-12-19 12:13 ` Eric Botcazou
From: Vladimir Makarov @ 2019-12-18 14:01 UTC
To: Eric Botcazou, gcc-patches
On 2019-12-17 1:02 p.m., Eric Botcazou wrote:
> Hi,
>
> LRA has been getting measurably slower since GCC 8, at least on x86, and things
> have been worsening since GCC 9. While this might be legitimate when optimization is
> enabled, it's a pure waste of cycles at -O0 so the attached patch switches LRA
> over to using the simple algorithm when optimization is disabled. The effect
> on code size is tiny (typically 0.2% on x86).
>
> Tested on x86_64-suse-linux, OK for the mainline?
>
Eric, thank you for reporting this issue and providing the patch.
Simple LRA algorithms switch off hard register splitting, so there might be
a slightly bigger chance of a "can't find reload register" error
(e.g. when -O0 -fschedule-insns is used). But this error is still not
solved in the general case, and in my experience the chance of this error is
even bigger for optimized modes than for -O0 with simple LRA algorithms.
That said, I believe the patch is OK for the trunk.
> 2019-12-17 Eric Botcazou <ebotcazou@adacore.com>
>
> * ira.c (ira): Use simple LRA algorithm when not optimizing.
>
* Re: [patch] Use simple LRA algorithm at -O0
2019-12-18 14:01 ` Vladimir Makarov
@ 2019-12-19 12:13 ` Eric Botcazou
2019-12-19 23:05 ` Vladimir Makarov
From: Eric Botcazou @ 2019-12-19 12:13 UTC
To: Vladimir Makarov; +Cc: gcc-patches
> Simple LRA algorithms switch off hard register splitting, so there might be
> a slightly bigger chance of a "can't find reload register" error
> (e.g. when -O0 -fschedule-insns is used). But this error is still not
> solved in the general case, and in my experience the chance of this error is
> even bigger for optimized modes than for -O0 with simple LRA algorithms.
I see, thanks for the explanation. So this could occur for register variables
or something along these lines?
> That said, I believe the patch is OK for the trunk.
OK, let's see how it fares. We have been using it with a GCC 9 compiler for
some time, without any problem so far.
--
Eric Botcazou
* Re: [patch] Use simple LRA algorithm at -O0
2019-12-19 12:13 ` Eric Botcazou
@ 2019-12-19 23:05 ` Vladimir Makarov
From: Vladimir Makarov @ 2019-12-19 23:05 UTC
To: Eric Botcazou; +Cc: gcc-patches
On 12/19/19 6:29 AM, Eric Botcazou wrote:
>> Simple LRA algorithms switch off hard register splitting, so there might be
>> a slightly bigger chance of a "can't find reload register" error
>> (e.g. when -O0 -fschedule-insns is used). But this error is still not
>> solved in the general case, and in my experience the chance of this error is
>> even bigger for optimized modes than for -O0 with simple LRA algorithms.
> I see, thanks for the explanation. So this could occur for register variables
> or something along these lines?
It might occur when the liveness of hard registers explicitly present
in RTL is extended. A typical example is a move of a hard register (e.g.
x86-64 dx used as a function call argument) past an insn that always requires
this hard register (e.g. an x86-64 div insn using the ax/dx hard registers).
There are also more complicated cases. The reload pass never tried to solve
this problem. LRA tries to solve it, but in the general case the
problem is still not solved. Therefore the 1st insn scheduler is switched
off by default on some targets. GCC users can still switch it on
and run into the problem with or without the patch.
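
The shape of code behind the dx/div example can be sketched in C. This is
only an illustration under stated assumptions: the function names combine
and divide_then_call are hypothetical, the register assignment assumes the
x86-64 System V calling convention (third integer argument in rdx), and the
snippet is not claimed to reproduce the error with any particular GCC
version or option set.

```c
/* Hypothetical callee.  Under the x86-64 System V ABI its third
   integer argument is passed in rdx.  */
long
combine (long a, long b, long c)
{
  return a + b + c;
}

/* Hypothetical caller.  The signed division is normally done with
   idiv, which implicitly uses rax and rdx, while the call needs x
   in rdx.  If an insn scheduler running before register allocation
   hoists the copy of x into rdx above the division, rdx is live
   across an insn that itself requires rdx.  */
long
divide_then_call (long x, long y)
{
  long q = x / y;
  return combine (q, y, x);
}
```

Compiling such a function with -O0 -fschedule-insns is the kind of
artificial option set discussed in this thread; for example,
divide_then_call (20, 5) computes 4 + 5 + 20 when the code is generated
correctly.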
>> That said, I believe the patch is OK for the trunk.
> OK, let's see how it fares. We have been using it with a GCC 9 compiler for
> some time, without any problem so far.
>
As I wrote, for typical GCC use the patch will not create any problem.
But GCC users (or automatically generated tests run with an artificial
option set) can still run into the problem, as they could before the patch.