[Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)?

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)?
@ 2013-12-17  9:30 rearnsha at gcc dot gnu.org
  2013-12-17  9:32 ` [Bug rtl-optimization/59535] " rearnsha at gcc dot gnu.org
                   ` (18 more replies)
  0 siblings, 19 replies; 20+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-12-17  9:30 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

            Bug ID: 59535
           Summary: [4.9 regression] -Os code size regressions for
                    Thumb1/Thumb2 (with LRA)?
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rearnsha at gcc dot gnu.org
                CC: ramana.radhakrishnan at arm dot com, vmakarov at redhat dot com,
                    yvan.roux at linaro dot org
            Target: arm

Running the CSiBE benchmark for -Os with Thumb1 and Thumb2 code generation
shows significant regressions since 11/12/13 (the day LRA was turned on by
default for ARM).  These cause code size to grow back to the size it was in
2011.

I'll try to find some simple examples to use as test cases.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)?
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
@ 2013-12-17  9:32 ` rearnsha at gcc dot gnu.org
  2013-12-17  9:53 ` rearnsha at gcc dot gnu.org
                   ` (17 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-12-17  9:32 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #1 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
CSiBE code size results for Thumb2
2013/12/09 2543786
2013/12/11 2563522


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)?
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
  2013-12-17  9:32 ` [Bug rtl-optimization/59535] " rearnsha at gcc dot gnu.org
@ 2013-12-17  9:53 ` rearnsha at gcc dot gnu.org
  2013-12-17 11:14 ` [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA rearnsha at gcc dot gnu.org
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-12-17  9:53 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #2 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
CSiBE code size results for Thumb1

2013/12/09 2634640
2013/12/11 2683980

=1.8% size regression.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
  2013-12-17  9:32 ` [Bug rtl-optimization/59535] " rearnsha at gcc dot gnu.org
  2013-12-17  9:53 ` rearnsha at gcc dot gnu.org
@ 2013-12-17 11:14 ` rearnsha at gcc dot gnu.org
  2013-12-17 11:22 ` rearnsha at gcc dot gnu.org
                   ` (15 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-12-17 11:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #4 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Created attachment 31455
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=31455&action=edit
testcase

Compile with -Os -mthumb -mcpu=arm7tdmi -fno-short-enums and either -mlra or
-mno-lra


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2013-12-17 11:14 ` [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA rearnsha at gcc dot gnu.org
@ 2013-12-17 11:22 ` rearnsha at gcc dot gnu.org
  2013-12-17 22:47 ` yvan.roux at linaro dot org
                   ` (14 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-12-17 11:22 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #5 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Number of register-register move operations in the testcase
lra:    208
no-lra: 105


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2013-12-17 11:22 ` rearnsha at gcc dot gnu.org
@ 2013-12-17 22:47 ` yvan.roux at linaro dot org
  2013-12-18  9:29 ` yvan.roux at linaro dot org
                   ` (13 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: yvan.roux at linaro dot org @ 2013-12-17 22:47 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #7 from Yvan Roux <yvan.roux at linaro dot org> ---
I'm not able to reproduce with today's trunk or rev #205887 built for
arm-none-linux-gnueabi, as the compiler ICEs in assign_by_spills
(lra-assign.c). What is your configuration ?


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2013-12-17 22:47 ` yvan.roux at linaro dot org
@ 2013-12-18  9:29 ` yvan.roux at linaro dot org
  2013-12-18 16:06 ` vmakarov at gcc dot gnu.org
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: yvan.roux at linaro dot org @ 2013-12-18  9:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #8 from Yvan Roux <yvan.roux at linaro dot org> ---
Forget my previous comment, I can reproduce it after a rebase.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2013-12-18  9:29 ` yvan.roux at linaro dot org
@ 2013-12-18 16:06 ` vmakarov at gcc dot gnu.org
  2013-12-18 22:14 ` vmakarov at gcc dot gnu.org
                   ` (11 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2013-12-18 16:06 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

Vladimir Makarov <vmakarov at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #9 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Richard Earnshaw from comment #5)
> Number of register-register move operations in the testcase
> lra:    208
> no-lra: 105

That is huge degradation.  And I guess it is not only because of 2-ops insns. 
It works adequate on x86/x86-64.  Although I know some issues with 2-ops
commutative insns (I am trying to fix it too).

I think major problem is in wrong alternative choices as thumb lo/hi reg usage
is complicated.

I take this bug very seriously.  If I cannot fix it till end of Jan (sorry, it
is a vacation time), probably we should switch to reload pass for thumb.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (6 preceding siblings ...)
  2013-12-18 16:06 ` vmakarov at gcc dot gnu.org
@ 2013-12-18 22:14 ` vmakarov at gcc dot gnu.org
  2013-12-19  9:26 ` ramana at gcc dot gnu.org
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2013-12-18 22:14 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #10 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
./cc1 -Os -mthumb -mcpu=arm7tdmi -fno-short-enums test.i -m{no-}lra
-fno-schedule-insns2

        original reload  reload with change  lra with change
reg moves       104             86              101
all moves       210             192             207
loads           561             569             541
stores          234             231             210

overall        1005             992             958
mov/ldr/str

  First of all, I can not compile the test with LRA because of its
crash.  The problem is in choosing alternatives different from reload.
For some insn alternatives, LRA needs two hi regs for reloads.  But
there is only one 12, all others hi-regs are fixed.

 To be honest, I don't know why 12 is not fixed.  It results in using
12 by IRA and bigger code when even reload is used.  I believe it
should fixed too.

 Second, big number of moves is not a bad thing.  It should considered
with # ld/st.  LRA generates 15 more reg moves but 28 less loads and
21 less stores.  It means LRA with the patch generates smaller and
faster code than reload.

Here is the patch improving code size for reload and fixing LRA crash
and improving code for LRA.

  If arm part of the patch is ok, please let me know.  I'll commit the
patch then.  The rest of the patch does not affect the test code
generation but it might affect other code generated with -Os.

Index: config/arm/arm.c
===================================================================
--- config/arm/arm.c    (revision 206089)
+++ config/arm/arm.c    (working copy)
@@ -29236,6 +29236,7 @@ arm_conditional_register_usage (void)
       for (regno = FIRST_HI_REGNUM;
           regno <= LAST_HI_REGNUM; ++regno)
        fixed_regs[regno] = call_used_regs[regno] = 1;
+      fixed_regs[12] = call_used_regs[12] = 1;
     }

   /* The link register can be clobbered by any branch insn,
Index: lra-assigns.c
===================================================================
--- lra-assigns.c       (revision 206089)
+++ lra-assigns.c       (working copy)
@@ -612,7 +612,9 @@ find_hard_regno_for (int regno, int *cos
                && ! df_regs_ever_live_p (hard_regno + j))
              /* It needs save restore.  */
              hard_regno_costs[hard_regno]
-               += 2 * ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb->frequency + 1;
+               += (2
+                   * REG_FREQ_FROM_BB (ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb)
+                   + 1);
          priority = targetm.register_priority (hard_regno);
          if (best_hard_regno < 0 || hard_regno_costs[hard_regno] < best_cost
              || (hard_regno_costs[hard_regno] == best_cost
Index: lra-coalesce.c
===================================================================
--- lra-coalesce.c      (revision 206089)
+++ lra-coalesce.c      (working copy)
@@ -79,8 +79,8 @@ move_freq_compare_func (const void *v1p,
   rtx mv2 = *(const rtx *) v2p;
   int pri1, pri2;

-  pri1 = BLOCK_FOR_INSN (mv1)->frequency;
-  pri2 = BLOCK_FOR_INSN (mv2)->frequency;
+  pri1 = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (mv1));
+  pri2 = REG_FREQ_FROM_BB (BLOCK_FOR_INSN (mv2));
   if (pri2 - pri1)
     return pri2 - pri1;

@@ -277,7 +277,7 @@ lra_coalesce (void)
            fprintf
              (lra_dump_file, "      Coalescing move %i:r%d-r%d (freq=%d)\n",
               INSN_UID (mv), sregno, dregno,
-              BLOCK_FOR_INSN (mv)->frequency);
+              REG_FREQ_FROM_BB (BLOCK_FOR_INSN (mv)));
          /* We updated involved_insns_bitmap when doing the merge.  */
        }
       else if (!(lra_intersected_live_ranges_p
@@ -291,7 +291,7 @@ lra_coalesce (void)
               "  Coalescing move %i:r%d(%d)-r%d(%d) (freq=%d)\n",
               INSN_UID (mv), sregno, ORIGINAL_REGNO (SET_SRC (set)),
               dregno, ORIGINAL_REGNO (SET_DEST (set)),
-              BLOCK_FOR_INSN (mv)->frequency);
+              REG_FREQ_FROM_BB (BLOCK_FOR_INSN (mv)));
          bitmap_ior_into (&involved_insns_bitmap,
                           &lra_reg_info[sregno].insn_bitmap);
          bitmap_ior_into (&involved_insns_bitmap,
@@ -316,7 +316,8 @@ lra_coalesce (void)
                /* Coalesced move.  */
                if (lra_dump_file != NULL)
                  fprintf (lra_dump_file, "      Removing move %i (freq=%d)\n",
-                        INSN_UID (insn), BLOCK_FOR_INSN (insn)->frequency);
+                          INSN_UID (insn),
+                          REG_FREQ_FROM_BB (BLOCK_FOR_INSN (insn)));
                lra_set_insn_deleted (insn);
              }
          }
Index: lra-constraints.c
===================================================================
--- lra-constraints.c   (revision 206089)
+++ lra-constraints.c   (working copy)
@@ -4077,7 +4077,7 @@ lra_constraints (bool first_p)
                      fprintf (lra_dump_file,
                               "      Removing equiv init insn %i (freq=%d)\n",
                               INSN_UID (curr_insn),
-                              BLOCK_FOR_INSN (curr_insn)->frequency);
+                              REG_FREQ_FROM_BB (BLOCK_FOR_INSN (curr_insn)));
                      dump_insn_slim (lra_dump_file, curr_insn);
                    }
                  if (contains_reg_p (x, true, false))


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (7 preceding siblings ...)
  2013-12-18 22:14 ` vmakarov at gcc dot gnu.org
@ 2013-12-19  9:26 ` ramana at gcc dot gnu.org
  2013-12-19  9:54 ` ramana at gcc dot gnu.org
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ramana at gcc dot gnu.org @ 2013-12-19  9:26 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

Ramana Radhakrishnan <ramana at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2013-12-19
                 CC|                            |ramana at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #11 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to Vladimir Makarov from comment #9)
> (In reply to Richard Earnshaw from comment #5)

> I think major problem is in wrong alternative choices as thumb lo/hi reg
> usage is complicated.

That is probably going to be the reason.

> 
> I take this bug very seriously.  If I cannot fix it till end of Jan (sorry,
> it is a vacation time), probably we should switch to reload pass for thumb.

Thanks very much for taking this so seriously.


> 
>  To be honest, I don't know why 12 is not fixed.  It results in using
> 12 by IRA and bigger code when even reload is used.  I believe it
> should fixed too.


According to the ABI r12 is a caller saved register. Am I correct in
understanding that we are taking out a caller-saved register to allow for lra
to be able to choose other low regs and thereby prevent movements to and from
r12 ? 

Taking out r12 will have performance implications, so any change here needs to
be benchmarked very carefully on all cores with Thumb1 and Thumb2 to make sure
that the impact. Additionally we shouldn't be needing to do this in ARM state,
so this patchlet as it stands is not enough.


regards
Ramana


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (8 preceding siblings ...)
  2013-12-19  9:26 ` ramana at gcc dot gnu.org
@ 2013-12-19  9:54 ` ramana at gcc dot gnu.org
  2013-12-19 12:58 ` rguenth at gcc dot gnu.org
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: ramana at gcc dot gnu.org @ 2013-12-19  9:54 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #12 from Ramana Radhakrishnan <ramana at gcc dot gnu.org> ---
(In reply to Ramana Radhakrishnan from comment #11)
> (In reply to Vladimir Makarov from comment #9)
> > (In reply to Richard Earnshaw from comment #5)
> 
> > I think major problem is in wrong alternative choices as thumb lo/hi reg
> > usage is complicated.
> 
> That is probably going to be the reason.
> 
> > 
> > I take this bug very seriously.  If I cannot fix it till end of Jan (sorry,
> > it is a vacation time), probably we should switch to reload pass for thumb.
> 
> Thanks very much for taking this so seriously.
> 
> 
> > 
> >  To be honest, I don't know why 12 is not fixed.  It results in using
> > 12 by IRA and bigger code when even reload is used.  I believe it
> > should fixed too.
> 
> 
> According to the ABI r12 is a caller saved register. Am I correct in
> understanding that we are taking out a caller-saved register to allow for
> lra to be able to choose other low regs and thereby prevent movements to and
> from r12 ? 
> 

Scratch that  : it's already for t16 and Os. Changing LAST_HI_REGNUM to 14 in
arm.h should be enough.

regards
Ramana


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (9 preceding siblings ...)
  2013-12-19  9:54 ` ramana at gcc dot gnu.org
@ 2013-12-19 12:58 ` rguenth at gcc dot gnu.org
  2013-12-19 17:12 ` rearnsha at gcc dot gnu.org
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rguenth at gcc dot gnu.org @ 2013-12-19 12:58 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |missed-optimization, ra
           Priority|P3                          |P1
   Target Milestone|---                         |4.9.0


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (10 preceding siblings ...)
  2013-12-19 12:58 ` rguenth at gcc dot gnu.org
@ 2013-12-19 17:12 ` rearnsha at gcc dot gnu.org
  2013-12-19 21:42 ` vmakarov at gcc dot gnu.org
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2013-12-19 17:12 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #13 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
The original reason we took most high registers out of the available registers
list for -Os is because saving them (they're mostly callee-saved) is quite
expensive -- they have to be copied to low registers first.  IP was the
exception, since it's call-clobbered and there's thus no code cost to make it
available.

While I'm happy to take IP out of the available registers list for now as a
short-term expedient work-around, I see this as being indicative a more
fundamental problem.

When not optimizing for size, the register allocator should be able to use the
high registers as alternatives to spill slots (copy the value to a high
register rather than spill it to the stack).  Once we have that, it would make
sense for the few cases where high registers are permitted to legitimately use
the copy that is up there rather than moving it back again to a low register
first, so I wouldn't want to change the insn patterns to remove high registers
entirely from operations like add, sub and compare.

On the other hand, LRA does need to be aware that once a value is in a high
register, really spilling it to memory, or loading it from memory is an
expensive operation, since such transfers have to go via a low register.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (11 preceding siblings ...)
  2013-12-19 17:12 ` rearnsha at gcc dot gnu.org
@ 2013-12-19 21:42 ` vmakarov at gcc dot gnu.org
  2014-01-03 14:44 ` rearnsha at gcc dot gnu.org
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2013-12-19 21:42 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #14 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
(In reply to Richard Earnshaw from comment #13)

> When not optimizing for size, the register allocator should be able to use
> the high registers as alternatives to spill slots (copy the value to a high
> register rather than spill it to the stack).  Once we have that, it would
> make sense for the few cases where high registers are permitted to
> legitimately use the copy that is up there rather than moving it back again
> to a low register first, so I wouldn't want to change the insn patterns to
> remove high registers entirely from operations like add, sub and compare.
> 

I'll continue my work on this problem in 2 weeks (after my vacation). I'd like
to implement the proposed spilling into high regs and try to tune choosing insn
alternatives closer to reload (but it is very sensitive area as it affects code
generation for other LRA targets).

> On the other hand, LRA does need to be aware that once a value is in a high
> register, really spilling it to memory, or loading it from memory is an
> expensive operation, since such transfers have to go via a low register.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (12 preceding siblings ...)
  2013-12-19 21:42 ` vmakarov at gcc dot gnu.org
@ 2014-01-03 14:44 ` rearnsha at gcc dot gnu.org
  2014-02-14 16:19 ` vmakarov at gcc dot gnu.org
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: rearnsha at gcc dot gnu.org @ 2014-01-03 14:44 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #15 from Richard Earnshaw <rearnsha at gcc dot gnu.org> ---
Another testcase where the thumb1 code is poor is
gcc.c-torture/execute/pr28982b.c

With LRA we often get sequences such as:

        mov     r3, sp
        ldr     r2, .L8+16
        add     r3, r3, r2         // r2 dead.
        str     r0, [r3]

Instead of:

        ldr     r2, .L8+16
        add     r2, r2, sp
        str     r0, [r2]

Which is both shorter and needs fewer registers.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (13 preceding siblings ...)
  2014-01-03 14:44 ` rearnsha at gcc dot gnu.org
@ 2014-02-14 16:19 ` vmakarov at gcc dot gnu.org
  2014-06-12  7:47 ` fredrik.hederstierna@securitas-direct.com
                   ` (3 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: vmakarov at gcc dot gnu.org @ 2014-02-14 16:19 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #16 from Vladimir Makarov <vmakarov at gcc dot gnu.org> ---
Author: vmakarov
Date: Fri Feb 14 16:18:29 2014
New Revision: 207787

URL: http://gcc.gnu.org/viewcvs?rev=207787&root=gcc&view=rev
Log:
2014-02-14  Vladimir Makarov  <vmakarov@redhat.com>
        Richard Earnshaw  <rearnsha@arm.com>

    PR rtl-optimization/59535
    * lra-constraints.c (process_alt_operands): Encourage alternative
    when unassigned pseudo class is superset of the alternative class.
    (inherit_reload_reg): Don't inherit when optimizing for code size.
    * config/arm/arm.h (MODE_BASE_REG_CLASS): Add version for LRA
    returning CORE_REGS for anything but Thumb1 and BASE_REGS for
    modes not less than 4 for Thumb1.


Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/config/arm/arm.h
    trunk/gcc/lra-constraints.c


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (14 preceding siblings ...)
  2014-02-14 16:19 ` vmakarov at gcc dot gnu.org
@ 2014-06-12  7:47 ` fredrik.hederstierna@securitas-direct.com
  2014-06-13 11:44 ` ramana.radhakrishnan at arm dot com
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 20+ messages in thread
From: fredrik.hederstierna@securitas-direct.com @ 2014-06-12  7:47 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

Fredrik Hederstierna <fredrik.hederstierna@securitas-direct.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fredrik.hederstierna@securi
                   |                            |tas-direct.com

--- Comment #18 from Fredrik Hederstierna <fredrik.hederstierna@securitas-direct.com> ---
I compared GCC 4.8.3 and GCC 4.9.0 for arm-none-eabi, and I still see a code
size increase for thumb1 (and thumb2) for both my arm966e and my cortex-m4
targets.

GCC 4.8.3
  RAM used     93812 
  Flash used   515968

GCC 4.9.0
  RAM used     93812 (same)
  Flash used   522608 (+1.3%)

Then I tried to disable LRA and results got better:

GCC 4.9.0 : added flag "-mno-lra"
  RAM used     93812 (same)
  Flash used   519624 (+0.7%)

Flags used are otherwise identical for both tests:

  -Os -g3 -ggdb3 -gdwarf-4
  -fvar-tracking-assignments -fverbose-asm -fno-common -ffunction-sections  
-fdata-sections -fno-exceptions -fno-asynchronous-unwind-tables
-fno-unwind-tables
  -mthumb -mcpu=arm966e-s -msoft-float -mno-unaligned-access

Generally GCC 4.9.0 seems to produce larger code, I tried to experiement with
LTO (-flto -flto-fat-objects), but then code size increased even more for both
GCC 4.8.3 and GCC 4.9.0, I was expecting a code decrease though.

Sorry I cannot share exact sources used for compilation here, I can share
toolchain build script though on request, or try to setup a small test case.

I first just wanted to confirm that this bug really is fixed and resolved, so
its not a new bug or another known issue.

BR /Fredrik


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (15 preceding siblings ...)
  2014-06-12  7:47 ` fredrik.hederstierna@securitas-direct.com
@ 2014-06-13 11:44 ` ramana.radhakrishnan at arm dot com
  2014-09-03  6:17 ` zhenqiang.chen at arm dot com
  2014-09-03  7:45 ` fredrik.hederstierna@securitas-direct.com
  18 siblings, 0 replies; 20+ messages in thread
From: ramana.radhakrishnan at arm dot com @ 2014-06-13 11:44 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #19 from ramana.radhakrishnan at arm dot com <ramana.radhakrishnan at arm dot com> ---
On 06/12/14 08:46, fredrik.hederstierna@securitas-direct.com wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535
>
> Fredrik Hederstierna <fredrik.hederstierna@securitas-direct.com> changed:
>
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>                   CC|                            |fredrik.hederstierna@securi
>                     |                            |tas-direct.com
>
> --- Comment #18 from Fredrik Hederstierna <fredrik.hederstierna@securitas-direct.com> ---
> I compared GCC 4.8.3 and GCC 4.9.0 for arm-none-eabi, and I still see a code
> size increase for thumb1 (and thumb2) for both my arm966e and my cortex-m4
> targets.
>
> GCC 4.8.3
>    RAM used     93812
>    Flash used   515968
>
> GCC 4.9.0
>    RAM used     93812 (same)
>    Flash used   522608 (+1.3%)
>
> Then I tried to disable LRA and results got better:
>
> GCC 4.9.0 : added flag "-mno-lra"
>    RAM used     93812 (same)
>    Flash used   519624 (+0.7%)
>
> Flags used are otherwise identical for both tests:
>
>    -Os -g3 -ggdb3 -gdwarf-4
>    -fvar-tracking-assignments -fverbose-asm -fno-common -ffunction-sections
> -fdata-sections -fno-exceptions -fno-asynchronous-unwind-tables
> -fno-unwind-tables
>    -mthumb -mcpu=arm966e-s -msoft-float -mno-unaligned-access
>
> Generally GCC 4.9.0 seems to produce larger code, I tried to experiement with
> LTO (-flto -flto-fat-objects), but then code size increased even more for both
> GCC 4.8.3 and GCC 4.9.0, I was expecting a code decrease though.
>
> Sorry I cannot share exact sources used for compilation here, I can share
> toolchain build script though on request, or try to setup a small test case.
>
> I first just wanted to confirm that this bug really is fixed and resolved, so
> its not a new bug or another known issue.

It might be another issue or it may well be an issue with LRA not many 
could tell for certain unless we could get a small testcase to look at.

What we'd like is a small testcase that shows the problem compared with 
gcc 4.8.3 to progress this further.

Please file a new bug report following the instructions in 
https://gcc.gnu.org/bugs/#report

in this particular case we'd be interested in all command line options 
that were used.


regards
Ramana


>
> BR /Fredrik
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (16 preceding siblings ...)
  2014-06-13 11:44 ` ramana.radhakrishnan at arm dot com
@ 2014-09-03  6:17 ` zhenqiang.chen at arm dot com
  2014-09-03  7:45 ` fredrik.hederstierna@securitas-direct.com
  18 siblings, 0 replies; 20+ messages in thread
From: zhenqiang.chen at arm dot com @ 2014-09-03  6:17 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

Zhenqiang Chen <zhenqiang.chen at arm dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |zhenqiang.chen at arm dot com

--- Comment #20 from Zhenqiang Chen <zhenqiang.chen at arm dot com> ---
Here is a small case to show lra introduces one more register copy (tested with
trunk and 4.9).

int isascii (int c)
{
  return c >= 0 && c < 128;
}
With options: -Os -mthumb -mcpu=cortex-m0, I got

isascii:
    mov    r3, #0
    mov    r2, #127
    mov    r1, r3   //???
    cmp    r2, r0
    adc    r1, r1, r3
    mov    r0, r1
    bx    lr

With options: -Os -mthumb -mcpu=cortex-m0 -mno-lra, I got

isascii:
    mov    r2, #127
    mov    r3, #0
    cmp    r2, r0
    adc    r3, r3, r3
    mov    r0, r3
    bx    lr


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA
  2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
                   ` (17 preceding siblings ...)
  2014-09-03  6:17 ` zhenqiang.chen at arm dot com
@ 2014-09-03  7:45 ` fredrik.hederstierna@securitas-direct.com
  18 siblings, 0 replies; 20+ messages in thread
From: fredrik.hederstierna@securitas-direct.com @ 2014-09-03  7:45 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59535

--- Comment #21 from Fredrik Hederstierna <fredrik.hederstierna@securitas-direct.com> ---
I filed this previously, maybe its duplicate

Bug 61578 - Code size increase for ARM thumb compared to 4.8.x when compiling
with -Os

BR Fredrik


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2014-09-03  7:45 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-17  9:30 [Bug rtl-optimization/59535] New: [4.9 regression] -Os code size regressions for Thumb1/Thumb2 (with LRA)? rearnsha at gcc dot gnu.org
2013-12-17  9:32 ` [Bug rtl-optimization/59535] " rearnsha at gcc dot gnu.org
2013-12-17  9:53 ` rearnsha at gcc dot gnu.org
2013-12-17 11:14 ` [Bug rtl-optimization/59535] [4.9 regression] -Os code size regressions for Thumb1/Thumb2 with LRA rearnsha at gcc dot gnu.org
2013-12-17 11:22 ` rearnsha at gcc dot gnu.org
2013-12-17 22:47 ` yvan.roux at linaro dot org
2013-12-18  9:29 ` yvan.roux at linaro dot org
2013-12-18 16:06 ` vmakarov at gcc dot gnu.org
2013-12-18 22:14 ` vmakarov at gcc dot gnu.org
2013-12-19  9:26 ` ramana at gcc dot gnu.org
2013-12-19  9:54 ` ramana at gcc dot gnu.org
2013-12-19 12:58 ` rguenth at gcc dot gnu.org
2013-12-19 17:12 ` rearnsha at gcc dot gnu.org
2013-12-19 21:42 ` vmakarov at gcc dot gnu.org
2014-01-03 14:44 ` rearnsha at gcc dot gnu.org
2014-02-14 16:19 ` vmakarov at gcc dot gnu.org
2014-06-12  7:47 ` fredrik.hederstierna@securitas-direct.com
2014-06-13 11:44 ` ramana.radhakrishnan at arm dot com
2014-09-03  6:17 ` zhenqiang.chen at arm dot com
2014-09-03  7:45 ` fredrik.hederstierna@securitas-direct.com

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).