public inbox for gcc@gcc.gnu.org
* Help: Register allocator sets up frame at low register pressure (PR 50775)
@ 2011-10-25 13:19 Georg-Johann Lay
  2011-10-30 23:28 ` Denis Chertykov
  0 siblings, 1 reply; 2+ messages in thread
From: Georg-Johann Lay @ 2011-10-25 13:19 UTC
  To: gcc

[-- Attachment #1: Type: text/plain, Size: 2707 bytes --]

With the following small C test program


typedef struct
{
    unsigned char a, b, c, d;
} s_t;

unsigned char func1 (s_t *x, s_t *y, s_t *z)
{
    unsigned char s = 0;
    s += x->a;
    s += y->a;
    s += z->a;

    s += x->b;
    s += y->b;
    s += z->b;

    s += x->c;
    s += y->c;
    s += z->c;

    return s;
}

there is a frame pointer set up for no apparent reason.

The machine for which this code is compiled (AVR) has just a few pointer
registers, and taking one of them away to use it as frame pointer leads to
severe performance degradation in many real-world programs: moving from/to
memory is more expensive than moving around registers, setting up a frame is
expensive, and taking away 1 of 2 address registers is expensive.

What I tried and what did not fix it:

- increase targetm.memory_move_cost (up to insane values)
- play around with targetm.class_likely_spilled_p

The program is compiled with

$ avr-gcc in.c -S -Os -fdump-rtl-ira-details -fdump-rtl-postreload-details
-mmcu=avr4 -mstrict-X

with avr-gcc from current trunk SVN r180399.


The issue is that AVR has only 3 pointer registers X, Y, and Z with the
following addressing capabilities:

 *X, *X++, *--X             (R27:R26, call-clobbered)
 *Y, *Y++, *--Y, *(Y+const) (R29:R28, call-saved, frame pointer)
 *Z, *Z++, *--Z, *(Z+const) (R31:R30, call-clobbered)

Older versions of the compiler, prior to 4.7 trunk r179993, allowed a fake
addressing mode *(X+const) and emulated it by emitting an appropriate
instruction sequence like

  X = X + const
  r = *X
  X = X - const

which was only a rare corner case with the old register allocator, but with the
new allocator this sequence is seen very often, leading to code bloat of +50%
for some real-world functions.
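
As a rough illustration of why the emulated *(X+const) access causes bloat, the
sketch below counts the instructions needed for a one-byte load under the
addressing capabilities listed above. The enum, the function names and the
counting model are assumptions made for illustration only; they are not taken
from the AVR backend and ignore real instruction sizes and cycle counts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the three AVR pointer registers.  */
enum ptr_reg { PTR_X, PTR_Y, PTR_Z };

/* True if the register supports *(reg+const) natively.
   Per the table above, only Y and Z do.  */
static bool
has_displacement (enum ptr_reg r)
{
  return r == PTR_Y || r == PTR_Z;
}

/* Number of instructions for "r = *(ptr + offset)" in this simplified
   model: 1 for a native access, 3 for the emulated sequence
   X = X + const; r = *X; X = X - const.  */
static int
load_insn_count (enum ptr_reg r, int offset)
{
  /* Assumption of this sketch: only non-negative offsets are modeled.  */
  assert (offset >= 0);
  if (offset == 0 || has_displacement (r))
    return 1;
  return 3;
}
```

In this model load_insn_count (PTR_X, 1) is 3 while load_insn_count (PTR_Z, 1)
is 1, so every struct member access through X at a non-zero offset triples the
instruction count for that access.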

This is the reason why the command line option -mstrict-X has been added to the
AVR backend, see PR46278.

This option forbids the fake *(X+const) addressing, but it leads to the spills
mentioned above and to code that is even worse than without -mstrict-X, i.e.
the register allocator sabotages a smart use of the address registers.

All I see is that reload1.c:alter_reg() generates the spill because
ira_conflicts_p is true.

With the option -morder1 turned on (affects ADJUST_REG_ALLOC_ORDER) there is
still a frame set up even though it is never accessed.

Can anyone give me some advice on how to proceed with this issue?

Can it be said whether this is a target issue or an IRA/reload flaw?

FYI, attached is the IRA dump as generated with the above command line.

Thanks for any hints!

Johann

GCC configured with:

--target=avr --disable-nls --disable-shared --enable-languages=c,c++
--with-dwarf2 --disable-lto --enable-checking=yes,rtl --enable-doc

[-- Attachment #2: in.c.192r.ira --]
[-- Type: text/plain, Size: 25453 bytes --]


;; Function func1 (func1, funcdef_no=0, decl_uid=1230, cgraph_uid=0)

starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called

Pass 0 for finding pseudo/allocno costs

    a0 (r72,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a3 (r70,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a5 (r69,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a7 (r68,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a8 (r67,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a9 (r66,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a10 (r65,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a11 (r64,l0) best GENERAL_REGS, allocno GENERAL_REGS
    a2 (r62,l0) best BASE_POINTER_REGS, allocno BASE_POINTER_REGS
    a4 (r61,l0) best BASE_POINTER_REGS, allocno BASE_POINTER_REGS
    a6 (r60,l0) best BASE_POINTER_REGS, allocno BASE_POINTER_REGS
    a1 (r56,l0) best GENERAL_REGS, allocno GENERAL_REGS

  a0(r72,l0) costs: POINTER_X_REGS:2000,2000 POINTER_Y_REGS:2000,2000 POINTER_Z_REGS:2000,2000 BASE_POINTER_REGS:2000,2000 POINTER_REGS:2000,2000 ADDW_REGS:2000,2000 SIMPLE_LD_REGS:2000,2000 LD_REGS:2000,2000 NO_LD_REGS:2000,2000 GENERAL_REGS:2000,2000 ALL_REGS:2000,2000 MEM:6000,6000
  a1(r56,l0) costs: POINTER_X_REGS:2000,2000 POINTER_Y_REGS:2000,2000 POINTER_Z_REGS:2000,2000 BASE_POINTER_REGS:2000,2000 POINTER_REGS:2000,2000 ADDW_REGS:2000,2000 SIMPLE_LD_REGS:2000,2000 LD_REGS:2000,2000 NO_LD_REGS:2000,2000 GENERAL_REGS:2000,2000 ALL_REGS:2000,2000 MEM:34000,34000
  a2(r62,l0) costs: POINTER_X_REGS:4000,4000 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:4000,4000 ADDW_REGS:6000,6000 SIMPLE_LD_REGS:6000,6000 LD_REGS:6000,6000 NO_LD_REGS:6000,6000 GENERAL_REGS:6000,6000 ALL_REGS:6000,6000 MEM:16000,16000
  a3(r70,l0) costs: POINTER_X_REGS:0,0 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0 SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:4000,4000
  a4(r61,l0) costs: POINTER_X_REGS:4000,4000 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:4000,4000 ADDW_REGS:6000,6000 SIMPLE_LD_REGS:6000,6000 LD_REGS:6000,6000 NO_LD_REGS:6000,6000 GENERAL_REGS:6000,6000 ALL_REGS:6000,6000 MEM:16000,16000
  a5(r69,l0) costs: POINTER_X_REGS:0,0 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0 SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:4000,4000
  a6(r60,l0) costs: POINTER_X_REGS:4000,4000 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:4000,4000 ADDW_REGS:6000,6000 SIMPLE_LD_REGS:6000,6000 LD_REGS:6000,6000 NO_LD_REGS:6000,6000 GENERAL_REGS:6000,6000 ALL_REGS:6000,6000 MEM:16000,16000
  a7(r68,l0) costs: POINTER_X_REGS:0,0 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0 SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:4000,4000
  a8(r67,l0) costs: POINTER_X_REGS:0,0 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0 SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:4000,4000
  a9(r66,l0) costs: POINTER_X_REGS:0,0 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0 SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:4000,4000
  a10(r65,l0) costs: POINTER_X_REGS:0,0 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0 SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:4000,4000
  a11(r64,l0) costs: POINTER_X_REGS:0,0 POINTER_Y_REGS:0,0 POINTER_Z_REGS:0,0 BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0 SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 ALL_REGS:0,0 MEM:4000,4000


Pass 1 for finding pseudo/allocno costs

    r72: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r70: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r69: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r68: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r67: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r66: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r65: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r64: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS
    r62: preferred BASE_POINTER_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
    r61: preferred BASE_POINTER_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
    r60: preferred BASE_POINTER_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS
    r56: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS

  a0(r72,l0) costs: GENERAL_REGS:2000,2000 MEM:6000,6000
  a1(r56,l0) costs: GENERAL_REGS:2000,2000 MEM:34000,34000
  a2(r62,l0) costs: POINTER_X_REGS:4000,4000 BASE_POINTER_REGS:0,0 POINTER_REGS:4000,4000 ADDW_REGS:6000,6000 SIMPLE_LD_REGS:6000,6000 LD_REGS:6000,6000 NO_LD_REGS:6000,6000 GENERAL_REGS:6000,6000 ALL_REGS:6000,6000 MEM:16000,16000
  a3(r70,l0) costs: GENERAL_REGS:0,0 MEM:4000,4000
  a4(r61,l0) costs: POINTER_X_REGS:4000,4000 BASE_POINTER_REGS:0,0 POINTER_REGS:4000,4000 ADDW_REGS:6000,6000 SIMPLE_LD_REGS:6000,6000 LD_REGS:6000,6000 NO_LD_REGS:6000,6000 GENERAL_REGS:6000,6000 ALL_REGS:6000,6000 MEM:16000,16000
  a5(r69,l0) costs: GENERAL_REGS:0,0 MEM:4000,4000
  a6(r60,l0) costs: POINTER_X_REGS:4000,4000 BASE_POINTER_REGS:0,0 POINTER_REGS:4000,4000 ADDW_REGS:6000,6000 SIMPLE_LD_REGS:6000,6000 LD_REGS:6000,6000 NO_LD_REGS:6000,6000 GENERAL_REGS:6000,6000 ALL_REGS:6000,6000 MEM:16000,16000
  a7(r68,l0) costs: GENERAL_REGS:0,0 MEM:4000,4000
  a8(r67,l0) costs: GENERAL_REGS:0,0 MEM:4000,4000
  a9(r66,l0) costs: GENERAL_REGS:0,0 MEM:4000,4000
  a10(r65,l0) costs: GENERAL_REGS:0,0 MEM:4000,4000
  a11(r64,l0) costs: GENERAL_REGS:0,0 MEM:4000,4000

   Insn 32(l0): point = 0
   Insn 29(l0): point = 2
   Insn 23(l0): point = 4
   Insn 22(l0): point = 6
   Insn 21(l0): point = 8
   Insn 20(l0): point = 10
   Insn 19(l0): point = 12
   Insn 18(l0): point = 14
   Insn 17(l0): point = 16
   Insn 16(l0): point = 18
   Insn 15(l0): point = 20
   Insn 14(l0): point = 22
   Insn 13(l0): point = 24
   Insn 12(l0): point = 26
   Insn 11(l0): point = 28
   Insn 10(l0): point = 30
   Insn 9(l0): point = 32
   Insn 8(l0): point = 34
   Insn 4(l0): point = 36
   Insn 3(l0): point = 38
   Insn 2(l0): point = 40
 a0(r72): [3..4]
 a1(r56): [3..34]
 a2(r62 [0]): [5..36]
 a2(r62 [1]): [5..36]
 a3(r70): [7..8]
 a4(r61 [0]): [9..38]
 a4(r61 [1]): [9..38]
 a5(r69): [11..12]
 a6(r60 [0]): [13..40]
 a6(r60 [1]): [13..40]
 a7(r68): [15..16]
 a8(r67): [19..20]
 a9(r66): [23..24]
 a10(r65): [27..28]
 a11(r64): [31..32]
Compressing live ranges: from 43 to 16 - 37%
Ranges after the compression:
 a0(r72): [0..1]
 a1(r56): [0..15]
 a2(r62 [0]): [2..15]
 a2(r62 [1]): [2..15]
 a3(r70): [2..3]
 a4(r61 [0]): [4..15]
 a4(r61 [1]): [4..15]
 a5(r69): [4..5]
 a6(r60 [0]): [6..15]
 a6(r60 [1]): [6..15]
 a7(r68): [6..7]
 a8(r67): [8..9]
 a9(r66): [10..11]
 a10(r65): [12..13]
 a11(r64): [14..15]
+++Allocating 60 bytes for conflict table (uncompressed size 60)
;; a0(r72,l0) conflicts: a1(r56,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a1(r56,l0) conflicts: a0(r72,l0) a3(r70,l0) a2(r62,w0,l0) a2(r62,w1,l0) a5(r69,l0) a4(r61,w0,l0) a4(r61,w1,l0) a7(r68,l0) a6(r60,w0,l0) a6(r60,w1,l0) a8(r67,l0) a9(r66,l0) a10(r65,l0) a11(r64,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a2(r62,l0) conflicts:
;;   subobject 0: a1(r56,l0) a3(r70,l0) a5(r69,l0) a4(r61,w0,l0) a4(r61,w1,l0) a7(r68,l0) a6(r60,w0,l0) a6(r60,w1,l0) a8(r67,l0) a9(r66,l0) a10(r65,l0) a11(r64,l0)
;;     total conflict hard regs:
;;     conflict hard regs:


;;   subobject 1: a1(r56,l0) a3(r70,l0) a5(r69,l0) a4(r61,w0,l0) a7(r68,l0) a6(r60,w0,l0) a8(r67,l0) a9(r66,l0) a10(r65,l0) a11(r64,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a3(r70,l0) conflicts: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a4(r61,l0) conflicts:
;;   subobject 0: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0) a5(r69,l0) a7(r68,l0) a6(r60,w0,l0) a6(r60,w1,l0) a8(r67,l0) a9(r66,l0) a10(r65,l0) a11(r64,l0)
;;     total conflict hard regs: 20 21
;;     conflict hard regs: 20 21


;;   subobject 1: a1(r56,l0) a2(r62,w0,l0) a5(r69,l0) a7(r68,l0) a6(r60,w0,l0) a8(r67,l0) a9(r66,l0) a10(r65,l0) a11(r64,l0)
;;     total conflict hard regs: 20 21
;;     conflict hard regs: 20 21

;; a5(r69,l0) conflicts: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0) a4(r61,w0,l0) a4(r61,w1,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a6(r60,l0) conflicts:
;;   subobject 0: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0) a4(r61,w0,l0) a4(r61,w1,l0) a7(r68,l0) a8(r67,l0) a9(r66,l0) a10(r65,l0) a11(r64,l0)
;;     total conflict hard regs: 20-23
;;     conflict hard regs: 20-23


;;   subobject 1: a1(r56,l0) a2(r62,w0,l0) a4(r61,w0,l0) a7(r68,l0) a8(r67,l0) a9(r66,l0) a10(r65,l0) a11(r64,l0)
;;     total conflict hard regs: 20-23
;;     conflict hard regs: 20-23

;; a7(r68,l0) conflicts: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0) a4(r61,w0,l0) a4(r61,w1,l0) a6(r60,w0,l0) a6(r60,w1,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a8(r67,l0) conflicts: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0) a4(r61,w0,l0) a4(r61,w1,l0) a6(r60,w0,l0) a6(r60,w1,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a9(r66,l0) conflicts: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0) a4(r61,w0,l0) a4(r61,w1,l0) a6(r60,w0,l0) a6(r60,w1,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a10(r65,l0) conflicts: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0) a4(r61,w0,l0) a4(r61,w1,l0) a6(r60,w0,l0) a6(r60,w1,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

;; a11(r64,l0) conflicts: a1(r56,l0) a2(r62,w0,l0) a2(r62,w1,l0) a4(r61,w0,l0) a4(r61,w1,l0) a6(r60,w0,l0) a6(r60,w1,l0)
;;     total conflict hard regs:
;;     conflict hard regs:

  regions=1, blocks=3, points=16
    allocnos=12 (big 3), copies=0, conflicts=0, ranges=15

**** Allocnos coloring:


  Loop 0 (parent -1, header bb0, depth 0)
    bbs: 2
    all: 0r72 1r56 2r62 3r70 4r61 5r69 6r60 7r68 8r67 9r66 10r65 11r64
    modified regnos: 56 60 61 62 64 65 66 67 68 69 70 72
    border:
    Pressure: GENERAL_REGS=8
    Hard reg set forest:
      0:( 2-31)@200000
        1:( 2-19 22-31)@64000
          2:( 2-19 24-31)@64000
      Allocno a0r72 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a1r56 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a2r62 of GENERAL_REGS(30) has 29 avail. regs obj 0 2-31 (confl regs =  0 1 32-34 ) node:  2-31,  obj 1 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a3r70 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a4r61 of GENERAL_REGS(30) has 26 avail. regs obj 0 2-19 22-31 (confl regs =  0 1 20 21 32-34 ) node:  2-19 22-31,  obj 1 2-19 22-31 (confl regs =  0 1 20 21 32-34 ) node:  2-19 22-31
      Allocno a5r69 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a6r60 of GENERAL_REGS(30) has 24 avail. regs obj 0 2-19 24-31 (confl regs =  0 1 20-23 32-34 ) node:  2-19 24-31,  obj 1 2-19 24-31 (confl regs =  0 1 20-23 32-34 ) node:  2-19 24-31
      Allocno a7r68 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a8r67 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a9r66 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a10r65 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Allocno a11r64 of GENERAL_REGS(30) has 30 avail. regs 2-31 (confl regs =  0 1 32-34 ) node:  2-31
      Pushing a11(r64,l0)(cost 0)
      Pushing a10(r65,l0)(cost 0)
      Pushing a9(r66,l0)(cost 0)
      Pushing a8(r67,l0)(cost 0)
      Pushing a7(r68,l0)(cost 0)
      Pushing a5(r69,l0)(cost 0)
      Pushing a3(r70,l0)(cost 0)
      Pushing a0(r72,l0)(cost 0)
      Pushing a2(r62,l0)(cost 0)
      Pushing a4(r61,l0)(cost 0)
      Pushing a6(r60,l0)(cost 0)
      Pushing a1(r56,l0)(cost 0)
      Popping a1(r56,l0)  -- assign reg 25
      Popping a6(r60,l0)  -- assign reg 30
      Popping a4(r61,l0)  -- assign reg 28
      Popping a2(r62,l0)  -- assign reg 20
      Popping a0(r72,l0)  -- assign reg 24
      Popping a3(r70,l0)  -- assign reg 24
      Popping a5(r69,l0)  -- assign reg 24
      Popping a7(r68,l0)  -- assign reg 24
      Popping a8(r67,l0)  -- assign reg 24
      Popping a9(r66,l0)  -- assign reg 24
      Popping a10(r65,l0)  -- assign reg 24
      Popping a11(r64,l0)  -- assign reg 24
Disposition:
    1:r56  l0    25    6:r60  l0    30    4:r61  l0    28    2:r62  l0    20
   11:r64  l0    24   10:r65  l0    24    9:r66  l0    24    8:r67  l0    24
    7:r68  l0    24    5:r69  l0    24    3:r70  l0    24    0:r72  l0    24
New iteration of spill/restore move
+++Costs: overall 4000, reg 4000, mem 0, ld 0, st 0, move 0
+++       move loops 0, new jumps 0
insn=2, live_throughout: 20, 21, 22, 23, 32, dead_or_set: 24, 25, 60
insn=3, live_throughout: 20, 21, 32, 60, dead_or_set: 22, 23, 61
insn=4, live_throughout: 32, 60, 61, dead_or_set: 20, 21, 62
insn=8, live_throughout: 32, 60, 61, 62, dead_or_set: 56
insn=9, live_throughout: 32, 56, 60, 61, 62, dead_or_set: 64
insn=10, live_throughout: 32, 60, 61, 62, dead_or_set: 56, 64
insn=11, live_throughout: 32, 56, 60, 61, 62, dead_or_set: 65
insn=12, live_throughout: 32, 60, 61, 62, dead_or_set: 56, 65
insn=13, live_throughout: 32, 56, 60, 61, 62, dead_or_set: 66
insn=14, live_throughout: 32, 60, 61, 62, dead_or_set: 56, 66
insn=15, live_throughout: 32, 56, 60, 61, 62, dead_or_set: 67
insn=16, live_throughout: 32, 60, 61, 62, dead_or_set: 56, 67
insn=17, live_throughout: 32, 56, 60, 61, 62, dead_or_set: 68
insn=18, live_throughout: 32, 60, 61, 62, dead_or_set: 56, 68
insn=19, live_throughout: 32, 56, 61, 62, dead_or_set: 60, 69
insn=20, live_throughout: 32, 61, 62, dead_or_set: 56, 69
insn=21, live_throughout: 32, 56, 62, dead_or_set: 61, 70
insn=22, live_throughout: 32, 62, dead_or_set: 56, 70
insn=23, live_throughout: 32, 56, dead_or_set: 62, 72
insn=29, live_throughout: 32, dead_or_set: 24, 56, 72
insn=32, live_throughout: 24, 32, dead_or_set: 
changing reg in insn 10
changing reg in insn 8
changing reg in insn 12
changing reg in insn 14
changing reg in insn 16
changing reg in insn 18
changing reg in insn 20
changing reg in insn 22
changing reg in insn 10
changing reg in insn 12
changing reg in insn 14
changing reg in insn 16
changing reg in insn 18
changing reg in insn 20
changing reg in insn 22
changing reg in insn 29
changing reg in insn 2
changing reg in insn 19
changing reg in insn 13
changing reg in insn 9
changing reg in insn 3
changing reg in insn 8
changing reg in insn 21
changing reg in insn 15
changing reg in insn 4
changing reg in insn 23
changing reg in insn 17
changing reg in insn 11
changing reg in insn 9
changing reg in insn 10
changing reg in insn 11
changing reg in insn 12
changing reg in insn 13
changing reg in insn 14
changing reg in insn 15
changing reg in insn 16
changing reg in insn 17
changing reg in insn 18
changing reg in insn 19
changing reg in insn 20
changing reg in insn 21
changing reg in insn 22
changing reg in insn 23
changing reg in insn 29
Spilling for insn 11.
Using reg 26 for reload 0
Spilling for insn 17.
Using reg 30 for reload 0
Spilling for insn 23.
Using reg 30 for reload 0
      Try Assign 60(a6), cost=16000
changing reg in insn 2
changing reg in insn 9
changing reg in insn 13
changing reg in insn 19
      Assigning 60(freq=4000) a new slot 0
 Register 60 now on stack.

Spilling for insn 2.
Using reg 30 for reload 0
Spilling for insn 9.
Using reg 30 for reload 0
Using reg 30 for reload 1
Spilling for insn 11.
Using reg 30 for reload 0
Spilling for insn 13.
Using reg 30 for reload 1
Using reg 30 for reload 0
Spilling for insn 17.
Using reg 30 for reload 0
Spilling for insn 19.
Using reg 30 for reload 1
Using reg 30 for reload 0
Spilling for insn 23.
Using reg 30 for reload 0
      Try Assign 60(a6), cost=16000
      Try Assign 61(a4), cost=16000
changing reg in insn 3
changing reg in insn 15
changing reg in insn 21
changing reg in insn 8
      Assigning 61(freq=4000) a new slot 1
 Register 61 now on stack.

Spilling for insn 2.
Spilling for insn 3.
Spilling for insn 8.
Using reg 30 for reload 0
Spilling for insn 9.
Using reg 30 for reload 0
Spilling for insn 11.
Using reg 30 for reload 0
Spilling for insn 13.
Using reg 30 for reload 0
Spilling for insn 15.
Using reg 30 for reload 0
Spilling for insn 17.
Using reg 30 for reload 0
Spilling for insn 19.
Using reg 30 for reload 0
Spilling for insn 21.
Using reg 30 for reload 0
Spilling for insn 23.
Using reg 30 for reload 0
Spilling for insn 2.
Spilling for insn 3.
Spilling for insn 8.
Using reg 30 for reload 0
Spilling for insn 9.
Using reg 30 for reload 0
Spilling for insn 11.
Using reg 30 for reload 0
Spilling for insn 13.
Using reg 30 for reload 0
Spilling for insn 15.
Using reg 30 for reload 0
Spilling for insn 17.
Using reg 30 for reload 0
Spilling for insn 19.
Using reg 30 for reload 0
Spilling for insn 21.
Using reg 30 for reload 0
Spilling for insn 23.
Using reg 30 for reload 0

Reloads for insn # 2
Reload 0: reload_out (HI) = (reg/v/f:HI 60 [ x ])
	NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
	reload_out_reg: (reg/v/f:HI 60 [ x ])

Reloads for insn # 3
Reload 0: reload_out (HI) = (reg/v/f:HI 61 [ y ])
	NO_REGS, RELOAD_FOR_OUTPUT (opnum = 0), optional
	reload_out_reg: (reg/v/f:HI 61 [ y ])

Reloads for insn # 8
Reload 0: reload_in (HI) = (reg/v/f:HI 61 [ y ])
	POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 61 [ y ])
	reload_reg_rtx: (reg:HI 26 r26)

Reloads for insn # 9
Reload 0: reload_in (HI) = (reg/v/f:HI 60 [ x ])
	POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 60 [ x ])
	reload_reg_rtx: (reg:HI 30 r30)

Reloads for insn # 11
Reload 0: reload_in (HI) = (reg/v/f:HI 20 r20 [orig:62 z ] [62])
	POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 20 r20 [orig:62 z ] [62])
	reload_reg_rtx: (reg:HI 26 r26)

Reloads for insn # 13
Reload 0: reload_in (HI) = (reg/v/f:HI 60 [ x ])
	BASE_POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 60 [ x ])
	reload_reg_rtx: (reg:HI 30 r30)

Reloads for insn # 15
Reload 0: reload_in (HI) = (reg/v/f:HI 61 [ y ])
	BASE_POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 61 [ y ])
	reload_reg_rtx: (reg:HI 30 r30)

Reloads for insn # 17
Reload 0: reload_in (HI) = (reg/v/f:HI 20 r20 [orig:62 z ] [62])
	BASE_POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 20 r20 [orig:62 z ] [62])
	reload_reg_rtx: (reg:HI 30 r30)

Reloads for insn # 19
Reload 0: reload_in (HI) = (reg/v/f:HI 60 [ x ])
	BASE_POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 60 [ x ])
	reload_reg_rtx: (reg:HI 30 r30)

Reloads for insn # 21
Reload 0: reload_in (HI) = (reg/v/f:HI 61 [ y ])
	BASE_POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 61 [ y ])
	reload_reg_rtx: (reg:HI 30 r30)

Reloads for insn # 23
Reload 0: reload_in (HI) = (reg/v/f:HI 20 r20 [orig:62 z ] [62])
	BASE_POINTER_REGS, RELOAD_FOR_OPERAND_ADDRESS (opnum = 1)
	reload_in_reg: (reg/v/f:HI 20 r20 [orig:62 z ] [62])
	reload_reg_rtx: (reg:HI 30 r30)
deleting insn with uid = 4.
+++Overall after reload 52000


try_optimize_cfg iteration 1

starting the processing of deferred insns
ending the processing of deferred insns
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
df_worklist_dataflow_doublequeue:n_basic_blocks 3 n_edges 2 count 3 (    1)
df_worklist_dataflow_doublequeue:n_basic_blocks 3 n_edges 2 count 3 (    1)
(note 1 0 6 NOTE_INSN_DELETED)

(note 6 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn 2 6 3 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                (const_int 1 [0x1])) [4 %sfp+1 S2 A8])
        (reg:HI 24 r24 [ x ])) in.c:7 6 {*movhi}
     (nil))

(insn 3 2 5 2 (set (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                (const_int 3 [0x3])) [4 %sfp+3 S2 A8])
        (reg:HI 22 r22 [ y ])) in.c:7 6 {*movhi}
     (nil))

(note 5 3 37 2 NOTE_INSN_FUNCTION_BEG)

(insn 37 5 8 2 (set (reg:HI 26 r26)
        (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                (const_int 3 [0x3])) [4 %sfp+3 S2 A8])) in.c:10 6 {*movhi}
     (nil))

(insn 8 37 38 2 (set (reg/v:QI 25 r25 [orig:56 s ] [56])
        (mem/s:QI (reg:HI 26 r26) [0 y_5(D)->a+0 S1 A8])) in.c:10 1 {movqi_insn}
     (nil))

(insn 38 8 9 2 (set (reg:HI 30 r30)
        (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                (const_int 1 [0x1])) [4 %sfp+1 S2 A8])) in.c:10 6 {*movhi}
     (nil))

(insn 9 38 10 2 (set (reg:QI 24 r24 [orig:64 x_2(D)->a ] [64])
        (mem/s:QI (reg:HI 30 r30) [0 x_2(D)->a+0 S1 A8])) in.c:10 1 {movqi_insn}
     (nil))

(insn 10 9 39 2 (set (reg/v:QI 25 r25 [orig:56 s ] [56])
        (plus:QI (reg/v:QI 25 r25 [orig:56 s ] [56])
            (reg:QI 24 r24 [orig:64 x_2(D)->a ] [64]))) in.c:10 14 {addqi3}
     (nil))

(insn 39 10 11 2 (set (reg:HI 26 r26)
        (reg/v/f:HI 20 r20 [orig:62 z ] [62])) in.c:11 6 {*movhi}
     (nil))

(insn 11 39 12 2 (set (reg:QI 24 r24 [orig:65 z_8(D)->a ] [65])
        (mem/s:QI (reg:HI 26 r26) [0 z_8(D)->a+0 S1 A8])) in.c:11 1 {movqi_insn}
     (nil))

(insn 12 11 13 2 (set (reg/v:QI 25 r25 [orig:56 s ] [56])
        (plus:QI (reg/v:QI 25 r25 [orig:56 s ] [56])
            (reg:QI 24 r24 [orig:65 z_8(D)->a ] [65]))) in.c:11 14 {addqi3}
     (nil))

(insn 13 12 14 2 (set (reg:QI 24 r24 [orig:66 x_2(D)->b ] [66])
        (mem/s:QI (plus:HI (reg:HI 30 r30)
                (const_int 1 [0x1])) [0 x_2(D)->b+0 S1 A8])) in.c:13 1 {movqi_insn}
     (nil))

(insn 14 13 40 2 (set (reg/v:QI 25 r25 [orig:56 s ] [56])
        (plus:QI (reg/v:QI 25 r25 [orig:56 s ] [56])
            (reg:QI 24 r24 [orig:66 x_2(D)->b ] [66]))) in.c:13 14 {addqi3}
     (nil))

(insn 40 14 15 2 (set (reg:HI 30 r30)
        (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                (const_int 3 [0x3])) [4 %sfp+3 S2 A8])) in.c:14 6 {*movhi}
     (nil))

(insn 15 40 16 2 (set (reg:QI 24 r24 [orig:67 y_5(D)->b ] [67])
        (mem/s:QI (plus:HI (reg:HI 30 r30)
                (const_int 1 [0x1])) [0 y_5(D)->b+0 S1 A8])) in.c:14 1 {movqi_insn}
     (nil))

(insn 16 15 41 2 (set (reg/v:QI 25 r25 [orig:56 s ] [56])
        (plus:QI (reg/v:QI 25 r25 [orig:56 s ] [56])
            (reg:QI 24 r24 [orig:67 y_5(D)->b ] [67]))) in.c:14 14 {addqi3}
     (nil))

(insn 41 16 17 2 (set (reg:HI 30 r30)
        (reg/v/f:HI 20 r20 [orig:62 z ] [62])) in.c:15 6 {*movhi}
     (nil))

(insn 17 41 18 2 (set (reg:QI 24 r24 [orig:68 z_8(D)->b ] [68])
        (mem/s:QI (plus:HI (reg:HI 30 r30)
                (const_int 1 [0x1])) [0 z_8(D)->b+0 S1 A8])) in.c:15 1 {movqi_insn}
     (nil))

(insn 18 17 42 2 (set (reg/v:QI 25 r25 [orig:56 s ] [56])
        (plus:QI (reg/v:QI 25 r25 [orig:56 s ] [56])
            (reg:QI 24 r24 [orig:68 z_8(D)->b ] [68]))) in.c:15 14 {addqi3}
     (nil))

(insn 42 18 19 2 (set (reg:HI 30 r30)
        (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                (const_int 1 [0x1])) [4 %sfp+1 S2 A8])) in.c:17 6 {*movhi}
     (nil))

(insn 19 42 20 2 (set (reg:QI 24 r24 [orig:69 x_2(D)->c ] [69])
        (mem/s:QI (plus:HI (reg:HI 30 r30)
                (const_int 2 [0x2])) [0 x_2(D)->c+0 S1 A8])) in.c:17 1 {movqi_insn}
     (nil))

(insn 20 19 43 2 (set (reg/v:QI 25 r25 [orig:56 s ] [56])
        (plus:QI (reg/v:QI 25 r25 [orig:56 s ] [56])
            (reg:QI 24 r24 [orig:69 x_2(D)->c ] [69]))) in.c:17 14 {addqi3}
     (nil))

(insn 43 20 21 2 (set (reg:HI 30 r30)
        (mem/c:HI (plus:HI (reg/f:HI 28 r28)
                (const_int 3 [0x3])) [4 %sfp+3 S2 A8])) in.c:18 6 {*movhi}
     (nil))

(insn 21 43 22 2 (set (reg:QI 24 r24 [orig:70 y_5(D)->c ] [70])
        (mem/s:QI (plus:HI (reg:HI 30 r30)
                (const_int 2 [0x2])) [0 y_5(D)->c+0 S1 A8])) in.c:18 1 {movqi_insn}
     (nil))

(insn 22 21 44 2 (set (reg/v:QI 25 r25 [orig:56 s ] [56])
        (plus:QI (reg/v:QI 25 r25 [orig:56 s ] [56])
            (reg:QI 24 r24 [orig:70 y_5(D)->c ] [70]))) in.c:18 14 {addqi3}
     (nil))

(insn 44 22 23 2 (set (reg:HI 30 r30)
        (reg/v/f:HI 20 r20 [orig:62 z ] [62])) in.c:19 6 {*movhi}
     (nil))

(insn 23 44 24 2 (set (reg:QI 24 r24 [orig:72 z_8(D)->c ] [72])
        (mem/s:QI (plus:HI (reg:HI 30 r30)
                (const_int 2 [0x2])) [0 z_8(D)->c+0 S1 A8])) in.c:19 1 {movqi_insn}
     (nil))

(note 24 23 29 2 NOTE_INSN_DELETED)

(insn 29 24 32 2 (set (reg/i:QI 24 r24)
        (plus:QI (reg:QI 24 r24 [orig:72 z_8(D)->c ] [72])
            (reg/v:QI 25 r25 [orig:56 s ] [56]))) in.c:22 14 {addqi3}
     (nil))

(insn 32 29 35 2 (use (reg/i:QI 24 r24)) in.c:22 -1
     (nil))

(note 35 32 0 NOTE_INSN_DELETED)


* Re: Help: Register allocator sets up frame at low register pressure (PR 50775)
  2011-10-25 13:19 Help: Register allocator sets up frame at low register pressure (PR 50775) Georg-Johann Lay
@ 2011-10-30 23:28 ` Denis Chertykov
  0 siblings, 0 replies; 2+ messages in thread
From: Denis Chertykov @ 2011-10-30 23:28 UTC
  To: Georg-Johann Lay; +Cc: gcc, Vladimir Makarov

2011/10/25 Georg-Johann Lay <avr@gjlay.de>:
> With the following, small C test program
>
>
> typedef struct
> {
>    unsigned char a, b, c, d;
> } s_t;
>
> unsigned char func1 (s_t *x, s_t *y, s_t *z)
> {
>    unsigned char s = 0;
>    s += x->a;
>    s += y->a;
>    s += z->a;
>
>    s += x->b;
>    s += y->b;
>    s += z->b;
>
>    s += x->c;
>    s += y->c;
>    s += z->c;
>
>    return s;
> }
>
> there is a frame pointer set up for no apparent reason.
>
> The machine for which this code is compiled for (AVR) has just few pointer
> registers and taking away one of them to use it as frame pointer leads to
> severe performance degradation in many real-world programs: moving from/to
> memory is more expensive than movon around registers, setting up a frame is
> expensive and taking away 1 of 2 address registers is expensive.
>
> What I tried and what did not fix it:
>
> - increase targetm.memory_move_cost (up to unsane value)
> - play around with targetm.class_likely_spilled_p
>
> The program is compiled with
>
> $ avr-gcc in.c -S -Os -fdump-rtl-ira-details -fdump-rtl-postreload-details
> -mmcu=avr4 -mstrict-X
>
> with avr-gcc from current trunk SVN r180399.
>
>
> The issue is that AVR has only 3 pointer registers X, Y, and Z with the
> following addressing capabilities:
>
>  *X, *X++, *--X             (R27:R26, call-clobbered)
>  *Y, *Y++, *--Y, *(Y+const) (R28:R29, call-saved, frame pointer)
>  *Z, *Z++, *--Z, *(Z+const) (R30:R31, call-clobbered)
>
> Older version of the compiler prior to 4.7 trunk r179993 allowed a fake
> addressing mode *(X+const) and emulated it by emitting appropriate instructions
> sequence like
>
>  X = X + const
>  r = *X
>  X = X - const
>
> which was only a rare corner case in the old register allocator, but in the new
> allocator this sequence is seen very often leading to code bloat of +50% for
> some real-world functions.
>
> This is the reason why the command line option -mstrict-X has been added to the
> AVR backend, see PR46278.
>
> This option denies fake *(X+const) addressing but leads to the mentioned spills
> from register allocator and to code even worse as compared to without setting
> -mstrict-X, i.e. register allocator sabotages a smart usage of the address
> registers.
>
> All I see is that reload1.c:alter_reg() generates the spill because
> ira_conflicts_p is true.
>
> With the option -morder1 turn on (affects ADJUST_REG_ALLOC_ORDER) there is
> still a frame set up even though never accessed.
>
> Can anyone give me some advice how to proceed with this issue?
>
> Can be said if this is a target issue or IRA/reload flaw?

This is not a cost-related problem.
I think I can explain it.
I believe it is an IRA bug.

> Spilling for insn 11.
> Using reg 26 for reload 0
> Spilling for insn 17.
> Using reg 30 for reload 0
> Spilling for insn 23.
> Using reg 30 for reload 0
>       Try Assign 60(a6), cost=16000

Things start to go wrong here:
ira-color.c:4120 allocno_reload_assign (a, forbidden_regs);

> changing reg in insn 2
> changing reg in insn 9
> changing reg in insn 13
> changing reg in insn 19
>      Assigning 60(freq=4000) a new slot 0
> Register 60 now on stack.

Call trace:
allocno_reload_assign() -> assign_hard_reg() -> get_conflict_profitable_regs()

`get_conflict_profitable_regs' calculates a wrong `profitable_regs[1]'.

(Background for Vladimir)
AVR is an 8-bit microcontroller.
It has only 3 pointer registers X, Y, and Z with the
following addressing capabilities:
 *X, *X++, *--X             (R27:R26, call-clobbered)
 *Y, *Y++, *--Y, *(Y+const) (R29:R28, call-saved, frame pointer)
 *Z, *Z++, *--Z, *(Z+const) (R31:R30, call-clobbered)
Also, all modes larger than 8 bits must start in an even register.

So `get_conflict_profitable_regs' tries to calculate two register sets:
  - profitable_regs[0] for the first word of register 60(a6)
  - profitable_regs[1] for the second word of register 60(a6)

Values of `profitable_regs':
(gdb) p print_hard_reg_set (stderr,profitable_regs[0] , 01)
 0-2 4 6 8 10 12 14 16 18 20 22 24 26 28 30
$63 = void
(gdb) p print_hard_reg_set (stderr,profitable_regs[1] , 01)
 0-2 4 6 8 10 12 14 16 18 20 22 24 26 28 30

They are equal!
This is wrong because the second word of register 60(a6) must be allocated to
an odd register.


This is the faulty place in `get_conflict_profitable_regs':
...
  nwords = ALLOCNO_NUM_OBJECTS (a);
  for (i = 0; i < nwords; i++)
    {
      obj = ALLOCNO_OBJECT (a, i);
      COPY_HARD_REG_SET (conflict_regs[i],
			 OBJECT_TOTAL_CONFLICT_HARD_REGS (obj));
      if (retry_p)
	{
	  COPY_HARD_REG_SET (profitable_regs[i],
			     reg_class_contents[ALLOCNO_CLASS (a)]);
	  AND_COMPL_HARD_REG_SET (profitable_regs[i],
				  ira_prohibited_class_mode_regs
				  [ALLOCNO_CLASS (a)][ALLOCNO_MODE (a)]);
-------------------------------------------------------------^^^^^^^^^^^^^^^^^^^^^^^^^
	}

ALLOCNO_MODE (a) is the right mode for the first word (word = 8-bit register),
but it is the wrong mode for the second word of the allocno.
More precisely, ALLOCNO_MODE (a) is the right mode only for the allocno as a
whole. If we want to spill/load/store separate parts (IRA objects) of an
allocno, we must use the mode of each part (object).

`ira_prohibited_class_mode_regs' is derived only from HARD_REGNO_MODE_OK.
So the second word of 60(a6) is permitted in any register that directly
follows a register permitted for the first word of 60(a6).
For AVR: profitable_regs[1] = profitable_regs[0] << 1
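
The relation above can be sketched with plain bit masks. This is a minimal
illustration, assuming one bit per hard register in a 32-bit word; reg_mask,
second_word_regs and reg_in_mask are hypothetical names and not GCC's
HARD_REG_SET API.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per hard register; bit N set means register N is allowed.
   A 32-bit mask covers R0..R31 only, an assumption of this sketch.  */
typedef uint32_t reg_mask;

/* Registers allowed for the second word, given those allowed for the
   first word: the second word lives in the next-higher register.  */
static reg_mask
second_word_regs (reg_mask first_word_regs)
{
  /* Register 31 cannot hold a first word, since no register follows it.  */
  assert ((first_word_regs & UINT32_C (0x80000000)) == 0);
  return first_word_regs << 1;
}

/* Return 1 if register REGNO is a member of MASK, else 0.  */
static int
reg_in_mask (reg_mask mask, unsigned regno)
{
  return regno < 32 && ((mask >> regno) & 1u);
}
```

With the set printed by gdb above (0-2 4 6 ... 30, i.e. 0x55555557),
second_word_regs yields 0xAAAAAAAE, i.e. registers 1-3 5 7 ... 31, which
differs from the set for the first word.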

Also, I have a question about the following fields of `ira_allocno':
  /* The number of objects tracked in the following array.  */
  int num_objects;
  /* An array of structures describing conflict information and live
     ranges for each object associated with the allocno.  There may be
     more than one such object in cases where the allocno represents a
     multi-word register.  */
  ira_object_t objects[2];
--------------------------^^^^^
SImode on AVR consists of 4 words, but there are only 2 objects in the allocno
structure. Is this right?
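
The word count behind this question is simple arithmetic, sketched below.
words_in_mode is a hypothetical helper, not a GCC function; it assumes that a
word is one 8-bit register on AVR, as stated earlier in this mail.

```c
#include <assert.h>

/* Number of words a value of SIZE_BYTES occupies when each word is
   UNITS_PER_WORD bytes wide, rounding up.  */
static int
words_in_mode (int size_bytes, int units_per_word)
{
  assert (size_bytes > 0 && units_per_word > 0);
  return (size_bytes + units_per_word - 1) / units_per_word;
}
```

With 1-byte words, a 2-byte HImode value gives 2 words and a 4-byte SImode
value gives 4 words, which is more than the two slots of `objects[2]'.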

Denis.
