public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Register Allocation
@ 2004-03-26 22:21 John Lu
  2004-03-26 22:21 ` Vladimir N. Makarov
  0 siblings, 1 reply; 41+ messages in thread
From: John Lu @ 2004-03-26 22:21 UTC (permalink / raw)
  To: gcc

Hi,

I've been working on a port based on gcc-3.3 and I've noticed that
better assembly code is generated if the C source has
separate variables declared for distinct live ranges.

For example,

     int foo1(int *p1, int *p2) {
       int i;  /* one index variable for two live ranges */
       int total;

       total=0;
       for (i=0; i<100; i++) {
         total+=p1[i];
       }

       for (i=0; i<100; i+=2) {
         total+=p1[i];
       }
  
       return(total);
     }
     
produces worse code than:

     int foo2(int *p1, int *p2) {
       int i1,i2;  /* two index variables for two live ranges */
       int total;
   
       total=0;
       for (i1=0; i1<100; i1++) {
         total+=p1[i1];
       }

       for (i2=0; i2<100; i2+=2) {
         total+=p1[i2];
       }
  
       return(total);
     }
     
In my port, i1 is allocated a loop register which supports faster looping,
and i2 is allocated to a gpr because a decrement by two is needed.  When
only one index variable is declared, "i" is allocated to a loop register,
but this creates bad code for the second loop, since decrement by two is
not supported  for loop registers.  This happens with and without the
"-fnew-ra" option.

I was wondering if anyone else has seen this or am I doing something wrong
in my port.

Thanks,
John Lu

^ permalink raw reply	[flat|nested] 41+ messages in thread
* register allocation
@ 2010-12-23  8:13 roy rosen
  2010-12-23 16:48 ` Vladimir Makarov
  0 siblings, 1 reply; 41+ messages in thread
From: roy rosen @ 2010-12-23  8:13 UTC (permalink / raw)
  To: gcc

Hi All,

I am looking at the code generated by my port and it seems that I have
a problem that too many copies between registers are generated.
I looked a bit at the register allocation and wanted to verify that I
understand its behavior.

Is that true that it first chooses a register class for each pseodo
and only then starts coloring?

I think that my problem is that in my architecture there are two
register classes which can do all arithmetic operation but class X can
also do loads and stores and class Y can also do DSP operations.

So when there are for example two DSP operations and between them some
arithmetic operations I expect to use only class Y but GCC prefers to
copy registers and do the arithmetic operations using X because for
some reason it determined that the prefered class for the registers in
the arithmetic operations is X.

It seems that determining the class does not look at the whole flow
but rather looks only at insns in which the register appears.

Do I understand the situation correctly?
Is there something I can do about it?

Thanks, Roy.

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Re: Register Allocation
@ 2005-11-24 20:51 Joern RENNECKE
  0 siblings, 0 replies; 41+ messages in thread
From: Joern RENNECKE @ 2005-11-24 20:51 UTC (permalink / raw)
  To: Ian Lance Taylor, Andrew MacLeod; +Cc: gcc mailing list

In http://gcc.gnu.org/ml/gcc/2005-11/msg01163.html, Ian Lance Taylor wrote:

> Either way, register elimination can cause addresses which were valid
> to become invalid, typically because valid offsets from the frame
> pointer become invalid offsets from the stack pointer.  So that needs
> to be cleaned up somewhere.

This is not just about some requiring some cleanup somewhere.  Register
elimination and stack slot allocation determine the exact addresses that
are used, which in turn determine what reload inheritance is possible for
address reloads that are for stack slots which are close together on the
stack.  Getting this right is essential to avoid performance degradation on
some platforms.  These targets typically use LEGITIMIZE_RELOAD_ADDRESS to
split out-of-range addresses into a normal form with a base address load
and a memory access using this base with a small offset.

On the other hand, the hard register spills appear to offer a new
opportunity: we have talked about shrink-wrapping code in the past, but
have never implemented this in gcc.
I think that register saves/restores can be considered
special cases of hard register spills.  In order to do this efficiently,
there would have to be some interface with the target to exploit insn
sequences that can save/restore multiple registers more efficiently in
bulk, .e.g load/store multiple, or auto-increment use on targets that
are otherwise ACCUMULATE_OUTGOING_ARGS.  On the other hand, these
techniques can also help when we need to spill multiple hard registers
around a tight loop.




^ permalink raw reply	[flat|nested] 41+ messages in thread
* Register Allocation
@ 2005-11-17 16:53 Andrew MacLeod
  2005-11-18  2:55 ` Mark Mitchell
                   ` (5 more replies)
  0 siblings, 6 replies; 41+ messages in thread
From: Andrew MacLeod @ 2005-11-17 16:53 UTC (permalink / raw)
  To: gcc mailing list

It must be the season for this sort of thing :-)

I have been contemplating building a GCC register allocator from scratch
for some time.  To that end, I have put together a bit of a document
given a high level overview of the various components I think would
benefit GCC, and a rough description of how they would work together.  

It is my intention over the next few months to do some of the initial
underlying infrastructure bits upon which the entire document is based.
Presuming that proceeds OK and I can build up the data structures I am
looking for, I'll move on from there.  If anyone wants to help, I'm sure
there will be some juicy things to do. 

Anyone who wishes to provide constructive comments about what is in the
write up, feel free to send them to me.  I also know there are other
projects ongoing which could be related to this work in some fashion.
Anyone reading this document who is involved with those projects and
sees something useful here or in those projects that could be combined
in some way is encouraged to let me know their thoughts and ideas as
well. 

The document is intended as a starting point and consists mostly of my
thoughts at the moment. By the time the underlying RTL bits are done, I
would like it to have evolved to include input from others.  The more
useful comments there are, the better the chance of us getting a decent
allocator. 

The .pdf file is currently available at:

http://people.redhat.com/dnovillo/rable.pdf

(Thanks dnovillo :-)

Andrew 

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Register Allocation
@ 2004-09-22  1:21 Adrian Strätling
  2004-09-22  5:22 ` tm_gccmail
  0 siblings, 1 reply; 41+ messages in thread
From: Adrian Strätling @ 2004-09-22  1:21 UTC (permalink / raw)
  To: GCC

Hi,

I'm working on a target that does support up to 8 parallel instructions, 
but they have to be grouped at compile time. The scheduler could do much 
better than it currently does if there were less (output) depencies.

Is it possible to tell the register allocator not to reduce the register 
count to the minimum?

example:
( || means that this insn is executed in parallel to the last one )

1)    mvkl  L4,     b4      \
2)    mvkh  L4,     b4      +   push ret_label
3)    stw   b4,     *--b15  /
   || mvkl  times2, b4      \
4)    mvkh  times2, b4      +   branch
5)    b     b4              /
L4:

If the second temporary were not assigned to b4 but to b5, the schedule 
would be shorter:

1)    mvkl  L4,     b4
   || mvkl  times2, b5
2)    mvkh  L4,     b4
   || mvkh  times2, b5
3)    stw   b4,     *--b15    
   || b     b5
L4:

Thanks,
Adrian

^ permalink raw reply	[flat|nested] 41+ messages in thread
* register allocation
@ 2004-05-02 13:27 Qiong Cai
  2004-05-02 16:56 ` Daniel Berlin
  2004-05-03  7:07 ` Michael Matz
  0 siblings, 2 replies; 41+ messages in thread
From: Qiong Cai @ 2004-05-02 13:27 UTC (permalink / raw)
  To: gcc

Hi,

I'm going to study register allocation in GCC.  Currently GCC has 
two register allocators.  Here some questions:

* Is there any performance results(eg. spec2k results) available for
these two allocators?
* Besides those ra*.[c,h] files, is tthere any other source files related 
to register allocation?
* Which function, if available, calcuate the register pressure, 
which is the defined as the max number of live variable for a basic block?

Many thanks.

Qiong

^ permalink raw reply	[flat|nested] 41+ messages in thread
* register Allocation
@ 2002-03-12  6:21 Danish Samad
  0 siblings, 0 replies; 41+ messages in thread
From: Danish Samad @ 2002-03-12  6:21 UTC (permalink / raw)
  To: gcc, syed_rauf_ul_hassan


Hello
 The problem I am getting is that I have a pattern for
addition of Single Integer value. When I give the half
intger values It uses subreg to load the values. and
then perform the single integer addition. The
unoptimized RTL is


(insn 30 28 31 (set (subreg:HI (reg:SI 45) 1)
        (mem/f:HI (reg:SI 43) 0)) -1 (nil)
    (nil))

(insn 33 31 34 (set (subreg:HI (reg:SI 46) 1)
        (mem/f:HI (reg:SI 44) 0)) -1 (nil)
    (nil))

(insn 34 33 36 (set (reg:SI 47)
        (plus:SI (reg:SI 45)
            (reg:SI 46))) -1 (nil)
    (nil))

After GREG the RTL becomes

(insn 30 28 31 (set (reg:HI 1 r1)
        (mem/f:HI (reg:SI 18 a2) 0)) 2 {*movhi_insn}
(nil)
    (nil))

(note 31 30 33 "" NOTE_INSN_DELETED)

(insn 33 31 34 (set (reg:HI 3 r3)
        (mem/f:HI (reg:SI 17 a1) 0)) 2 {*movhi_insn}
(nil)
    (nil))

(insn 34 33 36 (set (reg:SI 0 r0)
        (plus:SI (reg:SI 0 r0)
            (reg:SI 2 r2))) 4 {addsi3} (nil)
    (nil))

r1 should be r0 and  r3 should be r2
What could be causing this problem




__________________________________________________
Do You Yahoo!?
Try FREE Yahoo! Mail - the world's greatest free email!
http://mail.yahoo.com/

^ permalink raw reply	[flat|nested] 41+ messages in thread
* Register allocation
@ 1997-10-14  5:51 Thomas Koenig
  1998-12-21 22:38 ` Jeffrey A Law
  0 siblings, 1 reply; 41+ messages in thread
From: Thomas Koenig @ 1997-10-14  5:51 UTC (permalink / raw)
  To: egcs

egcs 971008 with haifa enabled generates two unnecessary register
moves for the function for Linux i386-glibc1:

typedef struct pt
{
    int x;
    int y;
    struct pt *n;
} pt;

#define WALL(a) ((a)->n == 0)
#define SQR(a) ((double) (a)* (double) (a))

double e_point_point(pt *a, pt *b)
{
    double res;
    res = SQR(a->x - b->x) + SQR(a->y - b->y);
    if (WALL(a) || WALL(b)) {
	res *= 4;
    }
    return res;
}

Here's the assembly output:

	.file	"point.c"
	.version	"01.01"
/ GNU C version egcs-2.90.12 971008 (gcc2-970802 experimental) (i586-pc-linux-gnulibc1) compiled by GNU C version egcs-2.90.12 971008 (gcc2-970802 experimental).
/ options passed:  -O6 -fomit-frame-pointer -fno-exceptions
/ options enabled:  -fdefer-pop -fomit-frame-pointer -fcse-follow-jumps
/ -fcse-skip-blocks -fexpensive-optimizations -fthread-jumps
/ -fstrength-reduce -fpeephole -fforce-mem -ffunction-cse
/ -finline-functions -finline -fkeep-static-consts -fcaller-saves
/ -fpcc-struct-return -frerun-cse-after-loop -fschedule-insns2
/ -fsched-interblock -fsched-spec -fcommon -fverbose-asm -fgnu-linker
/ -fregmove -falias-check -fargument-alias -m80387 -mhard-float
/ -mno-soft-float -mieee-fp -mfp-ret-in-387 -mschedule-prologue
/ -mcpu=pentium -march=pentium

gcc2_compiled.:
.section	.rodata
	.align 4
.LC1:
	.long 0x0,0x40100000
.text
	.align 4
.globl e_point_point
	.type	 e_point_point,@function
e_point_point:
	pushl %ebx
	movl 8(%esp),%edx
	movl 12(%esp),%ecx
	movl (%edx),%ebx
	movl (%ecx),%eax
	fmul %st(0),%st
	subl %eax,%ebx
	movl %ebx,%eax
        ^^^^^^^^^^^^^^
	pushl %eax
	fildl (%esp)
	addl $4,%esp
	fmul %st(0),%st
	faddp %st,%st(1)
	cmpl $0,8(%edx)
	je .L3
	cmpl $0,8(%ecx)
	jne .L2
.L3:
	fldl .LC1
	fmulp %st,%st(1)
.L2:
	popl %ebx
	ret
.Lfe1:
	.size	 e_point_point,.Lfe1-e_point_point
	.ident	"GCC: (GNU) egcs-2.90.12 971008 (gcc2-970802 experimental)"

Both of these register moves are unnecessary, and when I replace the
first one with the more obvious

	subl %eax,%ebx
	pushl %ebx
	movl 4(%edx),%ebx
	fildl (%esp)

the resulting code is indeed faster.
-- 
Thomas Koenig, Thomas.Koenig@ciw.uni-karlsruhe.de, ig25@dkauni2.bitnet.
The joy of engineering is to find a straight line on a double
logarithmic diagram.

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2011-01-11 16:11 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-26 22:21 Register Allocation John Lu
2004-03-26 22:21 ` Vladimir N. Makarov
2004-03-26 22:26   ` Andrew MacLeod
2004-03-27 18:22     ` Andi Kleen
  -- strict thread matches above, loose matches on Subject: below --
2010-12-23  8:13 register allocation roy rosen
2010-12-23 16:48 ` Vladimir Makarov
2010-12-23 17:22   ` Jeff Law
2010-12-27 15:43   ` roy rosen
2011-01-03 15:41     ` Jeff Law
2011-01-05 14:44       ` roy rosen
2011-01-05 15:26         ` Jeff Law
2011-01-11 16:11         ` Vladimir Makarov
2011-01-11 15:53       ` Vladimir Makarov
2005-11-24 20:51 Register Allocation Joern RENNECKE
2005-11-17 16:53 Andrew MacLeod
2005-11-18  2:55 ` Mark Mitchell
2005-11-18  3:27 ` Daniel Jacobowitz
2005-11-18  9:53 ` Giovanni Bajo
2005-11-18 15:28   ` Andrew MacLeod
2005-11-19 19:31 ` Ian Lance Taylor
2005-11-19 20:20   ` Denis Chertykov
2005-11-20  0:20   ` Giovanni Bajo
2005-11-23 17:07   ` Andrew MacLeod
2005-11-23 20:43     ` Ian Lance Taylor
2005-11-20  0:37 ` Steven Bosscher
2005-11-23 17:08   ` Andrew MacLeod
2005-11-22 19:26 ` Peter Bergner
2005-11-22 21:55   ` Steven Bosscher
     [not found]   ` <200511222256.13823.>
2005-11-22 22:58     ` Peter Bergner
2005-11-23 14:06   ` Michael Matz
2005-11-23 20:50     ` Peter Bergner
2005-11-23 17:08   ` Andrew MacLeod
2004-09-22  1:21 Adrian Strätling
2004-09-22  5:22 ` tm_gccmail
2004-10-04 14:13   ` Nick Ing-Simmons
2004-05-02 13:27 register allocation Qiong Cai
2004-05-02 16:56 ` Daniel Berlin
2004-05-03  7:07 ` Michael Matz
2002-03-12  6:21 register Allocation Danish Samad
1997-10-14  5:51 Register allocation Thomas Koenig
1998-12-21 22:38 ` Jeffrey A Law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).