public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* PR 6394
@ 2002-04-29 14:36 Mark Mitchell
  2002-04-29 21:38 ` John David Anglin
  0 siblings, 1 reply; 117+ messages in thread
From: Mark Mitchell @ 2002-04-29 14:36 UTC (permalink / raw)
  To: law, dave.anglin; +Cc: gcc

PR 6394 is a floating-point ICE on PA 2.0, and is a regression from GCC
3.0, at least in the sense that we can no longer build Perl.

Jeff, do you have time to look at this?

--
Mark Mitchell                   mark@codesourcery.com
CodeSourcery, LLC               http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-29 14:36 PR 6394 Mark Mitchell
@ 2002-04-29 21:38 ` John David Anglin
  2002-04-30 10:31   ` David Edelsohn
  0 siblings, 1 reply; 117+ messages in thread
From: John David Anglin @ 2002-04-29 21:38 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: law, dave.anglin, gcc

> PR 6394 is a floating-point ICE on PA 2.0, and is a regression from GCC
> 3.0, at least in the sense that we can no longer build Perl.
> 
> Jeff, do you have time to look at this?

This would be definitely helpful.  I have investigated this enough
to know that the floating pointer register is being allocated in
global_alloc.  What I don't understand is why this happens as it
would seem that a general register should have been selected.
The ICE occurs selecting one of two addresses for the return value
and there shouldn't be many conflicts at this point.

The good news is that it only occurs in PIC code at -O1 and -O.
The problem doesn't occur at -O2.

I've had a lot of personal matters to attend to in the past couple
of weeks so I haven't had much time to attend to this problem.
I've been burning the midnight oil to run as many tests as possible.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-29 21:38 ` John David Anglin
@ 2002-04-30 10:31   ` David Edelsohn
  2002-04-30 10:48     ` John David Anglin
  2002-04-30 12:56     ` Geoff Keating
  0 siblings, 2 replies; 117+ messages in thread
From: David Edelsohn @ 2002-04-30 10:31 UTC (permalink / raw)
  To: John David Anglin; +Cc: Mark Mitchell, law, dave.anglin, gcc

>>>>> John David Anglin writes:

John> This would be definitely helpful.  I have investigated this enough
John> to know that the floating pointer register is being allocated in
John> global_alloc.  What I don't understand is why this happens as it
John> would seem that a general register should have been selected.
John> The ICE occurs selecting one of two addresses for the return value
John> and there shouldn't be many conflicts at this point.

	Alan Modra and the IBM ppc64 Linux team found similar bad register
allocation for 64-bit PowerPC.  As an interim solution, we added splitters
that handle FP registers for those patterns to rs6000.md.  I would suggest
implementing something similar for PA-RISC GCC 3.1 or GCC 3.1.1.

	For GCC 3.2, we should focus on the new register allocator instead
of trying to fix an inherently limited and broken implementation.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 10:31   ` David Edelsohn
@ 2002-04-30 10:48     ` John David Anglin
  2002-04-30 18:48       ` law
  2002-04-30 12:56     ` Geoff Keating
  1 sibling, 1 reply; 117+ messages in thread
From: John David Anglin @ 2002-04-30 10:48 UTC (permalink / raw)
  To: David Edelsohn; +Cc: mark, law, dave.anglin, gcc

> 	Alan Modra and the IBM ppc64 Linux team found similar bad register
> allocation for 64-bit PowerPC.  As an interim solution, we added splitters
> that handle FP registers for those patterns to rs6000.md.  I would suggest
> implementing something similar for PA-RISC GCC 3.1 or GCC 3.1.1.

Jeff, should we go ahead with a temporary fix for the ICE in the machine
definition or is there a chance for a better fix?  I am thinking of the
patch that I previously sent, modified to handle the situation in both
32bit and 64bit code.  This would be just for the branch.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 10:31   ` David Edelsohn
  2002-04-30 10:48     ` John David Anglin
@ 2002-04-30 12:56     ` Geoff Keating
  2002-04-30 13:07       ` David Edelsohn
  1 sibling, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-04-30 12:56 UTC (permalink / raw)
  To: David Edelsohn, John David Anglin; +Cc: gcc

David Edelsohn <dje@watson.ibm.com> writes:

> >>>>> John David Anglin writes:
> 
> John> This would be definitely helpful.  I have investigated this enough
> John> to know that the floating pointer register is being allocated in
> John> global_alloc.  What I don't understand is why this happens as it
> John> would seem that a general register should have been selected.
> John> The ICE occurs selecting one of two addresses for the return value
> John> and there shouldn't be many conflicts at this point.
>
> 	Alan Modra and the IBM ppc64 Linux team found similar bad register
> allocation for 64-bit PowerPC.  As an interim solution, we added splitters
> that handle FP registers for those patterns to rs6000.md.  I would suggest
> implementing something similar for PA-RISC GCC 3.1 or GCC 3.1.1.
> 
> 	For GCC 3.2, we should focus on the new register allocator instead
> of trying to fix an inherently limited and broken implementation.

Did you work out _why_ reg-alloc was chosing this class (and which
class it was choosing)?  Often these problems are actually caused by
the machine description, for instance by not having the union of two
register classes available, and a new register allocator won't help
that.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 12:56     ` Geoff Keating
@ 2002-04-30 13:07       ` David Edelsohn
  2002-04-30 13:12         ` John David Anglin
  0 siblings, 1 reply; 117+ messages in thread
From: David Edelsohn @ 2002-04-30 13:07 UTC (permalink / raw)
  To: Geoff Keating; +Cc: John David Anglin, gcc

>>>>> Geoff Keating writes:

Geoff> Did you work out _why_ reg-alloc was chosing this class (and which
Geoff> class it was choosing)?  Often these problems are actually caused by
Geoff> the machine description, for instance by not having the union of two
Geoff> register classes available, and a new register allocator won't help
Geoff> that.

	For the PowerPC case, DImode is allowed in FPRs.  A loop was
generated with a pseudo that was used as an FP value elsewhere.  The
register allocator could not be discouraged from using the FPR and could
not split the variable across two different register classes.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 13:07       ` David Edelsohn
@ 2002-04-30 13:12         ` John David Anglin
  2002-04-30 13:37           ` Geoff Keating
  2002-04-30 14:04           ` law
  0 siblings, 2 replies; 117+ messages in thread
From: John David Anglin @ 2002-04-30 13:12 UTC (permalink / raw)
  To: David Edelsohn; +Cc: geoffk, gcc

> >>>>> Geoff Keating writes:
> 
> Geoff> Did you work out _why_ reg-alloc was chosing this class (and which
> Geoff> class it was choosing)?  Often these problems are actually caused by
> Geoff> the machine description, for instance by not having the union of two
> Geoff> register classes available, and a new register allocator won't help
> Geoff> that.
> 
> 	For the PowerPC case, DImode is allowed in FPRs.  A loop was
> generated with a pseudo that was used as an FP value elsewhere.  The
> register allocator could not be discouraged from using the FPR and could
> not split the variable across two different register classes.

DImode is also allowed in FPRs on the PA.  The class used by global_alloc
for the pseudo was GENERAL_OR_FP_REGS.  This was the first class selected
for the psuedo and a register %fr22 was selected from this class.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 13:12         ` John David Anglin
@ 2002-04-30 13:37           ` Geoff Keating
  2002-04-30 13:41             ` David Edelsohn
  2002-04-30 13:59             ` John David Anglin
  2002-04-30 14:04           ` law
  1 sibling, 2 replies; 117+ messages in thread
From: Geoff Keating @ 2002-04-30 13:37 UTC (permalink / raw)
  To: dave; +Cc: dje, gcc

> Date: Tue, 30 Apr 2002 16:07:12 -0400 (EDT)
> From: "John David Anglin" <dave@hiauly1.hia.nrc.ca>
> Cc: geoffk@geoffk.org, gcc@gcc.gnu.org
> X-OriginalArrivalTime: 30 Apr 2002 20:08:12.0152 (UTC) FILETIME=[C13D4F80:01C1F082]
> 
> > >>>>> Geoff Keating writes:
> > 
> > Geoff> Did you work out _why_ reg-alloc was chosing this class (and which
> > Geoff> class it was choosing)?  Often these problems are actually caused by
> > Geoff> the machine description, for instance by not having the union of two
> > Geoff> register classes available, and a new register allocator won't help
> > Geoff> that.
> > 
> > 	For the PowerPC case, DImode is allowed in FPRs.  A loop was
> > generated with a pseudo that was used as an FP value elsewhere.  The
> > register allocator could not be discouraged from using the FPR and could
> > not split the variable across two different register classes.
> 
> DImode is also allowed in FPRs on the PA.  The class used by global_alloc
> for the pseudo was GENERAL_OR_FP_REGS.  This was the first class selected
> for the psuedo and a register %fr22 was selected from this class.

Hmmm.  It should have been 'GENERAL_OR_FP_REGS, pref GENERAL_REGS',
because although FP_REGS is OK, GENERAL_REGS is cheaper (there are
fewer moves).  Perhaps a cost macro needs tweaking?

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 13:37           ` Geoff Keating
@ 2002-04-30 13:41             ` David Edelsohn
  2002-04-30 14:07               ` law
  2002-04-30 13:59             ` John David Anglin
  1 sibling, 1 reply; 117+ messages in thread
From: David Edelsohn @ 2002-04-30 13:41 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dave, gcc

>>>>> Geoff Keating writes:

Geoff> Hmmm.  It should have been 'GENERAL_OR_FP_REGS, pref GENERAL_REGS',
Geoff> because although FP_REGS is OK, GENERAL_REGS is cheaper (there are
Geoff> fewer moves).  Perhaps a cost macro needs tweaking?

	It depends how the pseudo is being used throughout its entire
lifetime.  While a GPR is cheaper for the address case producing the bad
code, the pseudo may be used in an FP situation (possibly multiple times)
for which GPR is significantly more expensive.

	Either generating a single pseudo was bad or, if the uses are
distinct, the register allocator should have split the lifetime across
multiple registers for the time when each register class is cheaper.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 13:37           ` Geoff Keating
  2002-04-30 13:41             ` David Edelsohn
@ 2002-04-30 13:59             ` John David Anglin
  1 sibling, 0 replies; 117+ messages in thread
From: John David Anglin @ 2002-04-30 13:59 UTC (permalink / raw)
  To: geoffk; +Cc: dje, gcc

> > DImode is also allowed in FPRs on the PA.  The class used by global_alloc
> > for the pseudo was GENERAL_OR_FP_REGS.  This was the first class selected
> > for the psuedo and a register %fr22 was selected from this class.
> 
> Hmmm.  It should have been 'GENERAL_OR_FP_REGS, pref GENERAL_REGS',
> because although FP_REGS is OK, GENERAL_REGS is cheaper (there are
> fewer moves).  Perhaps a cost macro needs tweaking?

Perhaps.  REGISTER_MOVE_COST is 16 when one reg is a FP reg and the other
not, versus 2 otherwise.  However, I am not sure this actually comes
into play in the initial register selection.  The cost to load a DI mode
value is the same for a FPR as a GR.  It's not until a bit further along
the code path that the compiler discovers that it needs a rather expensive
reload to get the result into a general register.  Maybe the cause of
the problem is that there are two paths in which the psuedo can be set.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 13:12         ` John David Anglin
  2002-04-30 13:37           ` Geoff Keating
@ 2002-04-30 14:04           ` law
  2002-04-30 14:39             ` John David Anglin
  1 sibling, 1 reply; 117+ messages in thread
From: law @ 2002-04-30 14:04 UTC (permalink / raw)
  To: John David Anglin; +Cc: David Edelsohn, geoffk, gcc

 In message <200204302007.g3UK7DeK000134@hiauly1.hia.nrc.ca>, "John David 
Anglin" writes:
 > DImode is also allowed in FPRs on the PA.  The class used by global_alloc
 > for the pseudo was GENERAL_OR_FP_REGS.  This was the first class selected
 > for the psuedo and a register %fr22 was selected from this class.
Does the choice of GENERAL_OR_FP_REGS make sense given the uses/sets of
the particular register?  [ I'm probably not going to have time to look
seriously at this today. ]

jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 13:41             ` David Edelsohn
@ 2002-04-30 14:07               ` law
  2002-04-30 14:28                 ` John David Anglin
  0 siblings, 1 reply; 117+ messages in thread
From: law @ 2002-04-30 14:07 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Geoff Keating, dave, gcc

 In message <200204302038.QAA24876@makai.watson.ibm.com>, David Edelsohn 
writes:
 > 	It depends how the pseudo is being used throughout its entire
 > lifetime.  While a GPR is cheaper for the address case producing the bad
 > code, the pseudo may be used in an FP situation (possibly multiple times)
 > for which GPR is significantly more expensive.
Correct.  

 > 	Either generating a single pseudo was bad or, if the uses are
 > distinct, the register allocator should have split the lifetime across
 > multiple registers for the time when each register class is cheaper.
Our allocator doesn't know how to do this.
jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 14:07               ` law
@ 2002-04-30 14:28                 ` John David Anglin
  0 siblings, 0 replies; 117+ messages in thread
From: John David Anglin @ 2002-04-30 14:28 UTC (permalink / raw)
  To: law; +Cc: dje, geoffk, gcc

> 
>  In message <200204302038.QAA24876@makai.watson.ibm.com>, David Edelsohn 
> writes:
>  > 	It depends how the pseudo is being used throughout its entire
>  > lifetime.  While a GPR is cheaper for the address case producing the bad
>  > code, the pseudo may be used in an FP situation (possibly multiple times)
>  > for which GPR is significantly more expensive.
> Correct.  

I've appended the rtl output from the end of the lreg pass below.  The
problem is the register allocation for DI 714.  From that, we somehow get:

;; Start of basic block 258, registers live: 27 [%r27] 30 [%r30]
(note 3479 2994 2996 [bb 258] NOTE_INSN_BASIC_BLOCK)

(insn 2996 3479 2998 (set (reg:DI 1 %r1 [715])
        (plus:DI (reg:DI 27 %r27)
	    (high:DI (symbol_ref:DI ("PL_sv_yes"))))) 80 {*pa.md:2328} (nil)
    (nil))

(insn 2998 2996 3000 (set (reg:DI 50 %fr22 [714])
	(mem/u:DI (lo_sum:DI (reg:DI 1 %r1 [715])
		(unspec:DI[ 
			(symbol_ref:DI ("PL_sv_yes"))
		    ]  0)) [0 S8 A64])) 119 {*pa.md:3106} (insn_list 2996 (nil))
    (nil))

(jump_insn 3000 2998 3001 (set (pc)
	(label_ref 3008)) 269 {jump} (nil)
    (nil))
;; End of basic block 258, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 714
                                 ^^^^^
[...]

One thing that I see is that there is no register live info for 714.

Output from lreg:

;; Start of basic block 257, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 634
(code_label 2990 2972 3478 954 "" "" [29 uses])

(note 3478 2990 2994 [bb 257] NOTE_INSN_BASIC_BLOCK)

(jump_insn 2994 3478 3479 (set (pc)
        (if_then_else (eq (subreg/s:SI (reg/v:DI 634) 4)
                (const_int 0 [0x0]))
            (label_ref 3002)
            (pc))) 44 {*pa.md:1497} (nil)
    (expr_list:REG_DEAD (reg/v:DI 634)
        (expr_list:REG_BR_PROB (const_int 5000 [0x1388])
            (nil))))
;; End of basic block 257, registers live:
 3 [%r3] 27 [%r27] 30 [%r30]

;; Start of basic block 258, registers live: 3 [%r3] 27 [%r27] 30 [%r30]
(note 3479 2994 2996 [bb 258] NOTE_INSN_BASIC_BLOCK)

(insn 2996 3479 2998 (set (reg:DI 715)
        (plus:DI (reg:DI 27 %r27)
            (high:DI (symbol_ref:DI ("PL_sv_yes"))))) 80 {*pa.md:2328} (nil)
    (nil))

(insn 2998 2996 3000 (set (reg:DI 714)
        (mem/u:DI (lo_sum:DI (reg:DI 715)
                (unspec:DI[ 
                        (symbol_ref:DI ("PL_sv_yes"))
                    ]  0)) [0 S8 A64])) 119 {*pa.md:3106} (insn_list 2996 (nil))
    (expr_list:REG_DEAD (reg:DI 715)
        (nil)))

(jump_insn 3000 2998 3001 (set (pc)
        (label_ref 3008)) 269 {jump} (nil)
    (nil))
;; End of basic block 258, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 714

(barrier 3001 3000 3002)

;; Start of basic block 259, registers live: 3 [%r3] 27 [%r27] 30 [%r30]
(code_label 3002 3001 3480 1012 "" "" [1 uses])

(note 3480 3002 3004 [bb 259] NOTE_INSN_BASIC_BLOCK)

(insn 3004 3480 3006 (set (reg:DI 716)
        (plus:DI (reg:DI 27 %r27)
            (high:DI (symbol_ref:DI ("PL_sv_no"))))) 80 {*pa.md:2328} (nil)
    (nil))

(insn 3006 3004 3008 (set (reg:DI 714)
        (mem/u:DI (lo_sum:DI (reg:DI 716)
                (unspec:DI[ 
                        (symbol_ref:DI ("PL_sv_no"))
                    ]  0)) [0 S8 A64])) 119 {*pa.md:3106} (insn_list 3004 (nil))
    (expr_list:REG_DEAD (reg:DI 716)
        (nil)))
;; End of basic block 259, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 714

;; Start of basic block 260, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 714
(code_label 3008 3006 3481 1013 "" "" [1 uses])

(note 3481 3008 3010 [bb 260] NOTE_INSN_BASIC_BLOCK)

(insn 3010 3481 3011 (set (reg/f:DI 66)
        (reg:DI 714)) 119 {*pa.md:3106} (nil)
    (expr_list:REG_DEAD (reg:DI 714)
        (nil)))

(jump_insn 3011 3010 3012 (set (pc)
        (label_ref 3086)) 269 {jump} (nil)
    (nil))
;; End of basic block 260, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 66

(barrier 3012 3011 3013)

(note 3013 3012 3016 800003fffefe3c00 NOTE_INSN_BLOCK_END)

;; Start of basic block 261, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 70 429
(code_label 3016 3013 3483 953 "" "" [1 uses])

(note 3483 3016 3019 [bb 261] NOTE_INSN_BASIC_BLOCK)

(insn 3019 3483 3020 (set (reg:SI 717)
        (const_int 56 [0x38])) 68 {*pa.md:2088} (nil)
    (expr_list:REG_EQUIV (const_int 56 [0x38])
        (nil)))

(jump_insn 3020 3019 3484 (set (pc)
        (if_then_else (ne (subreg/s:SI (reg/v:DI 70) 4)
                (reg:SI 717))
            (label_ref 3070)
            (pc))) 44 {*pa.md:1497} (insn_list 3019 (nil))
    (expr_list:REG_DEAD (reg/v:DI 70)
        (expr_list:REG_DEAD (reg:SI 717)
            (expr_list:REG_BR_PROB (const_int 7100 [0x1bbc])
                (nil)))))
;; End of basic block 261, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 429

;; Start of basic block 262, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 429
(note 3484 3020 3025 [bb 262] NOTE_INSN_BASIC_BLOCK)

(insn 3025 3484 3027 (set (reg:SI 719)
        (mem/s/j:SI (plus:DI (reg/v/f:DI 429)
                (const_int 12 [0xc])) [0 <variable>.sv_flags+0 S4 A32])) 68 {*pa.md:2088} (nil)
    (expr_list:REG_EQUIV (mem/s/j:SI (plus:DI (reg/v/f:DI 429)
                (const_int 12 [0xc])) [0 <variable>.sv_flags+0 S4 A32])
        (nil)))

(note 3027 3025 3028 NOTE_INSN_DELETED)

(insn 3028 3027 3029 (set (reg:DI 718)
        (zero_extract:DI (subreg:DI (reg:SI 719) 0)
            (const_int 1 [0x1])
            (const_int 44 [0x2c]))) 287 {extzv_64} (insn_list 3025 (nil))
    (expr_list:REG_DEAD (reg:SI 719)
        (nil)))

(jump_insn 3029 3028 3485 (set (pc)
        (if_then_else (ne (subreg:SI (reg:DI 718) 4)
                (const_int 0 [0x0]))
            (label_ref 3044)
            (pc))) 44 {*pa.md:1497} (insn_list 3028 (nil))
    (expr_list:REG_BR_PROB (const_int 9900 [0x26ac])
        (expr_list:REG_DEAD (reg:DI 718)
            (nil))))
;; End of basic block 262, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 429

;; Start of basic block 263, registers live: 3 [%r3] 27 [%r27] 30 [%r30]
(note 3485 3029 3034 [bb 263] NOTE_INSN_BASIC_BLOCK)

(insn 3034 3485 3036 (set (reg:DI 721)
        (plus:DI (reg:DI 27 %r27)
            (high:DI (symbol_ref/v/f:DI ("*L$C0096"))))) 80 {*pa.md:2328} (nil)
    (nil))

(insn 3036 3034 3037 (set (reg:DI 721)
        (mem/u:DI (lo_sum:DI (reg:DI 721)
                (unspec:DI[ 
                        (symbol_ref/v/f:DI ("*L$C0096"))
                    ]  0)) [0 S8 A64])) 119 {*pa.md:3106} (insn_list 3034 (nil))
    (nil))

(insn 3037 3036 3039 (set (reg:DI 26 %r26)
        (reg:DI 721)) 119 {*pa.md:3106} (insn_list 3036 (nil))
    (expr_list:REG_DEAD (reg:DI 721)
        (nil)))

(insn 3039 3037 3040 (set (reg/f:DI 29 %r29)
        (plus:DI (reg/f:DI 30 %r30)
            (const_int -32 [0xffffffffffffffe0]))) 162 {*pa.md:3689} (nil)
    (nil))

(call_insn 3040 3039 3043 (parallel[ 
            (call (mem:SI (symbol_ref/v:DI ("@Perl_croak")) [0 S4 A32])
                (const_int 64 [0x40]))
            (clobber (reg:SI 2 %r2))
            (use (const_int 0 [0x0]))
        ] ) 271 {call_internal_symref} (insn_list 3037 (insn_list 3039 (nil)))
    (expr_list:REG_DEAD (reg:DI 26 %r26)
        (expr_list:REG_DEAD (reg:DI 27 %r27)
            (expr_list:REG_DEAD (reg/f:DI 29 %r29)
                (expr_list:REG_UNUSED (reg:SI 2 %r2)
                    (expr_list:REG_NORETURN (const_int 0 [0x0])
                        (nil))))))
    (expr_list (use (reg/f:DI 29 %r29))
        (expr_list (use (reg:DI 27 %r27))
            (expr_list (use (reg:DI 26 %r26))
                (nil)))))
;; End of basic block 263, registers live:
 3 [%r3] 27 [%r27] 30 [%r30]

(barrier 3043 3040 3044)

;; Start of basic block 264, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 429
(code_label 3044 3043 3487 1016 "" "" [1 uses])

(note 3487 3044 3048 [bb 264] NOTE_INSN_BASIC_BLOCK)

(note 3048 3487 3051 800003fffefe3c80 NOTE_INSN_BLOCK_BEG)

(insn 3051 3048 3053 (set (reg/f:DI 724)
        (mem/s/j:DI (reg/v/f:DI 429) [0 <variable>.sv_any+0 S8 A64])) 119 {*pa.md:3106} (nil)
    (expr_list:REG_DEAD (reg/v/f:DI 429)
        (nil)))

(insn 3053 3051 3056 (set (reg/v/f:DI 723)
        (mem/s/j:DI (reg/f:DI 724) [0 <variable>.xrv_rv+0 S8 A64])) 119 {*pa.md:3106} (insn_list 3051 (nil))
    (expr_list:REG_DEAD (reg/f:DI 724)
        (nil)))

(jump_insn 3056 3053 3488 (set (pc)
        (if_then_else (eq (reg/v/f:DI 723)
                (const_int 0 [0x0]))
            (label_ref 3062)
            (pc))) 46 {*pa.md:1551} (insn_list 3053 (nil))
    (expr_list:REG_BR_PROB (const_int 1900 [0x76c])
        (nil)))
;; End of basic block 264, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 723

;; Start of basic block 265, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 723
(note 3488 3056 3058 [bb 265] NOTE_INSN_BASIC_BLOCK)

(insn 3058 3488 3059 (set (reg:SI 727)
        (mem/s/j:SI (plus:DI (reg/v/f:DI 723)
                (const_int 8 [0x8])) [0 <variable>.sv_refcnt+0 S4 A64])) 68 {*pa.md:2088} (nil)
    (expr_list:REG_EQUIV (mem/s/j:SI (plus:DI (reg/v/f:DI 723)
                (const_int 8 [0x8])) [0 <variable>.sv_refcnt+0 S4 A64])
        (nil)))

(insn 3059 3058 3061 (set (reg:SI 726)
        (plus:SI (reg:SI 727)
            (const_int 1 [0x1]))) 165 {addsi3} (insn_list 3058 (nil))
    (expr_list:REG_EQUIV (mem/s/j:SI (plus:DI (reg/v/f:DI 723)
                (const_int 8 [0x8])) [0 <variable>.sv_refcnt+0 S4 A64])
        (expr_list:REG_DEAD (reg:SI 727)
            (nil))))

(insn 3061 3059 3062 (set (mem/s/j:SI (plus:DI (reg/v/f:DI 723)
                (const_int 8 [0x8])) [0 <variable>.sv_refcnt+0 S4 A64])
        (reg:SI 726)) 68 {*pa.md:2088} (insn_list 3059 (nil))
    (expr_list:REG_DEAD (reg:SI 726)
        (nil)))
;; End of basic block 265, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 723

;; Start of basic block 266, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 723
(code_label 3062 3061 3489 1017 "" "" [1 uses])

(note 3489 3062 3063 [bb 266] NOTE_INSN_BASIC_BLOCK)

(note 3063 3489 3065 800003fffefe3c80 NOTE_INSN_BLOCK_END)

(insn 3065 3063 3066 (set (reg/f:DI 66)
        (reg/v/f:DI 723)) 119 {*pa.md:3106} (nil)
    (expr_list:REG_DEAD (reg/v/f:DI 723)
        (nil)))

(jump_insn 3066 3065 3067 (set (pc)
        (label_ref 3086)) 269 {jump} (nil)
    (nil))
;; End of basic block 266, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 66

(barrier 3067 3066 3070)

;; Start of basic block 267, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 429
(code_label 3070 3067 3491 1015 "" "" [1 uses])

(note 3491 3070 3077 [bb 267] NOTE_INSN_BASIC_BLOCK)

(insn 3077 3491 3082 (set (reg/f:DI 66)
        (reg/v/f:DI 429)) 119 {*pa.md:3106} (nil)
    (expr_list:REG_DEAD (reg/v/f:DI 429)
        (nil)))
;; End of basic block 267, registers live:
 3 [%r3] 27 [%r27] 30 [%r30] 66

(note 3082 3077 3083 800003fffefe3d00 NOTE_INSN_BLOCK_END)

(note 3083 3082 3084 800003fffefe3e00 NOTE_INSN_BLOCK_END)

(note 3084 3083 3086 NOTE_INSN_FUNCTION_END)

;; Start of basic block 268, registers live: 3 [%r3] 27 [%r27] 30 [%r30] 66
(code_label 3086 3084 3494 783 "" "" [10 uses])

(note 3494 3086 3088 [bb 268] NOTE_INSN_BASIC_BLOCK)

(insn 3088 3494 3091 (set (reg/i:DI 28 %r28)
        (reg/f:DI 66)) 119 {*pa.md:3106} (nil)
    (expr_list:REG_DEAD (reg/f:DI 66)
        (nil)))

(insn 3091 3088 0 (use (reg/i:DI 28 %r28)) -1 (insn_list 3088 (nil))
    (nil))
;; End of basic block 268, registers live:
 3 [%r3] 27 [%r27] 28 [%r28] 30 [%r30]

-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 14:04           ` law
@ 2002-04-30 14:39             ` John David Anglin
  2002-04-30 14:54               ` law
                                 ` (2 more replies)
  0 siblings, 3 replies; 117+ messages in thread
From: John David Anglin @ 2002-04-30 14:39 UTC (permalink / raw)
  To: law; +Cc: dje, geoffk, gcc

>  In message <200204302007.g3UK7DeK000134@hiauly1.hia.nrc.ca>, "John David 
> Anglin" writes:
>  > DImode is also allowed in FPRs on the PA.  The class used by global_alloc
>  > for the pseudo was GENERAL_OR_FP_REGS.  This was the first class selected
>  > for the psuedo and a register %fr22 was selected from this class.
> Does the choice of GENERAL_OR_FP_REGS make sense given the uses/sets of
> the particular register?  [ I'm probably not going to have time to look
> seriously at this today. ]

No.  I would say the class should be GENERAL_REGS.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 14:39             ` John David Anglin
@ 2002-04-30 14:54               ` law
  2002-04-30 15:04                 ` John David Anglin
  2002-04-30 15:23               ` law
  2002-04-30 16:11               ` law
  2 siblings, 1 reply; 117+ messages in thread
From: law @ 2002-04-30 14:54 UTC (permalink / raw)
  To: John David Anglin; +Cc: dje, geoffk, gcc

In message <200204302128.g3ULSB8S000435@hiauly1.hia.nrc.ca>, "John David Anglin
" writes:
 > >  In message <200204302007.g3UK7DeK000134@hiauly1.hia.nrc.ca>, "John David 
 > > Anglin" writes:
 > >  > DImode is also allowed in FPRs on the PA.  The class used by global_all
 > oc
 > >  > for the pseudo was GENERAL_OR_FP_REGS.  This was the first class select
 > ed
 > >  > for the psuedo and a register %fr22 was selected from this class.
 > > Does the choice of GENERAL_OR_FP_REGS make sense given the uses/sets of
 > > the particular register?  [ I'm probably not going to have time to look
 > > seriously at this today. ]
 > 
 > No.  I would say the class should be GENERAL_REGS.
From looking at the RTL dump, I'm less certain.

The relevalt tidbits:
[ ... paths which set reg714 ... ]

(insn 3010 3481 3011 (set (reg/f:DI 66)
        (reg:DI 714)) 119 {*pa.md:3106} (nil)
    (expr_list:REG_DEAD (reg:DI 714)
        (nil)))

[ ... ]
(insn 3088 3494 3091 (set (reg/i:DI 28 %r28)
        (reg/f:DI 66)) 119 {*pa.md:3106} (nil)
    (expr_list:REG_DEAD (reg/f:DI 66)
        (nil)))

I suspect the copy from reg714 to reg66 is what's causing us to get bumped
into the various FP classes.

What I can't find a reason for is why we would use reg66 explicitly like that
in the first place.  Where in the world did (reg 66) come from?

jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 14:54               ` law
@ 2002-04-30 15:04                 ` John David Anglin
  0 siblings, 0 replies; 117+ messages in thread
From: John David Anglin @ 2002-04-30 15:04 UTC (permalink / raw)
  To: law; +Cc: dje, geoffk, gcc

> I suspect the copy from reg714 to reg66 is what's causing us to get bumped
> into the various FP classes.
> 
> What I can't find a reason for is why we would use reg66 explicitly like that
> in the first place.  Where in the world did (reg 66) come from?

Sharp eyes.   The first I see of DI 66 is back in insn 492.  It seems
to appear in some other functions as well.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 14:39             ` John David Anglin
  2002-04-30 14:54               ` law
@ 2002-04-30 15:23               ` law
  2002-04-30 15:29                 ` John David Anglin
  2002-04-30 16:11               ` law
  2 siblings, 1 reply; 117+ messages in thread
From: law @ 2002-04-30 15:23 UTC (permalink / raw)
  To: John David Anglin; +Cc: dje, geoffk, gcc

In message <200204302128.g3ULSB8S000435@hiauly1.hia.nrc.ca>, "John David Anglin
" writes:
 > >  In message <200204302007.g3UK7DeK000134@hiauly1.hia.nrc.ca>, "John David 
 > > Anglin" writes:
 > >  > DImode is also allowed in FPRs on the PA.  The class used by global_all
 > oc
 > >  > for the pseudo was GENERAL_OR_FP_REGS.  This was the first class select
 > ed
 > >  > for the psuedo and a register %fr22 was selected from this class.
 > > Does the choice of GENERAL_OR_FP_REGS make sense given the uses/sets of
 > > the particular register?  [ I'm probably not going to have time to look
 > > seriously at this today. ]
 > 
 > No.  I would say the class should be GENERAL_REGS.
OK.  A few quick pointers.

The existence of (reg:DI 66) in the RTL is very very suspicious and I think
that's ultimately what's causing this insanely stupid register allocation.

Now the question becomes how did (reg:DI 66) get into the RTL to begin with.
If you look at the decl for Perl_amagic_call, you'll find it sitting in
there.  Not good.

Strip out all the function definitions from your testfile except Perl_amagic_call,
then put a breakpoint in this code in function.c::expand_function_start:

6608              /* Create the pseudo.  */
6609              SET_DECL_RTL (DECL_RESULT (subr), gen_reg_rtx (GET_MODE (hard_reg)));

You'll find that instead of getting a pseudo of the appropriate mode, we
instead get hard reg 66 back from gen_reg_rtx.  Very very bad.  Something
extremely unpleasant is happening here.


jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 15:23               ` law
@ 2002-04-30 15:29                 ` John David Anglin
  2002-04-30 15:52                   ` law
  0 siblings, 1 reply; 117+ messages in thread
From: John David Anglin @ 2002-04-30 15:29 UTC (permalink / raw)
  To: law; +Cc: dje, geoffk, gcc

>  > No.  I would say the class should be GENERAL_REGS.
> OK.  A few quick pointers.
> 
> The existence of (reg:DI 66) in the RTL is very very suspicious and I think
> that's ultimately what's causing this insanely stupid register allocation.

This is hppa64 and the first pseudo is 61.  So, isn't 66 ok?

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 15:29                 ` John David Anglin
@ 2002-04-30 15:52                   ` law
  0 siblings, 0 replies; 117+ messages in thread
From: law @ 2002-04-30 15:52 UTC (permalink / raw)
  To: John David Anglin; +Cc: dje, geoffk, gcc

In message <200204302223.g3UMNdNn000757@hiauly1.hia.nrc.ca>, "John David Anglin
" writes:
 > >  > No.  I would say the class should be GENERAL_REGS.
 > > OK.  A few quick pointers.
 > > 
 > > The existence of (reg:DI 66) in the RTL is very very suspicious and I thin
 > k
 > > that's ultimately what's causing this insanely stupid register allocation.
 > 
 > This is hppa64 and the first pseudo is 61.  So, isn't 66 ok?
Ah.  Major stupidity leak on my part.

Anyway, looking at the register classing information, it's clearly wrong in
the sense that it picked GENERAL_OR_FP_REGS for pseudo 714 (and pseudo 66).
In fact, it picked GENERAL_OR_FP_REGS for a whole lot of pseudos which
really should have been GENERAL_REGS.  The dump data at the top of the .lreg
file will give you at least some of that data.

jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 14:39             ` John David Anglin
  2002-04-30 14:54               ` law
  2002-04-30 15:23               ` law
@ 2002-04-30 16:11               ` law
  2002-04-30 16:27                 ` John David Anglin
  2002-04-30 16:34                 ` Geoff Keating
  2 siblings, 2 replies; 117+ messages in thread
From: law @ 2002-04-30 16:11 UTC (permalink / raw)
  To: John David Anglin; +Cc: dje, geoffk, gcc

In message <200204302128.g3ULSB8S000435@hiauly1.hia.nrc.ca>, "John David Anglin
" writes:
 > >  In message <200204302007.g3UK7DeK000134@hiauly1.hia.nrc.ca>, "John David 
 > > Anglin" writes:
 > >  > DImode is also allowed in FPRs on the PA.  The class used by global_all
 > oc
 > >  > for the pseudo was GENERAL_OR_FP_REGS.  This was the first class select
 > ed
 > >  > for the psuedo and a register %fr22 was selected from this class.
 > > Does the choice of GENERAL_OR_FP_REGS make sense given the uses/sets of
 > > the particular register?  [ I'm probably not going to have time to look
 > > seriously at this today. ]
 > 
 > No.  I would say the class should be GENERAL_REGS.
Agreed now that you've remined me that 9reg 66) is a pseudo for PA64 :-)

What I find curious is that we have the same cost (0) for 
R1_REGS, GENERAL_REGS and FP_REGS for reg714, yet GENERAL_OR_FP_REGS has
a cost of 7000+?!?  Weird.

jeffk

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 16:11               ` law
@ 2002-04-30 16:27                 ` John David Anglin
  2002-04-30 16:41                   ` law
  2002-04-30 18:46                   ` law
  2002-04-30 16:34                 ` Geoff Keating
  1 sibling, 2 replies; 117+ messages in thread
From: John David Anglin @ 2002-04-30 16:27 UTC (permalink / raw)
  To: law; +Cc: dje, geoffk, gcc

>  > No.  I would say the class should be GENERAL_REGS.
> Agreed now that you've remined me that 9reg 66) is a pseudo for PA64 :-)
> 
> What I find curious is that we have the same cost (0) for 
> R1_REGS, GENERAL_REGS and FP_REGS for reg714, yet GENERAL_OR_FP_REGS has
> a cost of 7000+?!?  Weird.

This is the insn

(define_insn ""
  [(set (match_operand:DI 0 "reg_or_nonsymb_mem_operand"
				"=r,r,r,r,r,r,Q,*q,!f,f,*TR")
	(match_operand:DI 1 "move_operand"
				"A,r,J,N,K,RQ,rM,rM,!fM,*RT,f"))]
  "(register_operand (operands[0], DImode)
    || reg_or_0_operand (operands[1], DImode))
   && ! TARGET_SOFT_FLOAT && TARGET_64BIT"

I believe that the possible alternatives are the first and second last.
I don't understand the costs though I can sort of understand thar R1_REGS
might be low since %r1 is an input.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 16:11               ` law
  2002-04-30 16:27                 ` John David Anglin
@ 2002-04-30 16:34                 ` Geoff Keating
  2002-04-30 17:04                   ` law
  2002-04-30 18:48                   ` law
  1 sibling, 2 replies; 117+ messages in thread
From: Geoff Keating @ 2002-04-30 16:34 UTC (permalink / raw)
  To: law; +Cc: dave, dje, gcc

> cc: dje@watson.ibm.com, geoffk@geoffk.org, gcc@gcc.gnu.org
> Reply-To: law@redhat.com
> From: law@redhat.com
> Date: Tue, 30 Apr 2002 17:13:28 -0600
> X-OriginalArrivalTime: 30 Apr 2002 23:10:46.0384 (UTC) FILETIME=[42787700:01C1F09C]
> 
> In message <200204302128.g3ULSB8S000435@hiauly1.hia.nrc.ca>, "John David Anglin
> " writes:
>  > >  In message <200204302007.g3UK7DeK000134@hiauly1.hia.nrc.ca>, "John David 
>  > > Anglin" writes:
>  > >  > DImode is also allowed in FPRs on the PA.  The class used by global_all
>  > oc
>  > >  > for the pseudo was GENERAL_OR_FP_REGS.  This was the first class select
>  > ed
>  > >  > for the psuedo and a register %fr22 was selected from this class.
>  > > Does the choice of GENERAL_OR_FP_REGS make sense given the uses/sets of
>  > > the particular register?  [ I'm probably not going to have time to look
>  > > seriously at this today. ]
>  > 
>  > No.  I would say the class should be GENERAL_REGS.
> Agreed now that you've remined me that 9reg 66) is a pseudo for PA64 :-)
> 
> What I find curious is that we have the same cost (0) for 
> R1_REGS, GENERAL_REGS and FP_REGS for reg714, yet GENERAL_OR_FP_REGS has
> a cost of 7000+?!?  Weird.

Possibly because GENERAL_OR_FP_REGS doesn't match FP_REG_CLASS_P?

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 16:27                 ` John David Anglin
@ 2002-04-30 16:41                   ` law
  2002-04-30 18:46                   ` law
  1 sibling, 0 replies; 117+ messages in thread
From: law @ 2002-04-30 16:41 UTC (permalink / raw)
  To: John David Anglin; +Cc: dje, geoffk, gcc

 In message <200204302323.g3UNNLHv000893@hiauly1.hia.nrc.ca>, "John David 
Anglin
" writes:
 > This is the insn
 > 
 > (define_insn ""
 >   [(set (match_operand:DI 0 "reg_or_nonsymb_mem_operand"
 > 				"=r,r,r,r,r,r,Q,*q,!f,f,*TR")
 > 	(match_operand:DI 1 "move_operand"
 > 				"A,r,J,N,K,RQ,rM,rM,!fM,*RT,f"))]
 >   "(register_operand (operands[0], DImode)
 >     || reg_or_0_operand (operands[1], DImode))
 >    && ! TARGET_SOFT_FLOAT && TARGET_64BIT"
 > 
 > I believe that the possible alternatives are the first and second last.
 > I don't understand the costs though I can sort of understand thar R1_REGS
 > might be low since %r1 is an input.
You have to look at multiple insns to get the cost for any particular register.

ie, you have to look at every insn that sets or references reg 714.

There are two sets.  They match the pattern above, but can only be satisfied
by the first alternative.  The one reference is a simple reg->reg copy 
and can match the r->r alternative or !fM->f alternative.

The key point is that there are two insns which can only match the A->r
alternative which should have given us a very strong preference for a
general register.  Yet we still have zero cost for FP_REGS.

jeff


^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 16:34                 ` Geoff Keating
@ 2002-04-30 17:04                   ` law
  2002-04-30 18:48                   ` law
  1 sibling, 0 replies; 117+ messages in thread
From: law @ 2002-04-30 17:04 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dave, dje, gcc

In message <200204302327.g3UNR2X26790@desire.geoffk.org>, Geoff Keating writes:
 > > cc: dje@watson.ibm.com, geoffk@geoffk.org, gcc@gcc.gnu.org
 > > Reply-To: law@redhat.com
 > > From: law@redhat.com
 > > Date: Tue, 30 Apr 2002 17:13:28 -0600
 > > X-OriginalArrivalTime: 30 Apr 2002 23:10:46.0384 (UTC) FILETIME=[42787700:
 > 01C1F09C]
 > > 
 > > In message <200204302128.g3ULSB8S000435@hiauly1.hia.nrc.ca>, "John David A
 > nglin
 > > " writes:
 > >  > >  In message <200204302007.g3UK7DeK000134@hiauly1.hia.nrc.ca>, "John D
 > avid 
 > >  > > Anglin" writes:
 > >  > >  > DImode is also allowed in FPRs on the PA.  The class used by globa
 > l_all
 > >  > oc
 > >  > >  > for the pseudo was GENERAL_OR_FP_REGS.  This was the first class s
 > elect
 > >  > ed
 > >  > >  > for the psuedo and a register %fr22 was selected from this class.
 > >  > > Does the choice of GENERAL_OR_FP_REGS make sense given the uses/sets 
 > of
 > >  > > the particular register?  [ I'm probably not going to have time to lo
 > ok
 > >  > > seriously at this today. ]
 > >  > 
 > >  > No.  I would say the class should be GENERAL_REGS.
 > > Agreed now that you've remined me that 9reg 66) is a pseudo for PA64 :-)
 > > 
 > > What I find curious is that we have the same cost (0) for 
 > > R1_REGS, GENERAL_REGS and FP_REGS for reg714, yet GENERAL_OR_FP_REGS has
 > > a cost of 7000+?!?  Weird.
 > 
 > Possibly because GENERAL_OR_FP_REGS doesn't match FP_REG_CLASS_P?
Unknown.  Do we actually pass super classes down to REGISTER_MOVE_COST?

jeffk

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 16:27                 ` John David Anglin
  2002-04-30 16:41                   ` law
@ 2002-04-30 18:46                   ` law
  2002-04-30 20:18                     ` John David Anglin
  2002-05-01  7:17                     ` Peter Barada
  1 sibling, 2 replies; 117+ messages in thread
From: law @ 2002-04-30 18:46 UTC (permalink / raw)
  To: John David Anglin; +Cc: dje, geoffk, gcc

In message <200204302323.g3UNNLHv000893@hiauly1.hia.nrc.ca>, "John David Anglin
" writes:
 > >  > No.  I would say the class should be GENERAL_REGS.
 > > Agreed now that you've remined me that (reg 66) is a pseudo for PA64 :-)
 > > 
 > > What I find curious is that we have the same cost (0) for 
 > > R1_REGS, GENERAL_REGS and FP_REGS for reg714, yet GENERAL_OR_FP_REGS has
 > > a cost of 7000+?!?  Weird.
 > 
 > This is the insn
 > 
 > (define_insn ""
 >   [(set (match_operand:DI 0 "reg_or_nonsymb_mem_operand"
 > 				"=r,r,r,r,r,r,Q,*q,!f,f,*TR")
 > 	(match_operand:DI 1 "move_operand"
 > 				"A,r,J,N,K,RQ,rM,rM,!fM,*RT,f"))]
 >   "(register_operand (operands[0], DImode)
 >     || reg_or_0_operand (operands[1], DImode))
 >    && ! TARGET_SOFT_FLOAT && TARGET_64BIT"
 > 
 > I believe that the possible alternatives are the first and second last.
 > I don't understand the costs though I can sort of understand thar R1_REGS
 > might be low since %r1 is an input.
The second to last (*RT->f) isn't supposed to match the (MEM (LO_SUM 
(UNSPEC...)))
expression.  That may be why we consider FP_REGS zero cost.  Or it could be
a complete red herring.  I'm still poking at it.

There are days I *really* wish our methods for recognizing a valid address had
more context (load vs store, load destination register type, store source 
register type, etc).  We play far too many unpleasant games in the PA backend
to recognize the asymmetric addressing modes available on the PA.

Anyway, I'm investigating what fixing the 'T' constraint handling does.  At
first glance the costing of the various register classes looks much better
and the abort goes away.

jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 10:48     ` John David Anglin
@ 2002-04-30 18:48       ` law
  2002-04-30 19:55         ` John David Anglin
  0 siblings, 1 reply; 117+ messages in thread
From: law @ 2002-04-30 18:48 UTC (permalink / raw)
  To: John David Anglin; +Cc: David Edelsohn, mark, dave.anglin, gcc

 In message <200204301746.g3UHkJ7Q028543@hiauly1.hia.nrc.ca>, "John David 
Anglin" writes:
 > > 	Alan Modra and the IBM ppc64 Linux team found similar bad register
 > > allocation for 64-bit PowerPC.  As an interim solution, we added splitters
 > > that handle FP registers for those patterns to rs6000.md.  I would suggest
 > > implementing something similar for PA-RISC GCC 3.1 or GCC 3.1.1.
 > 
 > Jeff, should we go ahead with a temporary fix for the ICE in the machine
 > definition or is there a chance for a better fix?  I am thinking of the
 > patch that I previously sent, modified to handle the situation in both
 > 32bit and 64bit code.  This would be just for the branch.
Let's see how fixing the 'T' constraint handling pans out.  While in theory
we need to handle address loads into FP registers, in practice we shouldn't.

jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 16:34                 ` Geoff Keating
  2002-04-30 17:04                   ` law
@ 2002-04-30 18:48                   ` law
  1 sibling, 0 replies; 117+ messages in thread
From: law @ 2002-04-30 18:48 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dave, dje, gcc

 In message <200204302327.g3UNR2X26790@desire.geoffk.org>, Geoff Keating 
writes:
 > > What I find curious is that we have the same cost (0) for 
 > > R1_REGS, GENERAL_REGS and FP_REGS for reg714, yet GENERAL_OR_FP_REGS has
 > > a cost of 7000+?!?  Weird.
 > 
 > Possibly because GENERAL_OR_FP_REGS doesn't match FP_REG_CLASS_P?
Looks unrelated.  It's also the case that regclass.c knows how to deal with
superclasses in a pretty reasonable manner.  It looks like the costing is
messed up because the PA's 'T' constraint was accepting something it should
have rejected.

jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 18:48       ` law
@ 2002-04-30 19:55         ` John David Anglin
  2002-04-30 20:43           ` law
  0 siblings, 1 reply; 117+ messages in thread
From: John David Anglin @ 2002-04-30 19:55 UTC (permalink / raw)
  To: law; +Cc: dje, mark, dave.anglin, gcc

>  > Jeff, should we go ahead with a temporary fix for the ICE in the machine
>  > definition or is there a chance for a better fix?  I am thinking of the
>  > patch that I previously sent, modified to handle the situation in both
>  > 32bit and 64bit code.  This would be just for the branch.
> Let's see how fixing the 'T' constraint handling pans out.  While in theory
> we need to handle address loads into FP registers, in practice we shouldn't.

Trying to fix the 'T' constraint was in fact the approach I tried first.
Hopefully, I didn't do it correctly.  In my original patch, the 'A'
contraint was placed before the 'T' constraint because it accepted address
loads into FP registers.  If your 'T' fixes works and it proves necessary
to handle address loads into FP registers, possibly '!' could be used
to severely disparage the 'A' FP alternative.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 18:46                   ` law
@ 2002-04-30 20:18                     ` John David Anglin
  2002-05-01  7:17                     ` Peter Barada
  1 sibling, 0 replies; 117+ messages in thread
From: John David Anglin @ 2002-04-30 20:18 UTC (permalink / raw)
  To: law; +Cc: dje, geoffk, gcc

> Anyway, I'm investigating what fixing the 'T' constraint handling does.  At
> first glance the costing of the various register classes looks much better
> and the abort goes away.

Yea!

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 19:55         ` John David Anglin
@ 2002-04-30 20:43           ` law
  2002-04-30 21:06             ` John David Anglin
  0 siblings, 1 reply; 117+ messages in thread
From: law @ 2002-04-30 20:43 UTC (permalink / raw)
  To: John David Anglin; +Cc: dje, mark, dave.anglin, gcc

 In message <200205010252.g412qkvI001506@hiauly1.hia.nrc.ca>, "John David 
Anglin" writes:
 > Trying to fix the 'T' constraint was in fact the approach I tried first.
 > Hopefully, I didn't do it correctly.
Can you describe in a little more detail what you tried?

The underlying problem I see is that 'T' should be rejecting LO_SUM (...)
addresses (the 'T' constraint is for short displacement loads/stores).

 > In my original patch, the 'A'
 > contraint was placed before the 'T' constraint because it accepted address
 > loads into FP registers.  If your 'T' fixes works and it proves necessary
 > to handle address loads into FP registers, possibly '!' could be used
 > to severely disparage the 'A' FP alternative.
So far I haven't had to twiddle anything else other than to have the
'T' constraint reject PIC addresses.

jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 20:43           ` law
@ 2002-04-30 21:06             ` John David Anglin
  0 siblings, 0 replies; 117+ messages in thread
From: John David Anglin @ 2002-04-30 21:06 UTC (permalink / raw)
  To: law; +Cc: dje, mark, dave.anglin, gcc

>  > Trying to fix the 'T' constraint was in fact the approach I tried first.
>  > Hopefully, I didn't do it correctly.
> Can you describe in a little more detail what you tried?
> 
> The underlying problem I see is that 'T' should be rejecting LO_SUM (...)
> addresses (the 'T' constraint is for short displacement loads/stores).

I modified EXTRA_CONSTRAINT.  I've lost the exact details but I believe
that I removed the TARGET_PA_20 check, and added another '&&' to check
for LO_SUMs.

Dave
-- 
J. David Anglin                                  dave.anglin@nrc.ca
National Research Council of Canada              (613) 990-0752 (FAX: 952-6605)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PR 6394
  2002-04-30 18:46                   ` law
  2002-04-30 20:18                     ` John David Anglin
@ 2002-05-01  7:17                     ` Peter Barada
  1 sibling, 0 replies; 117+ messages in thread
From: Peter Barada @ 2002-05-01  7:17 UTC (permalink / raw)
  To: law; +Cc: dave, dje, geoffk, gcc


>There are days I *really* wish our methods for recognizing a valid
>address had more context (load vs store, load destination register
>type, store source register type, etc).  We play far too many
>unpleasant games in the PA backend to recognize the asymmetric
>addressing modes available on the PA. 

Its not limited to the PA. The m68k and ColdFire have these problems
as well.

-- 
Peter Barada                                   Peter.Barada@motorola.com
Wizard                                         781-852-2768 (direct)
WaveMark Solutions(wholly owned by Motorola)   781-270-0193 (fax)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Dumb register allocation (PPC)
@ 2002-05-12 12:50 Zack Weinberg
  2002-05-12 16:59 ` law
  0 siblings, 1 reply; 117+ messages in thread
From: Zack Weinberg @ 2002-05-12 12:50 UTC (permalink / raw)
  To: gcc

I noticed this while I was looking at something else.  This function

static hashval_t
const_double_htab_hash (x)
     const void *x;
{
  hashval_t h = 0;
  size_t i;
  rtx value = (rtx) x;

  for (i = 0; i < sizeof(CONST_DOUBLE_FORMAT)-1; i++)
    h ^= XWINT (value, i);
  return h;
}

gets compiled like this on PowerPC (-mregnames for clarity):

const_double_htab_hash:
	li	%r4, 3
	li	%r9, 0
	mtctr	%r4
	addi	%r3, %r3, 12
.L13:
	lwz	%r4, 0(%r3)
	addi	%r3, %r3, 8
	xor	%r9, %r9, %r4
	bdnz	.L13
	mr	%r3, %r9
	blr

Unless there is something weird about the architecture which I'm not
aware of, it appears that we could get rid of the final move
instruction by swapping %r9 and %r3.  %r3 holds the input parameter
but we've got three-operand add so that's not a difficulty.

const_double_htab_hash:
	li	%r4, 3
	addi	%r9, %r3, 12
	mtctr	%r4
	li	%r3, 0
.L13:
	lwz	%r4, 0(%r9)
	addi	%r9, %r9, 8
	xor	%r3, %r3, %r4
	bdnz	.L13
	blr

So why don't we do that automatically?  Presumably this is register
allocation's fault...

zw

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-12 12:50 Dumb register allocation (PPC) Zack Weinberg
@ 2002-05-12 16:59 ` law
  2002-05-12 21:38   ` David Edelsohn
  0 siblings, 1 reply; 117+ messages in thread
From: law @ 2002-05-12 16:59 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: gcc

In message <20020512193645.GA374@codesourcery.com>, Zack Weinberg writes:
 > I noticed this while I was looking at something else.  This function
 > 
 > static hashval_t
 > const_double_htab_hash (x)
 >      const void *x;
 > {
 >   hashval_t h = 0;
 >   size_t i;
 >   rtx value = (rtx) x;
 > 
 >   for (i = 0; i < sizeof(CONST_DOUBLE_FORMAT)-1; i++)
 >     h ^= XWINT (value, i);
 >   return h;
 > }
 > 
 > gets compiled like this on PowerPC (-mregnames for clarity):
 > 
 > const_double_htab_hash:
 > 	li	%r4, 3
 > 	li	%r9, 0
 > 	mtctr	%r4
 > 	addi	%r3, %r3, 12
 > .L13:
 > 	lwz	%r4, 0(%r3)
 > 	addi	%r3, %r3, 8
 > 	xor	%r9, %r9, %r4
 > 	bdnz	.L13
 > 	mr	%r3, %r9
 > 	blr
 > 
 > Unless there is something weird about the architecture which I'm not
 > aware of, it appears that we could get rid of the final move
 > instruction by swapping %r9 and %r3.  %r3 holds the input parameter
 > but we've got three-operand add so that's not a difficulty.
 > 
 > const_double_htab_hash:
 > 	li	%r4, 3
 > 	addi	%r9, %r3, 12
 > 	mtctr	%r4
 > 	li	%r3, 0
 > .L13:
 > 	lwz	%r4, 0(%r9)
 > 	addi	%r9, %r9, 8
 > 	xor	%r3, %r3, %r4
 > 	bdnz	.L13
 > 	blr
 > 
 > So why don't we do that automatically?  Presumably this is register
 > allocation's fault...
Start looking at the dumps.  Of particular interest will be the incoming
hard register, the outgoing hard register and any pseudos in between.

You'd start the process by looking to see if they conflict with each other.

jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-12 16:59 ` law
@ 2002-05-12 21:38   ` David Edelsohn
  2002-05-12 22:44     ` Zack Weinberg
  2002-05-14 13:15     ` Dale Johannesen
  0 siblings, 2 replies; 117+ messages in thread
From: David Edelsohn @ 2002-05-12 21:38 UTC (permalink / raw)
  To: law, Zack Weinberg; +Cc: gcc

	Please see the

Trivial code generation stupidity thread:
http://gcc.gnu.org/ml/gcc/2001-12/msg00907.html

which evolved into the
cant_combine_insn_p hard_reg->reg moves thread:
http://gcc.gnu.org/ml/gcc/2001-12/msg01283.html

This is a general difficulty of GCC handling argument registers.  The new
register alloctor will allow us to start addressing it.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-12 21:38   ` David Edelsohn
@ 2002-05-12 22:44     ` Zack Weinberg
  2002-05-13 18:20       ` Richard Henderson
  2002-05-14 13:15     ` Dale Johannesen
  1 sibling, 1 reply; 117+ messages in thread
From: Zack Weinberg @ 2002-05-12 22:44 UTC (permalink / raw)
  To: David Edelsohn; +Cc: law, gcc

On Sun, May 12, 2002 at 07:54:56PM -0400, David Edelsohn wrote:
> 	Please see the
> 
> Trivial code generation stupidity thread:
> http://gcc.gnu.org/ml/gcc/2001-12/msg00907.html
> 
> which evolved into the
> cant_combine_insn_p hard_reg->reg moves thread:
> http://gcc.gnu.org/ml/gcc/2001-12/msg01283.html
> 
> This is a general difficulty of GCC handling argument registers.  The new
> register alloctor will allow us to start addressing it.

That's what I thought.

I observe that -fcprop-registers (as suggested in the thread you
referenced) is already on with -O1 and doesn't help any.

zw

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-12 22:44     ` Zack Weinberg
@ 2002-05-13 18:20       ` Richard Henderson
  0 siblings, 0 replies; 117+ messages in thread
From: Richard Henderson @ 2002-05-13 18:20 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: David Edelsohn, law, gcc

On Sun, May 12, 2002 at 09:54:51PM -0700, Zack Weinberg wrote:
> I observe that -fcprop-registers (as suggested in the thread you
> referenced) is already on with -O1 and doesn't help any.

That pass does forward propagation, and thus handles incoming arguments,
but does not handle return registers or outgoing arguments.


r~

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-14 13:15     ` Dale Johannesen
@ 2002-05-14 13:15       ` David Edelsohn
  2002-05-15  0:00         ` Richard Henderson
  0 siblings, 1 reply; 117+ messages in thread
From: David Edelsohn @ 2002-05-14 13:15 UTC (permalink / raw)
  To: Dale Johannesen; +Cc: law, Zack Weinberg, gcc

>>>>> Dale Johannesen writes:

Dale> I haven't bothered submitting this
Dale> given jh's results and the firmness of rth's opinion
Dale> in the thread you quote, but I could.

	Please do submit the patch.  Everyone here has strong opinions,
but facts always trump opinions.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-12 21:38   ` David Edelsohn
  2002-05-12 22:44     ` Zack Weinberg
@ 2002-05-14 13:15     ` Dale Johannesen
  2002-05-14 13:15       ` David Edelsohn
  1 sibling, 1 reply; 117+ messages in thread
From: Dale Johannesen @ 2002-05-14 13:15 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Dale Johannesen, law, Zack Weinberg, gcc


On Sunday, May 12, 2002, at 04:54 PM, David Edelsohn wrote:

> 	Please see the
>
> Trivial code generation stupidity thread:
> http://gcc.gnu.org/ml/gcc/2001-12/msg00907.html
>
> which evolved into the
> cant_combine_insn_p hard_reg->reg moves thread:
> http://gcc.gnu.org/ml/gcc/2001-12/msg01283.html
>
> This is a general difficulty of GCC handling argument registers.  The new
> register alloctor will allow us to start addressing it.

FWIW, I've got the check in cant_combine_insn_p
conditionalized on if(SMALL_REGISTER_CLASSES), and
everything works fine on ppc, and the code does improve.
If you look back at the thread when this patch went in:
http://gcc.gnu.org/ml/gcc-patches/2000-11/msg01466.html
you'll see there was some discussion of doing what I did,
but Jan Hubicka claimed improvements with the patch
even on non-SRC machines so it didn't go in that way;
but this patch is not a win on ppc according to my
measurements.  I haven't bothered submitting this
given jh's results and the firmness of rth's opinion
in the thread you quote, but I could.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-14 13:15       ` David Edelsohn
@ 2002-05-15  0:00         ` Richard Henderson
  2002-05-15  8:27           ` David Edelsohn
  0 siblings, 1 reply; 117+ messages in thread
From: Richard Henderson @ 2002-05-15  0:00 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Dale Johannesen, law, Zack Weinberg, gcc

On Tue, May 14, 2002 at 03:13:23PM -0400, David Edelsohn wrote:
> 	Please do submit the patch.  Everyone here has strong opinions,
> but facts always trump opinions.

Eh?  My objection is still going to be potential reload
failures.  Which "fact" is going to ensure that you
introduce no failures on any target?


r~

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-15  0:00         ` Richard Henderson
@ 2002-05-15  8:27           ` David Edelsohn
  2002-05-15 11:47             ` Geoff Keating
  2002-05-15 11:50             ` David Edelsohn
  0 siblings, 2 replies; 117+ messages in thread
From: David Edelsohn @ 2002-05-15  8:27 UTC (permalink / raw)
  To: Richard Henderson, Dale Johannesen, law, Zack Weinberg, gcc

>>>>> Richard Henderson writes:

Richard> Eh?  My objection is still going to be potential reload
Richard> failures.  Which "fact" is going to ensure that you
Richard> introduce no failures on any target?

	So maybe this should be enabled on a target-by-target basis
instead of SMALL_REGISTER_CLASSES?

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-15  8:27           ` David Edelsohn
@ 2002-05-15 11:47             ` Geoff Keating
  2002-05-15 12:08               ` law
  2002-05-15 11:50             ` David Edelsohn
  1 sibling, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-05-15 11:47 UTC (permalink / raw)
  To: David Edelsohn; +Cc: gcc

David Edelsohn <dje@watson.ibm.com> writes:

> >>>>> Richard Henderson writes:
> 
> Richard> Eh?  My objection is still going to be potential reload
> Richard> failures.  Which "fact" is going to ensure that you
> Richard> introduce no failures on any target?
> 
> 	So maybe this should be enabled on a target-by-target basis
> instead of SMALL_REGISTER_CLASSES?

Either the change is correct or it is not.  If it is correct, it works
for all targets.  If it is not, it'll probably break something on most
targets, and we don't want it.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Dumb register allocation (PPC)
  2002-05-15  8:27           ` David Edelsohn
  2002-05-15 11:47             ` Geoff Keating
@ 2002-05-15 11:50             ` David Edelsohn
  2002-05-15 12:38               ` law
  2002-05-15 12:43               ` Geoff Keating
  1 sibling, 2 replies; 117+ messages in thread
From: David Edelsohn @ 2002-05-15 11:50 UTC (permalink / raw)
  To: Geoff Keating, dalej; +Cc: gcc

>>>>> Geoff Keating writes:

Geoff> Either the change is correct or it is not.  If it is correct, it works
Geoff> for all targets.  If it is not, it'll probably break something on most
Geoff> targets, and we don't want it.

	So we're essentially in the same position as the store-motion
discussion: we believe that enabling the functionality may cause failures,
but we do not have a testcase showing a failure.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-15 11:47             ` Geoff Keating
@ 2002-05-15 12:08               ` law
  0 siblings, 0 replies; 117+ messages in thread
From: law @ 2002-05-15 12:08 UTC (permalink / raw)
  To: Geoff Keating; +Cc: David Edelsohn, gcc

In message <jmptzxb6m4.fsf@desire.geoffk.org>, Geoff Keating writes:
 > David Edelsohn <dje@watson.ibm.com> writes:
 > 
 > > >>>>> Richard Henderson writes:
 > > 
 > > Richard> Eh?  My objection is still going to be potential reload
 > > Richard> failures.  Which "fact" is going to ensure that you
 > > Richard> introduce no failures on any target?
 > > 
 > > 	So maybe this should be enabled on a target-by-target basis
 > > instead of SMALL_REGISTER_CLASSES?
 > 
 > Either the change is correct or it is not.  If it is correct, it works
 > for all targets.  If it is not, it'll probably break something on most
 > targets, and we don't want it.
So, with that in mind, does this code lengthen the lifetime of the
return register?  If it does, then that's a serious correctness
issue.



jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-15 11:50             ` David Edelsohn
@ 2002-05-15 12:38               ` law
  2002-05-15 12:43               ` Geoff Keating
  1 sibling, 0 replies; 117+ messages in thread
From: law @ 2002-05-15 12:38 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Geoff Keating, dalej, gcc

In message <200205151847.OAA25950@makai.watson.ibm.com>, David Edelsohn writes:
 > >>>>> Geoff Keating writes:
 > 
 > Geoff> Either the change is correct or it is not.  If it is correct, it work
 > s
 > Geoff> for all targets.  If it is not, it'll probably break something on mos
 > t
 > Geoff> targets, and we don't want it.
 > 
 > 	So we're essentially in the same position as the store-motion
 > discussion: we believe that enabling the functionality may cause failures,
 > but we do not have a testcase showing a failure.
If the code potentially extends the lifetime of the return register, then
it is incorrect.  

GCC is not prepared to deal with the return register being live except
immediately after a function call and immedietly before the current
function returns.

Jeff

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-15 11:50             ` David Edelsohn
  2002-05-15 12:38               ` law
@ 2002-05-15 12:43               ` Geoff Keating
  2002-05-15 12:55                 ` David Edelsohn
  1 sibling, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-05-15 12:43 UTC (permalink / raw)
  To: dje; +Cc: dalej, gcc

> cc: gcc@gcc.gnu.org
> Date: Wed, 15 May 2002 14:47:27 -0400
> From: David Edelsohn <dje@watson.ibm.com>
> 
> >>>>> Geoff Keating writes:
> 
> Geoff> Either the change is correct or it is not.  If it is correct, it works
> Geoff> for all targets.  If it is not, it'll probably break something on most
> Geoff> targets, and we don't want it.
> 
> 	So we're essentially in the same position as the store-motion
> discussion: we believe that enabling the functionality may cause failures,
> but we do not have a testcase showing a failure.

In both cases, the solution is the same; the author of the patch needs
to explain why the patch is correct.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: Dumb register allocation (PPC)
  2002-05-15 12:43               ` Geoff Keating
@ 2002-05-15 12:55                 ` David Edelsohn
  0 siblings, 0 replies; 117+ messages in thread
From: David Edelsohn @ 2002-05-15 12:55 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dalej, gcc

>>>>> Geoff Keating writes:

Geoff> In both cases, the solution is the same; the author of the patch needs
Geoff> to explain why the patch is correct.

	That is an oversimplification.  At some level this evidence turns
into proving a negative.  If we apply that to everyone, only trivial
patches ever can be approved.  Given the number of patches which have
"been explained to be correct" yet caused regressions, we obviously are
not following that principle to the extreme.  If you want to volunteer to
go first, be my guest.  We cannot apply the rules more strictly for some.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* PCH merge bootstrap failure on systems without flex
@ 2002-06-05 12:28 Kaveh R. Ghazi
  2002-06-05 19:37 ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Kaveh R. Ghazi @ 2002-06-05 12:28 UTC (permalink / raw)
  To: geoffk; +Cc: gcc-bugs, gcc

I'm getting bootstrap failures in stage1 because there's no flex on my
box.

 > checking for flex... false
 > [...]
 > false  -o../../egcc-CVS20020605/gcc/gengtype-lex.c
 > ../../egcc-CVS20020605/gcc/gengtype-lex.l \
 >  || ( rm -f ../../egcc-CVS20020605/gcc/gengtype-lex.c && false )
 > make[2]: *** [../../egcc-CVS20020605/gcc/gengtype-lex.c] Error 255

Are we now requiring flex as part of the bootstrap process?  Was this
decision made knowingly or by accident?

		Thanks,
		--Kaveh
--
Kaveh R. Ghazi			Director of Systems Architecture
ghazi@caip.rutgers.edu		Qwest Solutions

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PCH merge bootstrap failure on systems without flex
  2002-06-05 12:28 PCH merge bootstrap failure on systems without flex Kaveh R. Ghazi
@ 2002-06-05 19:37 ` Geoff Keating
  2002-06-05 19:59   ` David Edelsohn
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-06-05 19:37 UTC (permalink / raw)
  To: ghazi; +Cc: gcc-bugs, gcc

> Date: Wed, 5 Jun 2002 15:22:57 -0400 (EDT)
> From: "Kaveh R. Ghazi" <ghazi@caip.rutgers.edu>
> Cc: gcc-bugs@gcc.gnu.org, gcc@gcc.gnu.org
> 
> I'm getting bootstrap failures in stage1 because there's no flex on my
> box.
> 
>  > checking for flex... false
>  > [...]
>  > false  -o../../egcc-CVS20020605/gcc/gengtype-lex.c
>  > ../../egcc-CVS20020605/gcc/gengtype-lex.l \
>  >  || ( rm -f ../../egcc-CVS20020605/gcc/gengtype-lex.c && false )
>  > make[2]: *** [../../egcc-CVS20020605/gcc/gengtype-lex.c] Error 255
> 
> Are we now requiring flex as part of the bootstrap process?  Was this
> decision made knowingly or by accident?

I intentionally used flex.  I wasn't the first to introduce it into
mainline, though; that honour goes to Tim Josling who added
gcc/treelang/lex.l.  If I got the makefiles right, gengtype-lex.c
should be pregenerated in the release tarball, just like the uses of
bison.

Even before that, flex was required to build ld from CVS, so it's not
a new tool in the toolchain.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PCH merge bootstrap failure on systems without flex
  2002-06-05 19:37 ` Geoff Keating
@ 2002-06-05 19:59   ` David Edelsohn
  2002-06-05 20:22     ` Kaveh R. Ghazi
  2002-06-05 20:37     ` Geoff Keating
  0 siblings, 2 replies; 117+ messages in thread
From: David Edelsohn @ 2002-06-05 19:59 UTC (permalink / raw)
  To: Geoff Keating; +Cc: ghazi, gcc-bugs, gcc

>>>>> Geoff Keating writes:

Geoff> I intentionally used flex.  I wasn't the first to introduce it into
Geoff> mainline, though; that honour goes to Tim Josling who added
Geoff> gcc/treelang/lex.l.  If I got the makefiles right, gengtype-lex.c
Geoff> should be pregenerated in the release tarball, just like the uses of
Geoff> bison.

	GCC CVS frequently includes pre-processed files and gcc_update
adjusts the timestamps so that one does not need the tool unless one is
modifying the source file.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PCH merge bootstrap failure on systems without flex
  2002-06-05 19:59   ` David Edelsohn
@ 2002-06-05 20:22     ` Kaveh R. Ghazi
  2002-06-10  0:01       ` Mark Mitchell
  2002-06-05 20:37     ` Geoff Keating
  1 sibling, 1 reply; 117+ messages in thread
From: Kaveh R. Ghazi @ 2002-06-05 20:22 UTC (permalink / raw)
  To: dje, geoffk; +Cc: gcc-bugs, gcc

 > From: David Edelsohn <dje@watson.ibm.com>
 > 
 > >>>>> Geoff Keating writes:
 > 
 > Geoff> I intentionally used flex.  I wasn't the first to introduce it into
 > Geoff> mainline, though; that honour goes to Tim Josling who added
 > Geoff> gcc/treelang/lex.l.  If I got the makefiles right, gengtype-lex.c
 > Geoff> should be pregenerated in the release tarball, just like the uses of
 > Geoff> bison.
 > 
 > 	GCC CVS frequently includes pre-processed files and gcc_update
 > adjusts the timestamps so that one does not need the tool unless one is
 > modifying the source file.
 > David

We do that e.g. for autoconf and automake output, but not for bison
generated files (in CVS.)  But those are more fundamental to the build
process I guess.  Do you think we should include the flex output in
CVS?

		--Kaveh
--
Kaveh R. Ghazi			Director of Systems Architecture
ghazi@caip.rutgers.edu		Qwest Solutions

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PCH merge bootstrap failure on systems without flex
  2002-06-05 19:59   ` David Edelsohn
  2002-06-05 20:22     ` Kaveh R. Ghazi
@ 2002-06-05 20:37     ` Geoff Keating
  1 sibling, 0 replies; 117+ messages in thread
From: Geoff Keating @ 2002-06-05 20:37 UTC (permalink / raw)
  To: dje; +Cc: ghazi, gcc-bugs, gcc

> cc: ghazi@caip.rutgers.edu, gcc-bugs@gcc.gnu.org, gcc@gcc.gnu.org
> Date: Wed, 05 Jun 2002 22:36:28 -0400
> From: David Edelsohn <dje@watson.ibm.com>
> 
> >>>>> Geoff Keating writes:
> 
> Geoff> I intentionally used flex.  I wasn't the first to introduce it into
> Geoff> mainline, though; that honour goes to Tim Josling who added
> Geoff> gcc/treelang/lex.l.  If I got the makefiles right, gengtype-lex.c
> Geoff> should be pregenerated in the release tarball, just like the uses of
> Geoff> bison.
> 
> 	GCC CVS frequently includes pre-processed files and gcc_update
> adjusts the timestamps so that one does not need the tool unless one is
> modifying the source file.

I believe the rule is that 'configure ; make' should work; so files
from autoconf, automake, and so on are in CVS, because configure needs
them, but files from bison, flex, and makeinfo are not, since the
makefiles can generate those.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PCH merge bootstrap failure on systems without flex
  2002-06-05 20:22     ` Kaveh R. Ghazi
@ 2002-06-10  0:01       ` Mark Mitchell
  2002-06-10  3:17         ` Gerald Pfeifer
       [not found]         ` <Pine.BSF.4.44.0206101202120.46487-100000@naos.dbai.tuwien. ac.at>
  0 siblings, 2 replies; 117+ messages in thread
From: Mark Mitchell @ 2002-06-10  0:01 UTC (permalink / raw)
  To: Kaveh R. Ghazi, dje, geoffk; +Cc: gcc-bugs, gcc


> We do that e.g. for autoconf and automake output, but not for bison
> generated files (in CVS.)  But those are more fundamental to the build
> process I guess.  Do you think we should include the flex output in
> CVS?

Definitely not.

In fact, we should consider removing *all* generated files from CVS
at some point, including "configure".  (Unfortunately, right now,
differents parts of the tree require different versions of autoconf.)

We want generated files in tar files so that you can build GCC from
a release tarball with a minimum of installed tools.  We don't want
them in CVS because we assume people using CVS are developers and we
assume developers can get the tools.  And having the generated files
in CVS has lead to all kinds of problems over time.

We've had this debate before; let's not to it again.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PCH merge bootstrap failure on systems without flex
  2002-06-10  0:01       ` Mark Mitchell
@ 2002-06-10  3:17         ` Gerald Pfeifer
  2002-06-10  7:48           ` Kaveh R. Ghazi
       [not found]         ` <Pine.BSF.4.44.0206101202120.46487-100000@naos.dbai.tuwien. ac.at>
  1 sibling, 1 reply; 117+ messages in thread
From: Gerald Pfeifer @ 2002-06-10  3:17 UTC (permalink / raw)
  To: Mark Mitchell; +Cc: Kaveh R. Ghazi, dje, geoffk, gcc-bugs, gcc

On Sun, 9 Jun 2002, Mark Mitchell wrote:
> We want generated files in tar files so that you can build GCC from
> a release tarball with a minimum of installed tools.
> [...]
> We've had this debate before; let's not to it again.

Well, that we've had this debate before doesn't mean we came to a commonly
accepted conclusion.

Requiring flex where we already require bison is one thing, but requiring
a full development chain including the auto-tools is something we should
carefully evaluate.

> We don't want them in CVS because we assume people using CVS are
> developers and we assume developers can get the tools.

This assumption neglects testers (of which we have too few, especially on
not-so-common platforms, and many of which use guest accounts with limited
resources to perform their testing).

Gerald
-- 
Gerald "Jerry" pfeifer@dbai.tuwien.ac.at http://www.dbai.tuwien.ac.at/~pfeifer/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PCH merge bootstrap failure on systems without flex
       [not found]         ` <Pine.BSF.4.44.0206101202120.46487-100000@naos.dbai.tuwien. ac.at>
@ 2002-06-10  5:41           ` Andrea 'Fyre Wyzard' Bocci
  0 siblings, 0 replies; 117+ messages in thread
From: Andrea 'Fyre Wyzard' Bocci @ 2002-06-10  5:41 UTC (permalink / raw)
  To: Gerald Pfeifer, Mark Mitchell; +Cc: Kaveh R. Ghazi, dje, geoffk, gcc-bugs, gcc

At 12:06 10/06/2002 +0200, Gerald Pfeifer wrote:
>On Sun, 9 Jun 2002, Mark Mitchell wrote:
> > We don't want them in CVS because we assume people using CVS are
> > developers and we assume developers can get the tools.
>
>This assumption neglects testers (of which we have too few, especially on
>not-so-common platforms, and many of which use guest accounts with limited
>resources to perform their testing).

Well, I actually don't have the faintest idea, but if someone doesn't have 
a table account to test gcc with, maybe will be using weekly or prerelease 
snapshots instead of CVS, and those have generated files, don't they ?

Just a thought, all in all.

fwyzard


^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: PCH merge bootstrap failure on systems without flex
  2002-06-10  3:17         ` Gerald Pfeifer
@ 2002-06-10  7:48           ` Kaveh R. Ghazi
  0 siblings, 0 replies; 117+ messages in thread
From: Kaveh R. Ghazi @ 2002-06-10  7:48 UTC (permalink / raw)
  To: mark, pfeifer; +Cc: dje, gcc-bugs, gcc, geoffk

 > From: Gerald Pfeifer <pfeifer@dbai.tuwien.ac.at>
 > 
 > On Sun, 9 Jun 2002, Mark Mitchell wrote:
 > > We want generated files in tar files so that you can build GCC from
 > > a release tarball with a minimum of installed tools.
 > > [...]
 > > We've had this debate before; let's not to it again.
 > 
 > Well, that we've had this debate before doesn't mean we came to a commonly
 > accepted conclusion.
 > 
 > Requiring flex where we already require bison is one thing, but requiring
 > a full development chain including the auto-tools is something we should
 > carefully evaluate.

Agreed.


 > > We don't want them in CVS because we assume people using CVS are
 > > developers and we assume developers can get the tools.
 > 
 > This assumption neglects testers (of which we have too few, especially on
 > not-so-common platforms, and many of which use guest accounts with limited
 > resources to perform their testing).
 > Gerald

Yeah, I'm in that boat.

While flex by itself isn't such a big deal, I think its important to
evaluate the cost when we increase these kinds of dependencies.

Certainly, I'm against deleting auto-tool generated files from CVS.

		--Kaveh
--
Kaveh R. Ghazi			Director of Systems Architecture
ghazi@caip.rutgers.edu		Qwest Solutions

^ permalink raw reply	[flat|nested] 117+ messages in thread

* whither specs?
@ 2002-06-10 10:31 ` Geoff Keating
  2002-06-10 10:55   ` David Edelsohn
                     ` (3 more replies)
  0 siblings, 4 replies; 117+ messages in thread
From: Geoff Keating @ 2002-06-10 10:31 UTC (permalink / raw)
  To: gcc


My recent patch to the driver allowed it to include target-dependent
code.  Now that we no longer need to encapsulate all the
target-dependent argument passing in a separate file, I think this is
a good time to sit back and ask ourselves:

- Do we still want to have the 'specs' file in $(gcc_tooldir)?  It's
  now redundant unless the user edits it, since the driver generated
  it in the first place.  I guess the real question is, "do we want
  to support the user changing this file?"  Is there a better way we
  could support the functionality that people try to get by changing
  this file?

- Do we want to keep supporting the -specs= command-line switch?  If
  we could come up with another mechanism for the things that people
  usually use this for, we could mark it as `obsolete' for GCC 3.2.

- Do we like to use specs internally?  I'm going to spend some time
  this week designing a way to have the backends provide C code to
  pass flags to the assembler, which should help with some of the more
  horrible problems in this area; see, for instance, asm_cpu_spec in
  rs6000.h.  Should this sort of thing be encouraged, or kept for
  those situations where the alternative is too terrible to contemplate?

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:31 ` whither specs? Geoff Keating
@ 2002-06-10 10:55   ` David Edelsohn
  2002-06-10 10:56     ` Geoff Keating
  2002-06-10 12:17   ` Tom Tromey
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 117+ messages in thread
From: David Edelsohn @ 2002-06-10 10:55 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

>>>>> Geoff Keating writes:

Geoff> - Do we still want to have the 'specs' file in $(gcc_tooldir)?  It's
Geoff> now redundant unless the user edits it, since the driver generated
Geoff> it in the first place.  I guess the real question is, "do we want
Geoff> to support the user changing this file?"  Is there a better way we
Geoff> could support the functionality that people try to get by changing
Geoff> this file?

	Installing the specs file and having the compiler use it if it
exists (as opposed to the internal specs), definitely helps users who need
to tweak binary installations.  It gives users the ability to modify at
least that part of the compiler after installation without rebuilding the
compiler.

	I am not saying that specs is great, but being able to fix
something in the time it takes to edit a text file is a lot easier than
rebuilding and reinstalling GCC.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:55   ` David Edelsohn
@ 2002-06-10 10:56     ` Geoff Keating
  2002-06-10 11:01       ` H . J . Lu
                         ` (3 more replies)
  0 siblings, 4 replies; 117+ messages in thread
From: Geoff Keating @ 2002-06-10 10:56 UTC (permalink / raw)
  To: dje; +Cc: gcc

> cc: gcc@gcc.gnu.org
> Date: Mon, 10 Jun 2002 13:33:30 -0400
> From: David Edelsohn <dje@watson.ibm.com>
> 
> >>>>> Geoff Keating writes:
> 
> Geoff> - Do we still want to have the 'specs' file in $(gcc_tooldir)?  It's
> Geoff> now redundant unless the user edits it, since the driver generated
> Geoff> it in the first place.  I guess the real question is, "do we want
> Geoff> to support the user changing this file?"  Is there a better way we
> Geoff> could support the functionality that people try to get by changing
> Geoff> this file?
> 
> 	Installing the specs file and having the compiler use it if it
> exists (as opposed to the internal specs), definitely helps users who need
> to tweak binary installations.  It gives users the ability to modify at
> least that part of the compiler after installation without rebuilding the
> compiler.
> 
> 	I am not saying that specs is great, but being able to fix
> something in the time it takes to edit a text file is a lot easier than
> rebuilding and reinstalling GCC.

Can you give examples of the sort of things that users might like to
change?  I'm thinking that a lot of them are along the lines of
"please pass this flag to the linker".

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:56     ` Geoff Keating
@ 2002-06-10 11:01       ` H . J . Lu
  2002-06-10 11:18       ` David Edelsohn
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 117+ messages in thread
From: H . J . Lu @ 2002-06-10 11:01 UTC (permalink / raw)
  To: Geoff Keating; +Cc: dje, gcc

On Mon, Jun 10, 2002 at 10:55:03AM -0700, Geoff Keating wrote:
> > cc: gcc@gcc.gnu.org
> > Date: Mon, 10 Jun 2002 13:33:30 -0400
> > From: David Edelsohn <dje@watson.ibm.com>
> > 
> > >>>>> Geoff Keating writes:
> > 
> > Geoff> - Do we still want to have the 'specs' file in $(gcc_tooldir)?  It's
> > Geoff> now redundant unless the user edits it, since the driver generated
> > Geoff> it in the first place.  I guess the real question is, "do we want
> > Geoff> to support the user changing this file?"  Is there a better way we
> > Geoff> could support the functionality that people try to get by changing
> > Geoff> this file?
> > 
> > 	Installing the specs file and having the compiler use it if it
> > exists (as opposed to the internal specs), definitely helps users who need
> > to tweak binary installations.  It gives users the ability to modify at
> > least that part of the compiler after installation without rebuilding the
> > compiler.
> > 
> > 	I am not saying that specs is great, but being able to fix
> > something in the time it takes to edit a text file is a lot easier than
> > rebuilding and reinstalling GCC.
> 
> Can you give examples of the sort of things that users might like to
> change?  I'm thinking that a lot of them are along the lines of
> "please pass this flag to the linker".

Also "always pass this flag to compiler".


H.J.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:56     ` Geoff Keating
  2002-06-10 11:01       ` H . J . Lu
@ 2002-06-10 11:18       ` David Edelsohn
  2002-06-10 11:47       ` Mumit Khan
  2002-06-10 15:01       ` Matthias Benkmann
  3 siblings, 0 replies; 117+ messages in thread
From: David Edelsohn @ 2002-06-10 11:18 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

>>>>> Geoff Keating writes:

Geoff> Can you give examples of the sort of things that users might like to
Geoff> change?  I'm thinking that a lot of them are along the lines of
Geoff> "please pass this flag to the linker".

	We just had the case with the GCC 3.1 release that the specs did
not match the version of IBM's Parallel Environment that a user had
installed.  With a text specs file, they were able to edit LINK_SPEC and
STARTFILE_SPEC to match their version of the software.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:56     ` Geoff Keating
  2002-06-10 11:01       ` H . J . Lu
  2002-06-10 11:18       ` David Edelsohn
@ 2002-06-10 11:47       ` Mumit Khan
  2002-06-11 16:48         ` Jason R Thorpe
  2002-06-10 15:01       ` Matthias Benkmann
  3 siblings, 1 reply; 117+ messages in thread
From: Mumit Khan @ 2002-06-10 11:47 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

On Mon, 10 Jun 2002, Geoff Keating wrote:

> Can you give examples of the sort of things that users might like to
> change?  I'm thinking that a lot of them are along the lines of
> "please pass this flag to the linker".

Typical one for me is to add the rpath/R directive to pass on to the
linker. We have multiple versions of GCC installed on all our platforms,
and each lives in its own little area; this implies that the shared
libraries that are installed with GCC live in directories that are not
normally searched by the dynamic linker[1], and it's simply not practical
to expect users to know where these things hide (and it gets messier
whenever multilibs are involved).

[1] While there are tools such as ldconfig/crle/etc on various platforms
to add these directories to the search list, it's definitely not something
we want to deal with.

Regards,
Mumit




^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:31 ` whither specs? Geoff Keating
  2002-06-10 10:55   ` David Edelsohn
@ 2002-06-10 12:17   ` Tom Tromey
  2002-06-10 15:37   ` Zack Weinberg
  2002-06-10 19:25   ` Mark Mitchell
  3 siblings, 0 replies; 117+ messages in thread
From: Tom Tromey @ 2002-06-10 12:17 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

>>>>> "Geoff" == Geoff Keating <geoffk@geoffk.org> writes:

Geoff> - Do we want to keep supporting the -specs= command-line switch?  If
Geoff>   we could come up with another mechanism for the things that people
Geoff>   usually use this for, we could mark it as `obsolete' for GCC 3.2.

Geoff> - Do we like to use specs internally?

libgcj uses specs and the -specs= switch.  This isn't an actual
requirement, in that it could be switched to any mechanism that keeps
things working.  I think it is definitely easier for us to have a text
file for this sort of thing than to build it into gcj itself.  Also it
is a bit nicer to have all the configury code in libgcj, near where it
is used, than to have it in gcc.

Tom

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:56     ` Geoff Keating
                         ` (2 preceding siblings ...)
  2002-06-10 11:47       ` Mumit Khan
@ 2002-06-10 15:01       ` Matthias Benkmann
  3 siblings, 0 replies; 117+ messages in thread
From: Matthias Benkmann @ 2002-06-10 15:01 UTC (permalink / raw)
  To: gcc

On Mon, 10 Jun 2002 10:55:03 -0700 Geoff Keating <geoffk@geoffk.org>
wrote:

> 
> Can you give examples of the sort of things that users might like to
> change?  I'm thinking that a lot of them are along the lines of
> "please pass this flag to the linker".

I built a libc6 Linux From Scratch system from a libc5 host system and
needed the spec file to change the dynamic linker.

MSB

-- 
I tried sniffing Coke once, but the ice cubes got stuck in my nose.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:31 ` whither specs? Geoff Keating
  2002-06-10 10:55   ` David Edelsohn
  2002-06-10 12:17   ` Tom Tromey
@ 2002-06-10 15:37   ` Zack Weinberg
  2002-06-10 19:25   ` Mark Mitchell
  3 siblings, 0 replies; 117+ messages in thread
From: Zack Weinberg @ 2002-06-10 15:37 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

On Mon, Jun 10, 2002 at 10:29:14AM -0700, Geoff Keating wrote:
> 
> My recent patch to the driver allowed it to include target-dependent
> code.  Now that we no longer need to encapsulate all the
> target-dependent argument passing in a separate file, I think this is
> a good time to sit back and ask ourselves:
> 
> - Do we still want to have the 'specs' file in $(gcc_tooldir)?  It's
>   now redundant unless the user edits it, since the driver generated
>   it in the first place.  I guess the real question is, "do we want
>   to support the user changing this file?"  Is there a better way we
>   could support the functionality that people try to get by changing
>   this file?
> 
> - Do we want to keep supporting the -specs= command-line switch?  If
>   we could come up with another mechanism for the things that people
>   usually use this for, we could mark it as `obsolete' for GCC 3.2.
> 
> - Do we like to use specs internally?  I'm going to spend some time
>   this week designing a way to have the backends provide C code to
>   pass flags to the assembler, which should help with some of the more
>   horrible problems in this area; see, for instance, asm_cpu_spec in
>   rs6000.h.  Should this sort of thing be encouraged, or kept for
>   those situations where the alternative is too terrible to contemplate?

Neil had much of a driver rewrite done last year.  It got shot down on
the grounds that we still needed the ability to edit the specs after
installation.

Personally, I felt that the problems which people currently solve by
doing that, could and should be solved differently -- and that the
appropriate move would be to get rid of specs entirely, then fix the
problems that come up only when they do come up, case-by-case.

Witness how much of an improvement the TARGET_*_BUILTINS stuff is over
doing -D switches from the specs.  I'm confident that similar
improvements are out there if we do everything like that.

zw

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 10:31 ` whither specs? Geoff Keating
                     ` (2 preceding siblings ...)
  2002-06-10 15:37   ` Zack Weinberg
@ 2002-06-10 19:25   ` Mark Mitchell
  3 siblings, 0 replies; 117+ messages in thread
From: Mark Mitchell @ 2002-06-10 19:25 UTC (permalink / raw)
  To: Geoff Keating, gcc

> - Do we still want to have the 'specs' file in $(gcc_tooldir)?

No.  This is a fragile mechanism and one that can be dealt with in
other ways (wrapper scripts, rebuilding from modified source).

> - Do we want to keep supporting the -specs= command-line switch?

No; same reasons.

> - Do we like to use specs internally?

No.

>   rs6000.h.  Should this sort of thing be encouraged, or kept for
>   those situations where the alternative is too terrible to contemplate?

I would prefer to get rid of specs and use either a real extension
language (e.g., Guile) or C to do the job.

-- 
Mark Mitchell                mark@codesourcery.com
CodeSourcery, LLC            http://www.codesourcery.com

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-10 11:47       ` Mumit Khan
@ 2002-06-11 16:48         ` Jason R Thorpe
  2002-06-12  1:20           ` Theodore Papadopoulo
  2002-06-12 11:01           ` Richard Henderson
  0 siblings, 2 replies; 117+ messages in thread
From: Jason R Thorpe @ 2002-06-11 16:48 UTC (permalink / raw)
  To: Mumit Khan; +Cc: Geoff Keating, gcc

On Mon, Jun 10, 2002 at 01:44:45PM -0500, Mumit Khan wrote:

 > Typical one for me is to add the rpath/R directive to pass on to the
 > linker. We have multiple versions of GCC installed on all our platforms,

The driver should be doing this itself, IMO.  I have a gross patch in my
local tree right now that passes -R in the places where the driver passes
-L .. I've been meaning to file a PR on it so that I don't forget about it.

 > and each lives in its own little area; this implies that the shared
 > libraries that are installed with GCC live in directories that are not
 > normally searched by the dynamic linker[1], and it's simply not practical
 > to expect users to know where these things hide (and it gets messier
 > whenever multilibs are involved).

...or when the system compiler is one version of GCC (with corresponding
libstdc++.so in /usr/lib) and you're testing another version of GCC which
happens to use a libstdc++.so with the same shlib major number.

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>
	...at USENIX...

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-11 16:48         ` Jason R Thorpe
@ 2002-06-12  1:20           ` Theodore Papadopoulo
  2002-06-12 11:01           ` Richard Henderson
  1 sibling, 0 replies; 117+ messages in thread
From: Theodore Papadopoulo @ 2002-06-12  1:20 UTC (permalink / raw)
  To: Jason R Thorpe, Mumit Khan, Geoff Keating, gcc



thorpej@wasabisystems.com said:
>>  > Typical one for me is to add the rpath/R directive to pass on to the
>>  > linker. We have multiple versions of GCC installed on all our platforms,

> The driver should be doing this itself, IMO.  I have a gross patch in
> my local tree right now that passes -R in the places where the driver
> passes -L .. I've been meaning to file a PR on it so that I don't
> forget about it. 

I totally agree. I also have a patch for the same thing. I even 
posted it for getting comments and got none (if I remember well, 
there is no consensus on the topic, which make such a patch highly 
speculative for inclusion, but things may change)... I would be 
interested in comparing the two patches, though.

	Theo.

--------------------------------------------------------------------
Theodore Papadopoulo
Email: Theodore.Papadopoulo@sophia.inria.fr Tel: (33) 04 92 38 76 01
 --------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-11 16:48         ` Jason R Thorpe
  2002-06-12  1:20           ` Theodore Papadopoulo
@ 2002-06-12 11:01           ` Richard Henderson
  2002-06-12 11:36             ` Theodore Papadopoulo
  1 sibling, 1 reply; 117+ messages in thread
From: Richard Henderson @ 2002-06-12 11:01 UTC (permalink / raw)
  To: Jason R Thorpe, Mumit Khan, Geoff Keating, gcc

On Tue, Jun 11, 2002 at 03:24:02PM -0700, Jason R Thorpe wrote:
> The driver should be doing this itself, IMO.

There are cons to DT_RPATH that in my opinion outweigh any
benefit.  This has been dicussed before; search the archives.


r~

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-12 11:01           ` Richard Henderson
@ 2002-06-12 11:36             ` Theodore Papadopoulo
  2002-06-12 15:48               ` Richard Henderson
  2002-06-12 16:54               ` H . J . Lu
  0 siblings, 2 replies; 117+ messages in thread
From: Theodore Papadopoulo @ 2002-06-12 11:36 UTC (permalink / raw)
  To: Richard Henderson, Jason R Thorpe, Mumit Khan, Geoff Keating, gcc


rth@redhat.com said:
>> The driver should be doing this itself, IMO.
> There are cons to DT_RPATH that in my opinion outweigh any benefit.
> This has been dicussed before; search the archives. 

Right, I remember this discussion. But yet it seems to really be 
useful for some people all the more that since all the gcc librairies 
are dynamic by default, if you want to test multiple compilers, you 
have to specify each time not only a path but also a LD_LIBRARY_PATH 
or LD_RUN_PATH.

Why can't we provide this feature optionnally with a --enable-xxx parameter a 
configure time (by default off). This would allow:

	- for the people that would like to have the feature to get 
          it, and it is an often demanded feature. See how many people
          achieve it by hacking the spec file.

        - for some (large?) scale testing of the pros & cons of having
          this feature.

I do not question your reasons, but as far as my remembering goes, the cons 
were about changing the paths for the libraries after a while. Sometimes people
know they will not do that. At the very least, it looks like some of us are 
doing so without finding the problems that had been raised at the 
time... I do not know how representative we are, if this works better 
on some systems than on some other but I suspect a 
few people would give
it a try. So maybe it could be tried as an experimental feature for a while.
It is not such a difficult or invasive patch to make (actually to 
implement this I first had to make a small cleanup of the lib_spec 
functions, which I still intend to submit by itself...).

	Theo.


--------------------------------------------------------------------
Theodore Papadopoulo
Email: Theodore.Papadopoulo@sophia.inria.fr Tel: (33) 04 92 38 76 01
 --------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-12 11:36             ` Theodore Papadopoulo
@ 2002-06-12 15:48               ` Richard Henderson
  2002-06-12 16:54               ` H . J . Lu
  1 sibling, 0 replies; 117+ messages in thread
From: Richard Henderson @ 2002-06-12 15:48 UTC (permalink / raw)
  To: Theodore Papadopoulo; +Cc: Jason R Thorpe, Mumit Khan, Geoff Keating, gcc

On Wed, Jun 12, 2002 at 08:26:05PM +0200, Theodore Papadopoulo wrote:
> Why can't we provide this feature optionnally with a --enable-xxx parameter a 
> configure time (by default off). This would allow:

I guess I don't have an objection to this.  
Feel free to submit a patch.


r~

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-12 11:36             ` Theodore Papadopoulo
  2002-06-12 15:48               ` Richard Henderson
@ 2002-06-12 16:54               ` H . J . Lu
  2002-11-24  6:52                 ` Jason R Thorpe
  1 sibling, 1 reply; 117+ messages in thread
From: H . J . Lu @ 2002-06-12 16:54 UTC (permalink / raw)
  To: Theodore Papadopoulo
  Cc: Richard Henderson, Jason R Thorpe, Mumit Khan, Geoff Keating, gcc

On Wed, Jun 12, 2002 at 08:26:05PM +0200, Theodore Papadopoulo wrote:
> 
> rth@redhat.com said:
> >> The driver should be doing this itself, IMO.
> > There are cons to DT_RPATH that in my opinion outweigh any benefit.
> > This has been dicussed before; search the archives. 
> 
> Right, I remember this discussion. But yet it seems to really be 
> useful for some people all the more that since all the gcc librairies 
> are dynamic by default, if you want to test multiple compilers, you 
> have to specify each time not only a path but also a LD_LIBRARY_PATH 
> or LD_RUN_PATH.

DT_RPATH is tricky with cross compiling. I don't want it in gcc.



H.J.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* needless deep recursion in gt-c-decl.h
@ 2002-07-24  4:53 Per Bothner
  2002-07-24  5:17 ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-07-24  4:53 UTC (permalink / raw)
  To: gcc; +Cc: Geoffrey Keating

The gt_ggc_mx_lang_tree_node routine in the generated
gt-c-decl.h file causes a stack overflow when I tried
it under Darwin.  Perhaps I need to increase the stack
limit, but the deep recursion is a performance problem
that needlessly increases the working set.

Specifically, we should iterate, not recurse, on the
field that is most likely to be a long list, the TREE_CHAIN.
I don't understand how gengtype works, but the generated
code should be something like this:

void
gt_ggc_mx_lang_tree_node (x_p)
       void *x_p;
{
   union lang_tree_node * const x = (union lang_tree_node *)x_p;
   for (;;) {
     if (! ggc_test_and_set_mark (x))
       return;
   {
     ...
     gt_ggc_m_tree_node ((*x).generic.common.type);
     x = (*x).generic.common.chain;
     if (x != NULL_TREE)
       continue;
     return;
   }
}

All the comparisons against tag1 and tag2 may also be a performance
problem.  I don't know if the compiler is smart enough to convert
   if (tag2 == (TS_REAL_CST) ...;
   if (tag2 == (TS_VECTOR)) ...;
to:
   if (tag2 == (TS_REAL_CST) ...;
   else if (tag2 == (TS_VECTOR)) ...;
Assuming it is, it would still be an advantage if we could
order the type tests by frequency, unless the compiler is
smart enough to compile all the tests into a switch.  Is it?
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-24  4:53 needless deep recursion in gt-c-decl.h Per Bothner
@ 2002-07-24  5:17 ` Geoff Keating
  2002-07-24  7:09   ` Per Bothner
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-07-24  5:17 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Tue, 23 Jul 2002 20:11:54 -0700
> From: Per Bothner <per@bothner.com>

> The gt_ggc_mx_lang_tree_node routine in the generated
> gt-c-decl.h file causes a stack overflow when I tried
> it under Darwin.  Perhaps I need to increase the stack
> limit, but the deep recursion is a performance problem
> that needlessly increases the working set.
> 
> Specifically, we should iterate, not recurse, on the
> field that is most likely to be a long list, the TREE_CHAIN.
> I don't understand how gengtype works, but the generated
> code should be something like this:
> 
> void
> gt_ggc_mx_lang_tree_node (x_p)
>        void *x_p;
> {
>    union lang_tree_node * const x = (union lang_tree_node *)x_p;
>    for (;;) {
>      if (! ggc_test_and_set_mark (x))
>        return;
>    {
>      ...
>      gt_ggc_m_tree_node ((*x).generic.common.type);
>      x = (*x).generic.common.chain;
>      if (x != NULL_TREE)
>        continue;
>      return;
>    }
> }

It's not as simple as that, because there'll often be a reference to
the TREE_CHAIN from inside the node.  However, yes, you can do
something like this.  In fact, I tried it in the hope that it would
make GC faster; it didn't (but it didn't make it much slower either).

> All the comparisons against tag1 and tag2 may also be a performance
> problem.  I don't know if the compiler is smart enough to convert
>    if (tag2 == (TS_REAL_CST) ...;
>    if (tag2 == (TS_VECTOR)) ...;
> to:
>    if (tag2 == (TS_REAL_CST) ...;
>    else if (tag2 == (TS_VECTOR)) ...;
> Assuming it is, it would still be an advantage if we could
> order the type tests by frequency, unless the compiler is
> smart enough to compile all the tests into a switch.  Is it?

Doesn't matter, the branch now generates a switch in the same situation.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-24  5:17 ` Geoff Keating
@ 2002-07-24  7:09   ` Per Bothner
  2002-07-24  7:17     ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-07-24  7:09 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
>>From: Per Bothner <per@bothner.com>
>>Specifically, we should iterate, not recurse, on the
>>field that is most likely to be a long list, the TREE_CHAIN.

> It's not as simple as that, because there'll often be a reference to
> the TREE_CHAIN from inside the node.

I don't know what this sentence means.  If you mean that some node
referenced by one of the other (non-chain) fields (possibly indirectly)
is the same node as the chain, I don't see how that's complicates it.

> However, yes, you can do
> something like this.  In fact, I tried it in the hope that it would
> make GC faster; it didn't (but it didn't make it much slower either).

Did it make it slower at all?  It would be strange if iteration is
slower than recursion (unless that just happens to hit the cache better,
which would surprise me).  If interation is the same speed or faster,
we should use it, to reduce running out of stack space.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-24  7:09   ` Per Bothner
@ 2002-07-24  7:17     ` Geoff Keating
  2002-07-24  7:52       ` Per Bothner
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-07-24  7:17 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Tue, 23 Jul 2002 22:02:46 -0700
> From: Per Bothner <per@bothner.com>

> Geoff Keating wrote:
> >>From: Per Bothner <per@bothner.com>
> >>Specifically, we should iterate, not recurse, on the
> >>field that is most likely to be a long list, the TREE_CHAIN.
> 
> > It's not as simple as that, because there'll often be a reference to
> > the TREE_CHAIN from inside the node.
> 
> I don't know what this sentence means.  If you mean that some node
> referenced by one of the other (non-chain) fields (possibly indirectly)
> is the same node as the chain, I don't see how that's complicates it.

It means you need a more complicated design than the one you
presented, because in

while (! x is marked)
{
  mark x;
  recursively mark objects pointed to by x, but don't look at TREE_CHAIN(x);
  x = TREE_CHAIN (x);
}

the loop rarely iterates more than once, since very often TREE_CHAIN(x)
will be marked indirectly in the body of the loop.  You need something
like

x_limit = x;
while (! x_limit is marked)
{
  mark x_limit;
  x_limit = TREE_CHAIN (x_limit);
}
while (x != x_limit)
{
  mark objects pointed to by x;
  x = TREE_CHAIN (x);
}

> > However, yes, you can do
> > something like this.  In fact, I tried it in the hope that it would
> > make GC faster; it didn't (but it didn't make it much slower either).
> 
> Did it make it slower at all?  It would be strange if iteration is
> slower than recursion (unless that just happens to hit the cache better,
> which would surprise me).  If interation is the same speed or faster,
> we should use it, to reduce running out of stack space.

They were about the same speed, but I don't remember exactly.

One reason this might be more expensive is because of the extra
variable.  Most of the time cost of the high-level marking is in
function calls, and extra variables make the prologue/epilogue longer.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-24  7:17     ` Geoff Keating
@ 2002-07-24  7:52       ` Per Bothner
  2002-07-25  2:32         ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-07-24  7:52 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
> It means you need a more complicated design than the one you
> presented, because in
> 
> while (! x is marked)
> {
>   mark x;
>   recursively mark objects pointed to by x, but don't look at TREE_CHAIN(x);
>   x = TREE_CHAIN (x);
> }
> 
> the loop rarely iterates more than once, since very often TREE_CHAIN(x)
> will be marked indirectly in the body of the loop.  You need something
> like [two loops]

Are you saying that the single loop is incorrect, or that it is
inefficient?  I don't understand why either would be the case.
--
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-24  7:52       ` Per Bothner
@ 2002-07-25  2:32         ` Geoff Keating
  2002-07-25 11:33           ` Per Bothner
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-07-25  2:32 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Tue, 23 Jul 2002 23:22:56 -0700
> From: Per Bothner <per@bothner.com>
> User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1b) Gecko/20020722
> X-Accept-Language: en-us, en
> CC: gcc@gcc.gnu.org
> 
> Geoff Keating wrote:
> > It means you need a more complicated design than the one you
> > presented, because in
> > 
> > while (! x is marked)
> > {
> >   mark x;
> >   recursively mark objects pointed to by x, but don't look at TREE_CHAIN(x);
> >   x = TREE_CHAIN (x);
> > }
> > 
> > the loop rarely iterates more than once, since very often TREE_CHAIN(x)
> > will be marked indirectly in the body of the loop.  You need something
> > like [two loops]
> 
> Are you saying that the single loop is incorrect, or that it is
> inefficient?  I don't understand why either would be the case.

A single loop will be correct, in that everything will be marked, but
it doesn't achieve your aim of reducing the stack recursion depth
significantly.

For instance, consider

int x (void) { return y(); }
int y (void) { return 1; }

'y' will be the tree_chain of the decl for 'x'.  But 'y' is also
referenced inside 'x', and so by the time we look at the tree_chain of
'x', 'y' will already have been marked.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-25  2:32         ` Geoff Keating
@ 2002-07-25 11:33           ` Per Bothner
  2002-07-25 14:20             ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-07-25 11:33 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
> A single loop will be correct, in that everything will be marked, but
> it doesn't achieve your aim of reducing the stack recursion depth
> significantly.

Is this speaking from experience?

> For instance, consider
> 
> int x (void) { return y(); }
> int y (void) { return 1; }
> 
> 'y' will be the tree_chain of the decl for 'x'.  But 'y' is also
> referenced inside 'x', and so by the time we look at the tree_chain of
> 'x', 'y' will already have been marked.

That case would require a backwards reference, which I'm guessing are
relatively rare, especially if extern declarations are in the list.
(I'm not sure how forward declarations affect the list order.)

I'd be curious to see if a single loop fixes the problem on Darwin,
but I haven't figured out the logic that gengtype uses to
generate gt-c-decl.h.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-25 11:33           ` Per Bothner
@ 2002-07-25 14:20             ` Geoff Keating
  2002-07-26  8:37               ` Per Bothner
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-07-25 14:20 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Thu, 25 Jul 2002 08:23:45 -0700
> From: Per Bothner <per@bothner.com>

> Geoff Keating wrote:
> > A single loop will be correct, in that everything will be marked, but
> > it doesn't achieve your aim of reducing the stack recursion depth
> > significantly.
> 
> Is this speaking from experience?

I don't remember.

> > For instance, consider
> > 
> > int x (void) { return y(); }
> > int y (void) { return 1; }
> > 
> > 'y' will be the tree_chain of the decl for 'x'.  But 'y' is also
> > referenced inside 'x', and so by the time we look at the tree_chain of
> > 'x', 'y' will already have been marked.
> 
> That case would require a backwards reference, which I'm guessing are
> relatively rare, especially if extern declarations are in the list.
> (I'm not sure how forward declarations affect the list order.)

I think the list is in the order the declarations are seen in the
file, so anything is possible.

> I'd be curious to see if a single loop fixes the problem on Darwin,
> but I haven't figured out the logic that gengtype uses to
> generate gt-c-decl.h.

Just insert the loop by hand after the file is generated, then re-run
make.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-25 14:20             ` Geoff Keating
@ 2002-07-26  8:37               ` Per Bothner
  2002-07-30 21:41                 ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-07-26  8:37 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

[-- Attachment #1: Type: text/plain, Size: 722 bytes --]

Geoff Keating wrote:
> Just insert the loop by hand after the file is generated, then re-run
> make.

Using a single loop does appear to solve the problem.
I kludged the Makefile.in to apply the appended patch,
and the compiler made it all the way to building libjava.
There it failed running cc1plus on a .cc file in libjava.
It was the same basic failure, recursing on TREE_CHAIN,
but this time in gtype-cp.h, which I had not patched.
I haven't verified whether the equivalent fix would
work in gtype-cp.h, but I'm guessing so.

So I think we should definitely fix the code that generates
gt-c-decl.h and gtype-cp.h.  I can do it if you tell me to
do it.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

[-- Attachment #2: gt-c-decl.diff --]
[-- Type: text/plain, Size: 4524 bytes --]

--- gt-c-decl.h~	Thu Jul 25 12:42:10 2002
+++ gt-c-decl.h	Thu Jul 25 12:45:22 2002
@@ -42,6 +42,8 @@
       void *x_p;
 {
   union lang_tree_node * const x = (union lang_tree_node *)x_p;
+  for(;;)
+    {
   if (! ggc_test_and_set_mark (x))
     return;
   {
@@ -50,44 +52,36 @@
       {
         unsigned int tag2 = (tree_node_structure (&((*x).generic)));
         if (tag2 == (TS_COMMON)) {
-          gt_ggc_m_tree_node ((*x).generic.common.chain);
           gt_ggc_m_tree_node ((*x).generic.common.type);
         }
         if (tag2 == (TS_INT_CST)) {
-          gt_ggc_m_tree_node ((*x).generic.int_cst.common.chain);
           gt_ggc_m_tree_node ((*x).generic.int_cst.common.type);
           gt_ggc_m_rtx_def ((*x).generic.int_cst.rtl);
         }
         if (tag2 == (TS_REAL_CST)) {
-          gt_ggc_m_tree_node ((*x).generic.real_cst.common.chain);
           gt_ggc_m_tree_node ((*x).generic.real_cst.common.type);
           gt_ggc_m_rtx_def ((*x).generic.real_cst.rtl);
           gt_ggc_m_realvaluetype ((*x).generic.real_cst.real_cst_ptr);
         }
         if (tag2 == (TS_VECTOR)) {
-          gt_ggc_m_tree_node ((*x).generic.vector.common.chain);
           gt_ggc_m_tree_node ((*x).generic.vector.common.type);
           gt_ggc_m_rtx_def ((*x).generic.vector.rtl);
           gt_ggc_m_tree_node ((*x).generic.vector.elements);
         }
         if (tag2 == (TS_STRING)) {
-          gt_ggc_m_tree_node ((*x).generic.string.common.chain);
           gt_ggc_m_tree_node ((*x).generic.string.common.type);
           gt_ggc_m_rtx_def ((*x).generic.string.rtl);
         }
         if (tag2 == (TS_COMPLEX)) {
-          gt_ggc_m_tree_node ((*x).generic.complex.common.chain);
           gt_ggc_m_tree_node ((*x).generic.complex.common.type);
           gt_ggc_m_rtx_def ((*x).generic.complex.rtl);
           gt_ggc_m_tree_node ((*x).generic.complex.real);
           gt_ggc_m_tree_node ((*x).generic.complex.imag);
         }
         if (tag2 == (TS_IDENTIFIER)) {
-          gt_ggc_m_tree_node ((*x).generic.identifier.common.chain);
           gt_ggc_m_tree_node ((*x).generic.identifier.common.type);
         }
         if (tag2 == (TS_DECL)) {
-          gt_ggc_m_tree_node ((*x).generic.decl.common.chain);
           gt_ggc_m_tree_node ((*x).generic.decl.common.type);
           gt_ggc_m_tree_node ((*x).generic.decl.size);
           gt_ggc_m_tree_node ((*x).generic.decl.size_unit);
@@ -120,7 +114,6 @@
           gt_ggc_m_lang_decl ((*x).generic.decl.lang_specific);
         }
         if (tag2 == (TS_TYPE)) {
-          gt_ggc_m_tree_node ((*x).generic.type.common.chain);
           gt_ggc_m_tree_node ((*x).generic.type.common.type);
           gt_ggc_m_tree_node ((*x).generic.type.values);
           gt_ggc_m_tree_node ((*x).generic.type.size);
@@ -143,13 +136,11 @@
           gt_ggc_m_lang_type ((*x).generic.type.lang_specific);
         }
         if (tag2 == (TS_LIST)) {
-          gt_ggc_m_tree_node ((*x).generic.list.common.chain);
           gt_ggc_m_tree_node ((*x).generic.list.common.type);
           gt_ggc_m_tree_node ((*x).generic.list.purpose);
           gt_ggc_m_tree_node ((*x).generic.list.value);
         }
         if (tag2 == (TS_VEC)) {
-          gt_ggc_m_tree_node ((*x).generic.vec.common.chain);
           gt_ggc_m_tree_node ((*x).generic.vec.common.type);
           {
             size_t i5_0;
@@ -160,7 +151,6 @@
           }
         }
         if (tag2 == (TS_EXP)) {
-          gt_ggc_m_tree_node ((*x).generic.exp.common.chain);
           gt_ggc_m_tree_node ((*x).generic.exp.common.type);
           {
             const size_t tree_exp_size = (TREE_CODE_LENGTH (TREE_CODE ((tree) &((*x).generic.exp))));
@@ -174,7 +164,6 @@
           }
         }
         if (tag2 == (TS_BLOCK)) {
-          gt_ggc_m_tree_node ((*x).generic.block.common.chain);
           gt_ggc_m_tree_node ((*x).generic.block.common.type);
           gt_ggc_m_tree_node ((*x).generic.block.vars);
           gt_ggc_m_tree_node ((*x).generic.block.subblocks);
@@ -186,7 +175,6 @@
       }
     }
     if (tag1 == (1)) {
-      gt_ggc_m_tree_node ((*x).identifier.common_id.common.chain);
       gt_ggc_m_tree_node ((*x).identifier.common_id.common.type);
       gt_ggc_m_tree_node ((*x).identifier.global_value);
       gt_ggc_m_tree_node ((*x).identifier.local_value);
@@ -196,6 +184,11 @@
       gt_ggc_m_tree_node ((*x).identifier.limbo_value);
     }
   }
+  x = (*x).identifier.common_id.common.chain;
+  if (x == NULL_TREE)
+    break;
+
+    }
 }
 
 void

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-26  8:37               ` Per Bothner
@ 2002-07-30 21:41                 ` Geoff Keating
  2002-07-31  1:40                   ` Per Bothner
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-07-30 21:41 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Thu, 25 Jul 2002 16:35:19 -0700
> From: Per Bothner <per@bothner.com>

I am still working on this...

One problem is that to make this change:

> @@ -50,44 +52,36 @@
>        {
>          unsigned int tag2 = (tree_node_structure (&((*x).generic)));
>          if (tag2 == (TS_COMMON)) {
> -          gt_ggc_m_tree_node ((*x).generic.common.chain);
>            gt_ggc_m_tree_node ((*x).generic.common.type);
>          }
>          if (tag2 == (TS_INT_CST)) {
> -          gt_ggc_m_tree_node ((*x).generic.int_cst.common.chain);
>            gt_ggc_m_tree_node ((*x).generic.int_cst.common.type);
>            gt_ggc_m_rtx_def ((*x).generic.int_cst.rtl);
>          }
...

would require teaching gengtype that common.chain is the same as
int_cst.common.chain and so on.  I now remember that I never tested
this particular algorithm, for this reason.  Annoyingly, it's only a
problem with trees, RTL and almost all the other structures don't have
this problem.

Do you know what the actual stack limits are on Darwin?  I now have an
instrumented GCC that reports the maximum recursion, but each stack
frame is small (I would expect 32 bytes on powerpc) so you can fit an
awful lot of them in a few Mb of stack.

Once my initial run completes I'll know what source files cause the
highest recursion.  Then I will apply your patch and a two-loop patch
and see what happens with speed & space.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-30 21:41                 ` Geoff Keating
@ 2002-07-31  1:40                   ` Per Bothner
  2002-08-01 12:32                     ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-07-31  1:40 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
>>Date: Thu, 25 Jul 2002 16:35:19 -0700
> One problem is that to make this change:

>>@@ -50,44 +52,36 @@
>>       {
>>         unsigned int tag2 = (tree_node_structure (&((*x).generic)));
>>         if (tag2 == (TS_COMMON)) {
>>-          gt_ggc_m_tree_node ((*x).generic.common.chain);
>>           gt_ggc_m_tree_node ((*x).generic.common.type);
>>         }
>>         if (tag2 == (TS_INT_CST)) {
>>-          gt_ggc_m_tree_node ((*x).generic.int_cst.common.chain);
>>           gt_ggc_m_tree_node ((*x).generic.int_cst.common.type);
>>           gt_ggc_m_rtx_def ((*x).generic.int_cst.rtl);
>>         }
> ...
> 
> would require teaching gengtype that common.chain is the same as
> int_cst.common.chain and so on.

An alternative: Annotate the chain fields with GTY((chain)) or somesuch.
This tells gengtype that this is the field that it should iterate on,
rather than recurse.  Then just generate in each branch code that
sets x to whater fields has the chain annotation and does a continue.
If there is no field with the 'chain' annotation, return.

This is more general, since it doesn't special-vase 'chain' - though the
resulting code is needlessly less compact.

> Do you know what the actual stack limits are on Darwin?

512k is the default size.  You can increase the stack limit, but I
really think 512k should be more than enough, and I think it would be,
if we made this fix.

> I now have an
> instrumented GCC that reports the maximum recursion, but each stack
> frame is small (I would expect 32 bytes on powerpc) so you can fit an
> awful lot of them in a few MB of stack.

If you use a few Mb of stack space, you're increasing the working
set by a few MB.  We should avoid that.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-07-31  1:40                   ` Per Bothner
@ 2002-08-01 12:32                     ` Geoff Keating
  2002-08-01 14:00                       ` Mike Stump
                                         ` (2 more replies)
  0 siblings, 3 replies; 117+ messages in thread
From: Geoff Keating @ 2002-08-01 12:32 UTC (permalink / raw)
  To: per; +Cc: gcc


I now have some numbers!

I looked at expr.o, which needs (on the branch) a recursion depth of
24024, the third-highest (and not typical, most files are < 10000);
that corresponds to perhaps 1M of stack.

I tried both algorithms on the tree marker routine for C.  They both
had about the same effect, reducing the recursion depth to about
16000, with no significant change in CPU time.  (This is with
gcac checking, so any effect should be greatly magnified.)

The 16000 turns out to be mostly RTL marking.  So, I applied the
one-loop method to the RTL marker routine, and it reduced the maximum
stack depth to about 13000.  Better, but not really acceptable, since
this still corresponds to perhaps 600k of stack, more than the limit
on Darwin.

I then generated a coredump when the stack depth was 13000, and looked
at it in gdb.  I was very pleased to find that the gdb folks have
improved the performance (since gdb 4.x) when looking at deep stacks,
it now handles this level of stack depth with no problems at all.

I'd manually applied the one-loop method to the NEXT_INSN field of
insns, and otherwise to fld[0] if it was a RTX.  The stack consisted,
by my unscientific survey, almost entirely of RTX marking, and most of
it (more than half but not 90%) was recursing on PREV_INSN fields.

So, I looked at a two-loop method, looking at fld[0] and fld[2].  This
worked better yet, reducing the maximum stack depth to under 10000,
but that's still ~400k of stack.

Nearly all of the remainder is still recursion on PREV_INSN fields.
This happens because the DECL_RTL of a LABEL_REF will refer to a
CODE_LABEL in the middle of the insn chain, and it happens that this
is the first place the insn chain is encountered when marking.  The
label involved is "plus_expr:" in expand_expr, line 7632.
expand_expr starts at line 6183, so there's a long way to go back.

My current best idea is to rearrange the order that
globals are marked so that the insn chain gets seen before the trees,
if this is possible, but I'm not at all sure that this will work
reliably.

Plan B is to implement a stack using malloc, only on stack-constrained
systems because I expect this will be slower on Linux (it was when I
tried it).

Any other ideas?

On the bright side, my "typical" file, flow.o, is reduced from ~6000
maximum depth to ~500 maximum depth.  If I didn't have to worry about
making stack usage small for every case, the two-loop method would be
quite good enough.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-01 12:32                     ` Geoff Keating
@ 2002-08-01 14:00                       ` Mike Stump
  2002-08-01 16:00                         ` Geoff Keating
  2002-08-01 19:17                       ` Richard Henderson
  2002-08-05 10:06                       ` Per Bothner
  2 siblings, 1 reply; 117+ messages in thread
From: Mike Stump @ 2002-08-01 14:00 UTC (permalink / raw)
  To: Geoff Keating; +Cc: per, gcc

On Thursday, August 1, 2002, at 12:32 PM, Geoff Keating wrote:

>
> I now have some numbers!
>
> I looked at expr.o, which needs (on the branch) a recursion depth of
> 24024, the third-highest (and not typical, most files are < 10000);
> that corresponds to perhaps 1M of stack.
>
> I tried both algorithms on the tree marker routine for C.  They both
> had about the same effect, reducing the recursion depth to about
> 16000, with no significant change in CPU time.  (This is with
> gcac checking, so any effect should be greatly magnified.)
>
> The 16000 turns out to be mostly RTL marking.  So, I applied the
> one-loop method to the RTL marker routine, and it reduced the maximum
> stack depth to about 13000.  Better, but not really acceptable, since
> this still corresponds to perhaps 600k of stack, more than the limit
> on Darwin.
>
> I then generated a coredump when the stack depth was 13000, and looked
> at it in gdb.  I was very pleased to find that the gdb folks have
> improved the performance (since gdb 4.x) when looking at deep stacks,
> it now handles this level of stack depth with no problems at all.
>
> I'd manually applied the one-loop method to the NEXT_INSN field of
> insns, and otherwise to fld[0] if it was a RTX.  The stack consisted,
> by my unscientific survey, almost entirely of RTX marking, and most of
> it (more than half but not 90%) was recursing on PREV_INSN fields.
>
> So, I looked at a two-loop method, looking at fld[0] and fld[2].  This
> worked better yet, reducing the maximum stack depth to under 10000,
> but that's still ~400k of stack.
>
> Nearly all of the remainder is still recursion on PREV_INSN fields.
> This happens because the DECL_RTL of a LABEL_REF will refer to a
> CODE_LABEL in the middle of the insn chain, and it happens that this
> is the first place the insn chain is encountered when marking.  The
> label involved is "plus_expr:" in expand_expr, line 7632.
> expand_expr starts at line 6183, so there's a long way to go back.
>
> My current best idea is to rearrange the order that
> globals are marked so that the insn chain gets seen before the trees,
> if this is possible, but I'm not at all sure that this will work
> reliably.
>
> Plan B is to implement a stack using malloc, only on stack-constrained
> systems because I expect this will be slower on Linux (it was when I
> tried it).
>
> Any other ideas?
>
> On the bright side, my "typical" file, flow.o, is reduced from ~6000
> maximum depth to ~500 maximum depth.  If I didn't have to worry about
> making stack usage small for every case, the two-loop method would be
> quite good enough.

I have two ideas.  The first one is a piece of code that resets the 
stack limit to the maximum, should the driver find that the default is 
still in effect:

   /* APPLE LOCAL setrlimit */
#ifdef RLIMIT_STACK
   /* Get rid of any avoidable limit on stack size.  */
   {
     struct rlimit rlim;

     /* Set the stack limit huge.  (Compiles normally work within
        a megabyte of stack, but the normal limit on OSX is 512K for
        some reason.) */
     getrlimit (RLIMIT_STACK, &rlim);
     rlim.rlim_cur = rlim.rlim_max;
     setrlimit (RLIMIT_STACK, &rlim);
   }
#endif /* RLIMIT_STACK defined */

and the other is the idea that we can count how long the insn chain is 
(PREV_INSN and NEXT_INSN as long as the nodes are unmarked), allocate 
the many slots, move them in, mark just those nodes all iteratively, 
unmark the first, then really mark it, loop.  This way, all the long 
insn chains will be in the vec, very storage efficient.

I don't know if this is a good idea with all the extra memory bandwidth 
it causes.  :-(  Maybe the saving in recursion will make up for the 
extra memory hits.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-01 14:00                       ` Mike Stump
@ 2002-08-01 16:00                         ` Geoff Keating
  0 siblings, 0 replies; 117+ messages in thread
From: Geoff Keating @ 2002-08-01 16:00 UTC (permalink / raw)
  To: Mike Stump; +Cc: per, gcc

Mike Stump <mrs@apple.com> writes:

> and the other is the idea that we can count how long the insn chain is
> (PREV_INSN and NEXT_INSN as long as the nodes are unmarked), allocate
> the many slots, move them in, mark just those nodes all iteratively,
> unmark the first, then really mark it, loop.  This way, all the long
> insn chains will be in the vec, very storage efficient.

Thank you!  That helped me think of a much better idea.  The new code
looks like:

  for (xlimit = (struct rtx_def *)x_p;
       ggc_test_and_set_mark (xlimit);
       xlimit = RTL_NEXT (xlimit))
    ;

  x = (struct rtx_def *)x_p;
  if (x != xlimit)
    { 
      struct rtx_def * xprev = x;
      for (;;)
        { 
          xprev = RTL_PREV (xprev);
          if (xprev == NULL)
            break;
          x = xprev;
          ggc_set_mark (x);
        }
    }

  for (; x != xlimit; x = RTL_NEXT (x))
     // mark fields of x

The neat part is the middle loop.  What it does is move to the head of
the doubly-linked list, marking as we go.  We know those elements
haven't been marked because if they'd been marked, the rest of the
list would also have been marked.  Then the final loop marks the whole
list.  No memory allocation is necessary.  The 'xlimit' variable is
still there so to handle EXPR_LISTs and the like.

Of course, it's not _quite_ that simple.  For instance, we have to
check that the thing we're marking is really part of the doubly-linked
list, it turns out this is not always the case.

This takes us down to a maximum nesting depth of 894, and this time
it's trees not RTL.  Much better!  Now, to implement it.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-01 12:32                     ` Geoff Keating
  2002-08-01 14:00                       ` Mike Stump
@ 2002-08-01 19:17                       ` Richard Henderson
  2002-08-05 10:06                       ` Per Bothner
  2 siblings, 0 replies; 117+ messages in thread
From: Richard Henderson @ 2002-08-01 19:17 UTC (permalink / raw)
  To: Geoff Keating; +Cc: per, gcc

On Thu, Aug 01, 2002 at 12:32:18PM -0700, Geoff Keating wrote:
> I'd manually applied the one-loop method to the NEXT_INSN field of
> insns, and otherwise to fld[0] if it was a RTX.

This is absolutely required.

> Plan B is to implement a stack using malloc, only on stack-constrained
> systems because I expect this will be slower on Linux (it was when I
> tried it).

Forget slower, think Correctness.  See PR c/5154.


r~

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-01 12:32                     ` Geoff Keating
  2002-08-01 14:00                       ` Mike Stump
  2002-08-01 19:17                       ` Richard Henderson
@ 2002-08-05 10:06                       ` Per Bothner
  2002-08-05 10:21                         ` Geoff Keating
  2 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-08-05 10:06 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
> Nearly all of the remainder is still recursion on PREV_INSN fields.
> This happens because the DECL_RTL of a LABEL_REF will refer to a
> CODE_LABEL in the middle of the insn chain, and it happens that this
> is the first place the insn chain is encountered when marking.  The
> label involved is "plus_expr:" in expand_expr, line 7632.
> expand_expr starts at line 6183, so there's a long way to go back.

Is there any reason to mark the DECL_RTL of a LABEL_REF?  Is there any
case (baring a serious internal compiler error) where the DECL_RTL
could possibly be the only reference to an insn?  If not, why waste
time (and stack space) marking it?
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 10:06                       ` Per Bothner
@ 2002-08-05 10:21                         ` Geoff Keating
  2002-08-05 10:44                           ` Per Bothner
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-08-05 10:21 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Mon, 05 Aug 2002 10:13:18 -0700
> From: Per Bothner <per@bothner.com>

> Geoff Keating wrote:
> > Nearly all of the remainder is still recursion on PREV_INSN fields.
> > This happens because the DECL_RTL of a LABEL_REF will refer to a
> > CODE_LABEL in the middle of the insn chain, and it happens that this
> > is the first place the insn chain is encountered when marking.  The
> > label involved is "plus_expr:" in expand_expr, line 7632.
> > expand_expr starts at line 6183, so there's a long way to go back.
> 
> Is there any reason to mark the DECL_RTL of a LABEL_REF?  Is there any
> case (baring a serious internal compiler error) where the DECL_RTL
> could possibly be the only reference to an insn?  If not, why waste
> time (and stack space) marking it?

I'm sure there are circumstances, for instance when a CODE_LABEL is
deleted because the code that refers to it is unreachable.

Even if this particular case turns out to be impossible, it would be
very hard to make an argument that there's always another reference to
the DECL_RTL; and even if there actually always was another reference,
you'd have to ask yourself whether the (tiny) speed increase is worth
the extra complexity and the maintenance headache.

The patch I finally committed solves this problem in a more general
way.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 10:21                         ` Geoff Keating
@ 2002-08-05 10:44                           ` Per Bothner
  2002-08-05 11:14                             ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-08-05 10:44 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
> Even if this particular case turns out to be impossible, it would be
> very hard to make an argument that there's always another reference to
> the DECL_RTL; and even if there actually always was another reference,
> you'd have to ask yourself whether the (tiny) speed increase is worth
> the extra complexity and the maintenance headache.

If the DECL_RTL is what causes the recursion to blow up, as I got
the impresson from your message, and it prevents the simple
iterate-on-TREE_CHAIN solution from working, then it could well be
worth it.  In fact, if it allows iterating on TREE_CHAIN, instead of the
more complicated and general solution, it would be a simplification.

> The patch I finally committed solves this problem in a more general
> way.

Yes, but by slowing down marking by 5%, when compile-time speed is a
major concern.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 10:44                           ` Per Bothner
@ 2002-08-05 11:14                             ` Geoff Keating
  2002-08-05 11:36                               ` Per Bothner
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-08-05 11:14 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Mon, 05 Aug 2002 10:50:57 -0700
> From: Per Bothner <per@bothner.com>

> Geoff Keating wrote:
> > The patch I finally committed solves this problem in a more general
> > way.
> 
> Yes, but by slowing down marking by 5%, when compile-time speed is a
> major concern.

The slowdown appears to be in using a two-loop algorithm for marking
RTL, not in marking trees or in the third loop.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 11:14                             ` Geoff Keating
@ 2002-08-05 11:36                               ` Per Bothner
  2002-08-05 11:39                                 ` David Edelsohn
  2002-08-05 12:08                                 ` Geoff Keating
  0 siblings, 2 replies; 117+ messages in thread
From: Per Bothner @ 2002-08-05 11:36 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
> The slowdown appears to be in using a two-loop algorithm for marking
> RTL, not in marking trees or in the third loop.

I'm point point is I'm not sure a two-loop algorithm is needed.
I got the impression is that the one-loop argorithm doesn't work
for trees because of references from trees to rtl, like DECL_RTL.
I don't understand how rtl to the same extent, but I also think
a single loop should be enough to avoid deep recursion.

I'm just concerned about slowing down gcc bit by bit.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 11:36                               ` Per Bothner
@ 2002-08-05 11:39                                 ` David Edelsohn
  2002-08-05 11:56                                   ` Per Bothner
  2002-08-05 12:08                                   ` Geoff Keating
  2002-08-05 12:08                                 ` Geoff Keating
  1 sibling, 2 replies; 117+ messages in thread
From: David Edelsohn @ 2002-08-05 11:39 UTC (permalink / raw)
  To: Per Bothner; +Cc: Geoff Keating, gcc

>>>>> Per Bothner writes:

Per> I'm point point is I'm not sure a two-loop algorithm is needed.
Per> I got the impression is that the one-loop argorithm doesn't work
Per> for trees because of references from trees to rtl, like DECL_RTL.
Per> I don't understand how rtl to the same extent, but I also think
Per> a single loop should be enough to avoid deep recursion.

Per> I'm just concerned about slowing down gcc bit by bit.

	Does GCC always do the marking or only when PCH will be enabled?

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 11:39                                 ` David Edelsohn
@ 2002-08-05 11:56                                   ` Per Bothner
  2002-08-05 12:08                                   ` Geoff Keating
  1 sibling, 0 replies; 117+ messages in thread
From: Per Bothner @ 2002-08-05 11:56 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Geoff Keating, gcc

David Edelsohn wrote:
>>>>>>Per Bothner writes:
>>>>>
> 
> Per> I'm point point is I'm not sure a two-loop algorithm is needed.
> Per> I got the impression is that the one-loop argorithm doesn't work
> Per> for trees because of references from trees to rtl, like DECL_RTL.
> Per> I don't understand how rtl to the same extent, but I also think
> Per> a single loop should be enough to avoid deep recursion.
> 
> Per> I'm just concerned about slowing down gcc bit by bit.
> 
> 	Does GCC always do the marking or only when PCH will be enabled?

Always (unless I'm very confused) - this is the compile-time
garbage collection we're talking about.  It was re-written in
conjunction with PCH, but it's still the same basic ggc, except
that the tree-marking and rtl-marking code is now auto-generated
from the header files.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 11:39                                 ` David Edelsohn
  2002-08-05 11:56                                   ` Per Bothner
@ 2002-08-05 12:08                                   ` Geoff Keating
  1 sibling, 0 replies; 117+ messages in thread
From: Geoff Keating @ 2002-08-05 12:08 UTC (permalink / raw)
  To: dje; +Cc: per, gcc

> X-Sieve: cmu-sieve 2.0
> cc: Geoff Keating <geoffk@redhat.com>, gcc@gcc.gnu.org
> Date: Mon, 05 Aug 2002 14:39:14 -0400
> From: David Edelsohn <dje@watson.ibm.com>
> 
> >>>>> Per Bothner writes:
> 
> Per> I'm point point is I'm not sure a two-loop algorithm is needed.
> Per> I got the impression is that the one-loop argorithm doesn't work
> Per> for trees because of references from trees to rtl, like DECL_RTL.
> Per> I don't understand how rtl to the same extent, but I also think
> Per> a single loop should be enough to avoid deep recursion.
> 
> Per> I'm just concerned about slowing down gcc bit by bit.
> 
> 	Does GCC always do the marking or only when PCH will be enabled?

The marking is for GC, so it's always on.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 11:36                               ` Per Bothner
  2002-08-05 11:39                                 ` David Edelsohn
@ 2002-08-05 12:08                                 ` Geoff Keating
  2002-08-05 13:23                                   ` David Edelsohn
  2002-08-05 14:28                                   ` Per Bothner
  1 sibling, 2 replies; 117+ messages in thread
From: Geoff Keating @ 2002-08-05 12:08 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Mon, 05 Aug 2002 11:43:17 -0700
> From: Per Bothner <per@bothner.com>

> Geoff Keating wrote:
> > The slowdown appears to be in using a two-loop algorithm for marking
> > RTL, not in marking trees or in the third loop.
> 
> I'm point point is I'm not sure a two-loop algorithm is needed.
> I got the impression is that the one-loop argorithm doesn't work
> for trees because of references from trees to rtl, like DECL_RTL.
> I don't understand how rtl to the same extent, but I also think
> a single loop should be enough to avoid deep recursion.

That case was an example of how even the two-loop algorithm doesn't
work.  I think I described in a previous e-mail how the one-loop
algorithm doesn't work; IIRC it went like:

originally: 24000 maximum nesting on expr.o
one loop: 16000
two loops: 9000
three loops: 800

> I'm just concerned about slowing down gcc bit by bit.

Annoying, but there seems little choice.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 12:08                                 ` Geoff Keating
@ 2002-08-05 13:23                                   ` David Edelsohn
  2002-08-05 14:28                                   ` Per Bothner
  1 sibling, 0 replies; 117+ messages in thread
From: David Edelsohn @ 2002-08-05 13:23 UTC (permalink / raw)
  To: Geoff Keating; +Cc: per, gcc

>>>>> Geoff Keating writes:

>> I'm just concerned about slowing down gcc bit by bit.

Geoff> Annoying, but there seems little choice.

	Slowing down GCC by another 5% is not something that we should
just "live with".  This is hurting GCC's competitiveness.

David

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 12:08                                 ` Geoff Keating
  2002-08-05 13:23                                   ` David Edelsohn
@ 2002-08-05 14:28                                   ` Per Bothner
  2002-08-05 15:04                                     ` Geoff Keating
  2002-08-05 15:26                                     ` Zack Weinberg
  1 sibling, 2 replies; 117+ messages in thread
From: Per Bothner @ 2002-08-05 14:28 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
> I think I described in a previous e-mail how the one-loop
> algorithm doesn't work; IIRC it went like:
> 
> originally: 24000 maximum nesting on expr.o
> one loop: 16000
> two loops: 9000
> three loops: 800

To quote from your previous email:

> I tried both algorithms on the tree marker routine for C.  They both
 > had about the same effect, reducing the recursion depth to about
 > 16000, with no significant change in CPU time.
 > ...
 > The 16000 turns out to be mostly RTL marking.

I.e. most of the recursion depth when marking *tree nodes*
consists of marking *rtl modes*.  I'm suggesting that we don't
need to mark rtl nodes while we're marking tree nodes.  Don't
recurse into rtl from tree - we'll get to the rtl from the
other roots anyway.  In that case your results suggest that a
one-loop method is might work fine.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 14:28                                   ` Per Bothner
@ 2002-08-05 15:04                                     ` Geoff Keating
  2002-08-05 15:37                                       ` Per Bothner
  2002-08-05 15:26                                     ` Zack Weinberg
  1 sibling, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2002-08-05 15:04 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Mon, 05 Aug 2002 14:34:42 -0700
> From: Per Bothner <per@bothner.com>

> Geoff Keating wrote:
> > I think I described in a previous e-mail how the one-loop
> > algorithm doesn't work; IIRC it went like:
> > 
> > originally: 24000 maximum nesting on expr.o
> > one loop: 16000
> > two loops: 9000
> > three loops: 800

Now that I looked up my previous mail to check, it was actually:

originally: 24000
one-loop method applied to tree marking routine only: 16000
two-loop method applied to tree marking routine only: 16000
one-loop method applied to both trees & rtl: 13000
two-loop method applied to both: 10000
two-loop method on trees, three-loop method on rtl: 800

> To quote from your previous email:
> 
> > I tried both algorithms on the tree marker routine for C.  They both
>  > had about the same effect, reducing the recursion depth to about
>  > 16000, with no significant change in CPU time.
>  > ...
>  > The 16000 turns out to be mostly RTL marking.

By 'mostly RTL', I meant that there was some rtvec marking too.  The
bottom two or three levels might have been tree marking, but not
enough to be significant.

> I.e. most of the recursion depth when marking *tree nodes*
> consists of marking *rtl modes*.

That doesn't follow.  The code path to the maximum depth may have
moved; it might have been 24k levels of trees originally, but 16k
levels of RTL with either the one-loop or two-loop methods applied to
the tree marker.

>   I'm suggesting that we don't need to mark rtl nodes while we're
> marking tree nodes.  Don't recurse into rtl from tree - we'll get to
> the rtl from the other roots anyway.

This won't avoid the LABEL_REF.  It will be encountered the first time
it's used in RTL, which will still be much earlier than the position
of the label.

>  In that case your results suggest that a one-loop method is might
> work fine.

I see no evidence for this.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 14:28                                   ` Per Bothner
  2002-08-05 15:04                                     ` Geoff Keating
@ 2002-08-05 15:26                                     ` Zack Weinberg
  2002-08-05 15:42                                       ` Per Bothner
  1 sibling, 1 reply; 117+ messages in thread
From: Zack Weinberg @ 2002-08-05 15:26 UTC (permalink / raw)
  To: Per Bothner; +Cc: Geoff Keating, gcc

On Mon, Aug 05, 2002 at 02:34:42PM -0700, Per Bothner wrote:
> 
> I.e. most of the recursion depth when marking *tree nodes*
> consists of marking *rtl modes*.  I'm suggesting that we don't
> need to mark rtl nodes while we're marking tree nodes.  Don't
> recurse into rtl from tree - we'll get to the rtl from the
> other roots anyway.  In that case your results suggest that a
> one-loop method is might work fine.

This creates a hidden assumption which may not actually be true: that
all RTL is reachable from the GC root set independent of pointers from
tree nodes.  I would rather eat a 5% speed hit than introduce a
potential bug of this form, which is likely to be extremely hard to
track down.  Note that we are discussing 5% of mark time, not 5% of
total runtime (mark time can dominate, so it's not to be ignored).

However.  There are not too many tree nodes that contain rtx fields -
all for which CST_OR_CONSTRUCTOR_CHECK succeeds; all for which
DECL_CHECK succeeds; and a few of EXPR nodes, SAVE_EXPR,
GOTO_SUBROUTINE_EXPR, RTL_EXPR, WITH_CLEANUP_EXPR, and
METHOD_CALL_EXPR.  I have been wondering off and on whether the RTL
fields should be ripped entirely out of the DECL and CST nodes,
replaced with a lookaside table of some sort.  This would give the
behavior you suggest as a convenient side effect.  Since, empirically,
those pointers are usually null (witness the benefits of making
DECL_RTL lazily constructed) it should not be a performance problem;
it may even improve performance by reducing the memory cost of CST and
DECL nodes.

To implement this, I would: Create a deletable hash table indexed by
address, which stores TREE_CST_RTL and DECL_RTL.  Modify the tree.h
accessor macros accordingly.  Modify the mark process so that when it
comes upon an object stored in a deletable hash table, it does not
recurse into that object; it simply marks the object and continues.
Then, in a second phase, it marks from all marked objects in all
deletable hash tables.  Unmarked objects in those tables are discarded
as usual.

Not sure what to do with the EXPR nodes that point to RTL, but they
may not be a significant problem.

What do you think?

zw

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 15:04                                     ` Geoff Keating
@ 2002-08-05 15:37                                       ` Per Bothner
  2002-08-05 16:15                                         ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Per Bothner @ 2002-08-05 15:37 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

Geoff Keating wrote:
> This won't avoid the LABEL_REF.  It will be encountered the first time
> it's used in RTL, which will still be much earlier than the position
> of the label.

Same point:  Do we really need to follow LABEL_REFs?  Aren't their
targets guaranteed to be referenced by something else?

General point:  We're acting as if we're implementing a general-purpose
garbage collector that needs to work without any knowledge of the
semantics of its objects beyond being able to chase pointers.  However,
we don't need a general gc - we just need something that works for gcc.
And gcc doesn't willy-nilly allocate chunks of rtl and stash them in
random places.  We know how gcc allocates and uses rtl.  Thus we don't
need to chase every cross-reference and back-pointer - they are
redundant, if the compiler isn't broken.  (Of course having self-checks
in the compiler is good.  But that's a separate issue.)

(Of course if we were implementing a general-purpose garbage collector
we should not use recursion to follow pointers!)
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 15:26                                     ` Zack Weinberg
@ 2002-08-05 15:42                                       ` Per Bothner
  0 siblings, 0 replies; 117+ messages in thread
From: Per Bothner @ 2002-08-05 15:42 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Geoff Keating, gcc

Zack Weinberg wrote:
> This creates a hidden assumption which may not actually be true: that
> all RTL is reachable from the GC root set independent of pointers from
> tree nodes.

I don't know.  While I don't think the compiler currently does this,
it may be reasonable to use FUNCTION_DECLs as the "logical root" for
the INSNs created for the function.  In that case there should be a
single top-level pointer to the INSN-chain, and of course we must
follow it.

My point is about the various redundant cross- and back-references.
-- 
	--Per Bothner
per@bothner.com   http://www.bothner.com/per/

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: needless deep recursion in gt-c-decl.h
  2002-08-05 15:37                                       ` Per Bothner
@ 2002-08-05 16:15                                         ` Geoff Keating
  0 siblings, 0 replies; 117+ messages in thread
From: Geoff Keating @ 2002-08-05 16:15 UTC (permalink / raw)
  To: per; +Cc: gcc

> Date: Mon, 05 Aug 2002 15:43:36 -0700
> From: Per Bothner <per@bothner.com>

> Geoff Keating wrote:
> > This won't avoid the LABEL_REF.  It will be encountered the first time
> > it's used in RTL, which will still be much earlier than the position
> > of the label.
> 
> Same point:  Do we really need to follow LABEL_REFs?  Aren't their
> targets guaranteed to be referenced by something else?

I would need to see a proof before I would believe that this is always
true.  Note that there is currently special code for GC in rtl.h which
has to deal with the possiblity that insns might exist which are not
on the insn chain.  Also note that it is not true that every RTL
expression that's live is actually computed by the program.

> General point:  We're acting as if we're implementing a general-purpose
> garbage collector that needs to work without any knowledge of the
> semantics of its objects beyond being able to chase pointers.  However,
> we don't need a general gc - we just need something that works for gcc.
> And gcc doesn't willy-nilly allocate chunks of rtl and stash them in
> random places.  We know how gcc allocates and uses rtl.

Speak for yourself!  I certainly wouldn't claim to know all the places
RTL gets allocated or used.

> Thus we don't need to chase every cross-reference and back-pointer -
> they are redundant, if the compiler isn't broken.  (Of course having
> self-checks in the compiler is good.  But that's a separate issue.)

... so, what collection of pointers takes up a significant amount of
the time spent in GC, yet is guaranteed to be redundant with other
pointers?

> (Of course if we were implementing a general-purpose garbage collector
> we should not use recursion to follow pointers!)

A general-purpose collector would indeed work differently.

-- 
- Geoffrey Keating <geoffk@geoffk.org> <geoffk@redhat.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-06-12 16:54               ` H . J . Lu
@ 2002-11-24  6:52                 ` Jason R Thorpe
  2002-11-24  7:07                   ` H. J. Lu
  0 siblings, 1 reply; 117+ messages in thread
From: Jason R Thorpe @ 2002-11-24  6:52 UTC (permalink / raw)
  To: H . J . Lu
  Cc: Theodore Papadopoulo, Richard Henderson, Mumit Khan, Geoff Keating, gcc

...jumping back to an old conversation (because Gerald's recent
LD_LIBRARY_PATH mail reminded me of it...)

On Wed, Jun 12, 2002 at 04:49:15PM -0700, H . J . Lu wrote:

 > DT_RPATH is tricky with cross compiling. I don't want it in gcc.

It's not that tricky ... you just need to use -rpath-link in addition
to -rpath if the compiler is a cross-compiler, right?  Okay, maybe it's
a little tricky because of the $prefix/$target/lib/... stuff that is
the GNU toolchain's convention...

But it seems like this would no longer be a problem because of the
--with-sysroot stuff on 3.4-b-i-b.

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-11-24  6:52                 ` Jason R Thorpe
@ 2002-11-24  7:07                   ` H. J. Lu
  2002-11-25 10:01                     ` Theodore Papadopoulo
  0 siblings, 1 reply; 117+ messages in thread
From: H. J. Lu @ 2002-11-24  7:07 UTC (permalink / raw)
  To: Jason R Thorpe, Theodore Papadopoulo, Richard Henderson,
	Mumit Khan, Geoff Keating, gcc

On Sat, Nov 23, 2002 at 12:26:56PM -0800, Jason R Thorpe wrote:
> ...jumping back to an old conversation (because Gerald's recent
> LD_LIBRARY_PATH mail reminded me of it...)
> 
> On Wed, Jun 12, 2002 at 04:49:15PM -0700, H . J . Lu wrote:
> 
>  > DT_RPATH is tricky with cross compiling. I don't want it in gcc.
> 
> It's not that tricky ... you just need to use -rpath-link in addition
> to -rpath if the compiler is a cross-compiler, right?  Okay, maybe it's
> a little tricky because of the $prefix/$target/lib/... stuff that is
> the GNU toolchain's convention...
> 
> But it seems like this would no longer be a problem because of the
> --with-sysroot stuff on 3.4-b-i-b.

I don't like -R for many reasons. One of them is

# ls -d /usr/gcc*
/usr/gcc-3.2  /usr/gcc-3.2-redhat-8  /usr/gcc-3.3

I'd like to be able to remove the old ones and my C++ binaries are
still ok. -R doesn't help me on that.


H.J.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-11-24  7:07                   ` H. J. Lu
@ 2002-11-25 10:01                     ` Theodore Papadopoulo
  2002-11-25 10:05                       ` Jason R Thorpe
  2002-11-25 10:37                       ` H. J. Lu
  0 siblings, 2 replies; 117+ messages in thread
From: Theodore Papadopoulo @ 2002-11-25 10:01 UTC (permalink / raw)
  To: H. J. Lu
  Cc: Jason R Thorpe, Richard Henderson, Mumit Khan, Geoff Keating, gcc



hjl@lucon.org said:
> I don't like -R for many reasons. One of them is
> # ls -d /usr/gcc* /usr/gcc-3.2  /usr/gcc-3.2-redhat-8  /usr/gcc-3.3
> I'd like to be able to remove the old ones and my C++ binaries are
> still ok. -R doesn't help me on that. 

Right -R does not help for this, but how does the current situation is
better with this respect. Currently, with gcc default installation:

- either you have root access and you pass the gcc library directory 
to ldconfig.

- either you let the work to the person that compiles the application,
and that boils down to do the -R flag thing at each time manually.
It can also be done by wrapping the call to the application into 
scripts setting first the LD_LIBRARY_PATH.

- either you leave the work to every user to specify the proper 
LD_LIBRARY_PATH.

- Finally, you compile things statically, and it is known to give the 
wrong behaviour in some cases.

Another solution for this last case is to install the compiler with 
only the static version of its libraries.

Of all those solution, only the static thingy allows you to suppress 
one of your gcc directory without trouble.


In all other cases, things will have to be done in case you remove 
the libgcc.so library. If I remember correctly (I can re-run the 
checks), even if you have specified a -R like flag at compilation 
time, then it is still possible to override it with a LD_LIBRARY_PATH 
when it is missing.

That being said, rth was OK for looking at a patch adding some -R 
like functionnality (just for gcc's own libraries) provided this was 
selected at configure time. I have it working for linux and solaris:
it basically works by generating the proper -Wlrpath directives
internally while processing the gcc specs for libraries. I got 
some comments at first versions of the patch... Then, at some point,
I posted a revised (partial) patch for cleaning up lib_spec handling
within gcc.c (most of the non-configury work was in that patch) and
got no answer. I should have probably ping'ed about that patch....

The patch is still available for review:
http://gcc.gnu.org/ml/gcc-patches/2002-06/msg02176.html. But the 
compiler has somewhat changed in this area so that, if there is interest
in adding this functionnality, I can certainly update it.

Now, this was just a preparatory patch. The work needed to introduce 
the configury flag --with-runpath=XXXX is easy to do in principle 
but somewhat more hairy if I want it to behave correctly in all the 
situations (Non elf targets, HP, True64, ...).

Also, among the comments I had on the follow-up patch, was the fact that
rpath was the proper name instead of runpath (the latter being obsolete).
After searching for quite a while on the web, I finally found
http://stage.caldera.com/developer/gabi/latest/ch5.dynamic.html
which states exactly the opposite as far as I understand it. Is there 
a more up to date ressource ???


If there is interest for that feature, I will happily do the job on 
resurrecting the patches (Actually, I have it working, though I have 
not bootstrapped the compiler for a few weeks)... 

Presumably, I will introduce the feature first for the architectures I can test 
(gnu-linux and sparc-solaris), and then hopefully extending it to
other cases such as HP or AIX which have been mentionned previously
in this new thread, provided that I can get the necessary information.

That being said, I'm not clear on the fact that it will be possible to
implement the -R like flag for all the various combinations gcc supports.
Still, my belief is that something should be done when it is possible.

	Theo.

--------------------------------------------------------------------
Theodore Papadopoulo
Email: Theodore.Papadopoulo@sophia.inria.fr Tel: (33) 04 92 38 76 01
 --------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-11-25 10:01                     ` Theodore Papadopoulo
@ 2002-11-25 10:05                       ` Jason R Thorpe
  2002-11-25 14:40                         ` Theodore Papadopoulo
  2002-11-25 10:37                       ` H. J. Lu
  1 sibling, 1 reply; 117+ messages in thread
From: Jason R Thorpe @ 2002-11-25 10:05 UTC (permalink / raw)
  To: Theodore Papadopoulo; +Cc: gcc

On Mon, Nov 25, 2002 at 04:08:14PM +0100, Theodore Papadopoulo wrote:

 > That being said, rth was OK for looking at a patch adding some -R 
 > like functionnality (just for gcc's own libraries) provided this was 
 > selected at configure time. I have it working for linux and solaris:
 > it basically works by generating the proper -Wlrpath directives
 > internally while processing the gcc specs for libraries. I got 

I'm very interested in this, since I'd like to use it for NetBSD,
as well.  (In fact, I'd like it to be the *default* for NetBSD
configurations.)

-- 
        -- Jason R. Thorpe <thorpej@wasabisystems.com>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-11-25 10:01                     ` Theodore Papadopoulo
  2002-11-25 10:05                       ` Jason R Thorpe
@ 2002-11-25 10:37                       ` H. J. Lu
  1 sibling, 0 replies; 117+ messages in thread
From: H. J. Lu @ 2002-11-25 10:37 UTC (permalink / raw)
  To: Theodore Papadopoulo
  Cc: Jason R Thorpe, Richard Henderson, Mumit Khan, Geoff Keating, gcc

On Mon, Nov 25, 2002 at 04:08:14PM +0100, Theodore Papadopoulo wrote:
> That being said, rth was OK for looking at a patch adding some -R 
> like functionnality (just for gcc's own libraries) provided this was 
> selected at configure time. I have it working for linux and solaris:
> it basically works by generating the proper -Wlrpath directives
> internally while processing the gcc specs for libraries. I got 
> some comments at first versions of the patch... Then, at some point,
> I posted a revised (partial) patch for cleaning up lib_spec handling
> within gcc.c (most of the non-configury work was in that patch) and
> got no answer. I should have probably ping'ed about that patch....
> 

I am not totally against passing -R to ld. But it is only marginally
useful, if at all, for Linux. In addition to problems I have pointed
out already, it may not work at all when you move the binary compiled
by gcc to a different machine since gcc libraries may be installed a
different directory. -R only makes senses when the gcc installation
directory will always be the same for all machines over time.


H.J.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: whither specs?
  2002-11-25 10:05                       ` Jason R Thorpe
@ 2002-11-25 14:40                         ` Theodore Papadopoulo
  0 siblings, 0 replies; 117+ messages in thread
From: Theodore Papadopoulo @ 2002-11-25 14:40 UTC (permalink / raw)
  To: Jason R Thorpe; +Cc: gcc


thorpej@wasabisystems.com said:
>> That being said, rth was OK for looking at a patch adding some -R 
>> like functionnality (just for gcc's own libraries) provided this was 
>> selected at configure time. I have it working for linux and solaris:
>> it basically works by generating the proper -Wlrpath directives
>> internally while processing the gcc specs for libraries. I got 

> I'm very interested in this, since I'd like to use it for NetBSD, as
> well.  (In fact, I'd like it to be the *default* for NetBSD
> configurations.) 

OK, I'll resurrect and re-submit the two patches.

--------------------------------------------------------------------
Theodore Papadopoulo
Email: Theodore.Papadopoulo@sophia.inria.fr Tel: (33) 04 92 38 76 01
 --------------------------------------------------------------------


^ permalink raw reply	[flat|nested] 117+ messages in thread

* switch statement performance
@ 2004-01-20  2:19 Paul Vixie
  2004-01-20 22:13 ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Paul Vixie @ 2004-01-20  2:19 UTC (permalink / raw)
  To: gcc

today i wanted to go on a performance-related witch hunt, so i added:

--------

--- stmt.c.orig Mon Jan 19 19:39:41 2004
+++ stmt.c      Mon Jan 19 19:57:25 2004
@@ -5393,6 +5393,12 @@
               || (TREE_CODE (index_expr) == COMPOUND_EXPR
                   && TREE_CODE (TREE_OPERAND (index_expr, 1)) == INTEGER_CST))
        {
+         /* If the sequence would be absurdly long and slow, send warning.  */
+         if (extra_warnings
+             && count >= case_values_threshold ()
+             && compare_tree_int (range, 10 * count) > 0)
+           warning ("large range, slow code");
+
          index = expand_expr (index_expr, NULL_RTX, VOIDmode, 0);
 
          /* If the index is a short or char that we do not have

--------

and then ran -W against bind8, bind9, the freebsd kernel, and other smaller
things.  the results weren't amazing or stupendous, but they were quite
useful.  one of the things i learned is that a lot of sparse switch stmts
should be left as they are for cleanliness reasons, since they are not
inside loops or otherwise called often enough to warrant rewriting them to
avoid the if-else code generation behaviour signalled by the above warning().

therefore if this were to become part of standard gcc, the following things
are desirable:

	1. it should be turned on with "-Wswitch-ifelse", not "-W".

	2. there should be a pragma or attribute to turn it off on
	   any given switch statement, if a code review finds that
	   it's not performance related.

sadly, i've already reached beyond the top end of my gcc hacking ability,
so if those things were to be done, i would need help learning how.  and
of course, i'd need "permission" in the form of some indication from you
folks that this change, if done properly, would be accepted back into gcc.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: switch statement performance
  2004-01-20  2:19 switch statement performance Paul Vixie
@ 2004-01-20 22:13 ` Geoff Keating
  2004-01-20 22:26   ` Paul Vixie
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2004-01-20 22:13 UTC (permalink / raw)
  To: Paul Vixie; +Cc: gcc

Paul Vixie <paul@vix.com> writes:

> today i wanted to go on a performance-related witch hunt, so i added:
> 
> --------
> 
> --- stmt.c.orig Mon Jan 19 19:39:41 2004
> +++ stmt.c      Mon Jan 19 19:57:25 2004
> @@ -5393,6 +5393,12 @@
>                || (TREE_CODE (index_expr) == COMPOUND_EXPR
>                    && TREE_CODE (TREE_OPERAND (index_expr, 1)) == INTEGER_CST))
>         {
> +         /* If the sequence would be absurdly long and slow, send warning.  */
> +         if (extra_warnings
> +             && count >= case_values_threshold ()
> +             && compare_tree_int (range, 10 * count) > 0)
> +           warning ("large range, slow code");
> +
>           index = expand_expr (index_expr, NULL_RTX, VOIDmode, 0);
>  
>           /* If the index is a short or char that we do not have
> 
> --------
> 
> and then ran -W against bind8, bind9, the freebsd kernel, and other smaller
> things.  the results weren't amazing or stupendous, but they were quite
> useful.  one of the things i learned is that a lot of sparse switch stmts
> should be left as they are for cleanliness reasons, since they are not
> inside loops or otherwise called often enough to warrant rewriting them to
> avoid the if-else code generation behaviour signalled by the above warning().

You didn't explain why you wanted to do this, or what problem you're
trying to solve; I feel like this is the last message of a long
discussion.  Are you concerned about the object code quality, or the
quality of the C code that GCC is compiling?

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: switch statement performance
  2004-01-20 22:13 ` Geoff Keating
@ 2004-01-20 22:26   ` Paul Vixie
  2004-01-21  0:07     ` Geoff Keating
  0 siblings, 1 reply; 117+ messages in thread
From: Paul Vixie @ 2004-01-20 22:26 UTC (permalink / raw)
  To: Geoff Keating; +Cc: gcc

> You didn't explain why you wanted to do this, or what problem you're
> trying to solve; I feel like this is the last message of a long
> discussion.  Are you concerned about the object code quality, or the
> quality of the C code that GCC is compiling?
> -- 
> Geoffrey Keating <geoffk@geoffk.org>

i had a program that was going real slow.  turned out there was an expectation
of jump table in a module, where a switch statement with ~50 cases had values
specified as #define'd constants.  the numeric values of these were all big
32-bit numbers where each set of 8 bits meant something special.  the result
was generated code equivilent to an if-else chain 50 expressions long, and
when this was replaced with uglier code the function containing this inner-loop
switch statement sped up by a factor of 1000.

it wasn't gcc's fault.  i've seen code generators that made sparse jump tables
but not since the days of vax/vms pascal.  i don't consider this a reasonable
complexity tradeoff for gcc -- the generated code is just fine.  however, the
programmer's expectation that his switch statement was going to be fast was
not reasonable -- yet he would have had to inspect the numeric values of a
lot of #define'd constants in order to know this.

so, what i'd like is a way for gcc to warn that "you're probably not getting
what you expect from this bit of code", in the same general spirit as telling
me that subexpressions ought to be parenthesized.  (though in that case it
could be a correctness issue rather than in this case, a performance issue.)

hopefully that's the missing background from my previous post.  thanks for
listening.

(next up, warnings when sub-int-sized integers are used, since the r-m-w cycle
in the microcode is no faster than explicit load/modify/store even though the
instruction sequence might look shorter... but that's for another day.)

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: switch statement performance
  2004-01-20 22:26   ` Paul Vixie
@ 2004-01-21  0:07     ` Geoff Keating
  2004-01-21  0:29       ` Paul Vixie
  0 siblings, 1 reply; 117+ messages in thread
From: Geoff Keating @ 2004-01-21  0:07 UTC (permalink / raw)
  To: Paul Vixie; +Cc: gcc

Paul Vixie <paul@vix.com> writes:

> > You didn't explain why you wanted to do this, or what problem you're
> > trying to solve; I feel like this is the last message of a long
> > discussion.  Are you concerned about the object code quality, or the
> > quality of the C code that GCC is compiling?
> > -- 
> > Geoffrey Keating <geoffk@geoffk.org>
> 
> i had a program that was going real slow.  turned out there was an expectation
> of jump table in a module, where a switch statement with ~50 cases had values
> specified as #define'd constants.  the numeric values of these were all big
> 32-bit numbers where each set of 8 bits meant something special.  the result
> was generated code equivilent to an if-else chain 50 expressions long, and
> when this was replaced with uglier code the function containing this inner-loop
> switch statement sped up by a factor of 1000.

Could you explain a bit more?

You see, the generated code shouldn't be equivalent to an if-else
chain.  It's supposed to be a binary tree, which would be only at most
6 or 7 branches deep (well, I think we actually do a binary tree for
the outside conditions, and then an if-else chain for the inner ones,
since not-taken branches are cheaper), and therefore shouldn't have
particularly bad performance.  Was the 'uglier code' particularly
simple?

Ideally, we'd like to have GCC generate efficient code automatically;
maybe not quite as clever as the programmer would have, but there
shouldn't be a factor of 1000 unless the programmer's *really* clever :-).

-- 
- Geoffrey Keating <geoffk@geoffk.org>

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: switch statement performance
  2004-01-21  0:07     ` Geoff Keating
@ 2004-01-21  0:29       ` Paul Vixie
  2004-01-21  4:13         ` Ian Lance Taylor
  2004-01-21 21:54         ` tm_gccmail
  0 siblings, 2 replies; 117+ messages in thread
From: Paul Vixie @ 2004-01-21  0:29 UTC (permalink / raw)
  To: gcc

> Could you explain a bit more?
> 
> You see, the generated code shouldn't be equivalent to an if-else
> chain.  It's supposed to be a binary tree, which would be only at most
> 6 or 7 branches deep (well, I think we actually do a binary tree for
> the outside conditions, and then an if-else chain for the inner ones,
> since not-taken branches are cheaper), and therefore shouldn't have
> particularly bad performance.

how long as it been doing this?

> Was the 'uglier code' particularly simple?

the ugly code was particularly ugly.

> Ideally, we'd like to have GCC generate efficient code automatically;
> maybe not quite as clever as the programmer would have, but there
> shouldn't be a factor of 1000 unless the programmer's *really* clever
> :-).

so if i have to compile stuff like this on non-gcc platforms (like native
solaris and tru64 which a lot of bind users seem to have) i'll be at the
mercy of their compilers?

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: switch statement performance
  2004-01-21  0:29       ` Paul Vixie
@ 2004-01-21  4:13         ` Ian Lance Taylor
  2004-01-21  5:15           ` Paul Vixie
  2004-01-21 21:54         ` tm_gccmail
  1 sibling, 1 reply; 117+ messages in thread
From: Ian Lance Taylor @ 2004-01-21  4:13 UTC (permalink / raw)
  To: Paul Vixie; +Cc: gcc

Paul Vixie <paul@vix.com> writes:

> > You see, the generated code shouldn't be equivalent to an if-else
> > chain.  It's supposed to be a binary tree, which would be only at most
> > 6 or 7 branches deep (well, I think we actually do a binary tree for
> > the outside conditions, and then an if-else chain for the inner ones,
> > since not-taken branches are cheaper), and therefore shouldn't have
> > particularly bad performance.
> 
> how long as it been doing this?

For more than ten years.

> so if i have to compile stuff like this on non-gcc platforms (like native
> solaris and tru64 which a lot of bind users seem to have) i'll be at the
> mercy of their compilers?

Well, yes.  Though probably their compilers are just fine.

I think it's asking a bit much for gcc to tell you when code might be
poorly compiled by some other compiler.

Ian

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: switch statement performance
  2004-01-21  4:13         ` Ian Lance Taylor
@ 2004-01-21  5:15           ` Paul Vixie
  0 siblings, 0 replies; 117+ messages in thread
From: Paul Vixie @ 2004-01-21  5:15 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: gcc

> Well, yes.  Though probably their compilers are just fine.

some are, some aren't.

> I think it's asking a bit much for gcc to tell you when code might be
> poorly compiled by some other compiler.

agreed.  

i'm taking this as "no" wrt making this warning a standard part of gcc.

thank you all for your time.

^ permalink raw reply	[flat|nested] 117+ messages in thread

* Re: switch statement performance
  2004-01-21  0:29       ` Paul Vixie
  2004-01-21  4:13         ` Ian Lance Taylor
@ 2004-01-21 21:54         ` tm_gccmail
  1 sibling, 0 replies; 117+ messages in thread
From: tm_gccmail @ 2004-01-21 21:54 UTC (permalink / raw)
  To: Paul Vixie; +Cc: gcc

On Wed, 21 Jan 2004, Paul Vixie wrote:

> so if i have to compile stuff like this on non-gcc platforms (like native
> solaris and tru64 which a lot of bind users seem to have) i'll be at the
> mercy of their compilers?

I'm not entirely sure what you're asking, but I think so.

For example, the Renesas H8/300 compiler does not generate btrees for its
switches, and it generates linear compares.

The Javacard has a *huge* switch statement to dispatch to native code for
the Native Method Interface. Guess why the Javacard benchmark results for
native methods looked dismal when compiled with the Renesas compiler...

Toshi


^ permalink raw reply	[flat|nested] 117+ messages in thread

end of thread, other threads:[~2004-01-21 21:38 UTC | newest]

Thread overview: 117+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-01-20  2:19 switch statement performance Paul Vixie
2004-01-20 22:13 ` Geoff Keating
2004-01-20 22:26   ` Paul Vixie
2004-01-21  0:07     ` Geoff Keating
2004-01-21  0:29       ` Paul Vixie
2004-01-21  4:13         ` Ian Lance Taylor
2004-01-21  5:15           ` Paul Vixie
2004-01-21 21:54         ` tm_gccmail
  -- strict thread matches above, loose matches on Subject: below --
2002-07-24  4:53 needless deep recursion in gt-c-decl.h Per Bothner
2002-07-24  5:17 ` Geoff Keating
2002-07-24  7:09   ` Per Bothner
2002-07-24  7:17     ` Geoff Keating
2002-07-24  7:52       ` Per Bothner
2002-07-25  2:32         ` Geoff Keating
2002-07-25 11:33           ` Per Bothner
2002-07-25 14:20             ` Geoff Keating
2002-07-26  8:37               ` Per Bothner
2002-07-30 21:41                 ` Geoff Keating
2002-07-31  1:40                   ` Per Bothner
2002-08-01 12:32                     ` Geoff Keating
2002-08-01 14:00                       ` Mike Stump
2002-08-01 16:00                         ` Geoff Keating
2002-08-01 19:17                       ` Richard Henderson
2002-08-05 10:06                       ` Per Bothner
2002-08-05 10:21                         ` Geoff Keating
2002-08-05 10:44                           ` Per Bothner
2002-08-05 11:14                             ` Geoff Keating
2002-08-05 11:36                               ` Per Bothner
2002-08-05 11:39                                 ` David Edelsohn
2002-08-05 11:56                                   ` Per Bothner
2002-08-05 12:08                                   ` Geoff Keating
2002-08-05 12:08                                 ` Geoff Keating
2002-08-05 13:23                                   ` David Edelsohn
2002-08-05 14:28                                   ` Per Bothner
2002-08-05 15:04                                     ` Geoff Keating
2002-08-05 15:37                                       ` Per Bothner
2002-08-05 16:15                                         ` Geoff Keating
2002-08-05 15:26                                     ` Zack Weinberg
2002-08-05 15:42                                       ` Per Bothner
     [not found] <geoffk@geoffk.org>
2002-06-10 10:31 ` whither specs? Geoff Keating
2002-06-10 10:55   ` David Edelsohn
2002-06-10 10:56     ` Geoff Keating
2002-06-10 11:01       ` H . J . Lu
2002-06-10 11:18       ` David Edelsohn
2002-06-10 11:47       ` Mumit Khan
2002-06-11 16:48         ` Jason R Thorpe
2002-06-12  1:20           ` Theodore Papadopoulo
2002-06-12 11:01           ` Richard Henderson
2002-06-12 11:36             ` Theodore Papadopoulo
2002-06-12 15:48               ` Richard Henderson
2002-06-12 16:54               ` H . J . Lu
2002-11-24  6:52                 ` Jason R Thorpe
2002-11-24  7:07                   ` H. J. Lu
2002-11-25 10:01                     ` Theodore Papadopoulo
2002-11-25 10:05                       ` Jason R Thorpe
2002-11-25 14:40                         ` Theodore Papadopoulo
2002-11-25 10:37                       ` H. J. Lu
2002-06-10 15:01       ` Matthias Benkmann
2002-06-10 12:17   ` Tom Tromey
2002-06-10 15:37   ` Zack Weinberg
2002-06-10 19:25   ` Mark Mitchell
2002-06-05 12:28 PCH merge bootstrap failure on systems without flex Kaveh R. Ghazi
2002-06-05 19:37 ` Geoff Keating
2002-06-05 19:59   ` David Edelsohn
2002-06-05 20:22     ` Kaveh R. Ghazi
2002-06-10  0:01       ` Mark Mitchell
2002-06-10  3:17         ` Gerald Pfeifer
2002-06-10  7:48           ` Kaveh R. Ghazi
     [not found]         ` <Pine.BSF.4.44.0206101202120.46487-100000@naos.dbai.tuwien. ac.at>
2002-06-10  5:41           ` Andrea 'Fyre Wyzard' Bocci
2002-06-05 20:37     ` Geoff Keating
2002-05-12 12:50 Dumb register allocation (PPC) Zack Weinberg
2002-05-12 16:59 ` law
2002-05-12 21:38   ` David Edelsohn
2002-05-12 22:44     ` Zack Weinberg
2002-05-13 18:20       ` Richard Henderson
2002-05-14 13:15     ` Dale Johannesen
2002-05-14 13:15       ` David Edelsohn
2002-05-15  0:00         ` Richard Henderson
2002-05-15  8:27           ` David Edelsohn
2002-05-15 11:47             ` Geoff Keating
2002-05-15 12:08               ` law
2002-05-15 11:50             ` David Edelsohn
2002-05-15 12:38               ` law
2002-05-15 12:43               ` Geoff Keating
2002-05-15 12:55                 ` David Edelsohn
2002-04-29 14:36 PR 6394 Mark Mitchell
2002-04-29 21:38 ` John David Anglin
2002-04-30 10:31   ` David Edelsohn
2002-04-30 10:48     ` John David Anglin
2002-04-30 18:48       ` law
2002-04-30 19:55         ` John David Anglin
2002-04-30 20:43           ` law
2002-04-30 21:06             ` John David Anglin
2002-04-30 12:56     ` Geoff Keating
2002-04-30 13:07       ` David Edelsohn
2002-04-30 13:12         ` John David Anglin
2002-04-30 13:37           ` Geoff Keating
2002-04-30 13:41             ` David Edelsohn
2002-04-30 14:07               ` law
2002-04-30 14:28                 ` John David Anglin
2002-04-30 13:59             ` John David Anglin
2002-04-30 14:04           ` law
2002-04-30 14:39             ` John David Anglin
2002-04-30 14:54               ` law
2002-04-30 15:04                 ` John David Anglin
2002-04-30 15:23               ` law
2002-04-30 15:29                 ` John David Anglin
2002-04-30 15:52                   ` law
2002-04-30 16:11               ` law
2002-04-30 16:27                 ` John David Anglin
2002-04-30 16:41                   ` law
2002-04-30 18:46                   ` law
2002-04-30 20:18                     ` John David Anglin
2002-05-01  7:17                     ` Peter Barada
2002-04-30 16:34                 ` Geoff Keating
2002-04-30 17:04                   ` law
2002-04-30 18:48                   ` law

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).