public inbox for gcc@gcc.gnu.org
* Fwd: Questions on PA machine description?
@ 1999-03-08  6:49 Jerry Quinn
       [not found] ` < 36E3E35A.4BB420BE@americasm01.nt.com >
  1999-03-31 23:46 ` Jerry Quinn
  0 siblings, 2 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-08  6:49 UTC (permalink / raw)
  To: egcs

Oops - got the address wrong the first time

> From: "Jerry Quinn" <jquinn@nortelnetworks.com>
> To: egcs@egcs.cyngus.com
> Subject: Questions on PA machine description?
> Date: Fri, 5 Mar 1999 16:43:35 +0000
> X-Orig: <jquinn@americasm01.nt.com>
> 
> Can someone help me understand function unit descriptions a bit better?
> 
> I've been playing with machine function unit descriptions recently.  Looking at
> the pa7100LC description for ALU, I'm a bit confused.  The following is the
> description:
> 
> ;; We have two basic ALUs.
> (define_function_unit "pa7100LCalu" 2 2
>   (and
>     (eq_attr "type" "!fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,
> fpdivdbl,fpsqrtdbl,load,fpload,store,fpstore,shift,nullshift")
>    (eq_attr "cpu" "7100LC,7200")) 1 1)
> 
> It says there are 2 ALU's.  The READY=1 and DELAY=1 appear to me to be a gate
> so that each unit can be issued 1 insn per cycle.  What is confusing is the
> SIMULTANEITY of 2.  The documentation claims that this means that each unit
> can have two active insns issued at a time.  But this isn't true.  A total of
> two insns can be issued by using both units.  Is the documentation wrong, the
> function unit description wrong, or is it a convenient means of accomplishing
> something I don't understand?
> 
> Also, my reading of the definition of SIMULTANEITY seems to indicate that it
> should be 1 for the pa7100LCfp_div function unit since the div/sqrt portion of
> the FPU can only execute one insn at a time.  Is this correct or wrong?
> 
> Second question:
> 
> There is a comment that shifts and memory ops execute in one of the ALU's but
> that that can't be modeled.  Can someone explain what that means.  What PA
> descriptions I've found so far (I haven't seen many) seem to say that shift
> and merge are ALU ops just like integer add so the comment doesn't seem
> justified, but it's there for a reason.  Also, how do memory ops fit into
> this?
> 
> Thanks
> 
> - --
> Jerry Quinn                             Tel: (514) 761-8737
> jquinn@nortelnetworks.com               Fax: (514) 761-8505
> Speech Recognition Research
>



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
       [not found] ` < 36E3E35A.4BB420BE@americasm01.nt.com >
@ 1999-03-08 20:48   ` Jeffrey A Law
  1999-03-09 11:35     ` Jerry Quinn
  1999-03-31 23:46     ` Jeffrey A Law
  0 siblings, 2 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-08 20:48 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

  In message < 36E3E35A.4BB420BE@americasm01.nt.com >you write:
  > > ;; We have two basic ALUs.
  > > (define_function_unit "pa7100LCalu" 2 2
  > >   (and
  > >     (eq_attr "type" "!fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,
  > > fpdivdbl,fpsqrtdbl,load,fpload,store,fpstore,shift,nullshift")
  > >    (eq_attr "cpu" "7100LC,7200")) 1 1)
  > > 
  > > It says there are 2 ALU's.  The READY=1 and DELAY=1 appear to me to be a 
  > > gate so that each unit can be issued 1 insn per cycle.
Correct.


  > What is confusing is the SIMULTANEITY of 2.  The documentation claims that
  > this means that each unit can have two active insns issued at a time.  But
  > this isn't true.
Could well be an oversight on my part.  It's been a long time since I wrote
that stuff.


  > > Also, my reading of the definition of SIMULTANEITY seems to indicate
  > > that it should be 1 for the pa7100LCfp_div function unit since the
  > > div/sqrt portion of the FPU can only execute one insn at a time.  Is
  > > this correct or wrong?
I believe this is correct.  The simultaneity should be one.
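
As a rough illustration of the question above, the sketch below assumes the
field order NAME, MULTIPLICITY, SIMULTANEITY, TEST, READY-DELAY, ISSUE-DELAY
and reads SIMULTANEITY as a per-unit limit, with 0 meaning "no limit".  The
struct and the helper max_in_flight are hypothetical names made up for this
note; they are not part of GCC.

```c
/* Numeric fields of a define_function_unit, in the order they are written.
   This layout is an illustration, not a GCC data structure. */
struct function_unit {
    const char *name;
    int multiplicity;   /* number of identical copies of the unit */
    int simultaneity;   /* max insns executing in each copy; 0 = no limit */
    int ready_delay;    /* cycles until the result can be used */
    int issue_delay;    /* cycles until the unit accepts another insn */
};

/* Upper bound on insns in flight across all copies of the unit,
   or -1 when simultaneity is 0 (unlimited). */
int max_in_flight(const struct function_unit *u)
{
    if (u->simultaneity == 0)
        return -1;
    return u->multiplicity * u->simultaneity;
}
```

Under that reading, "pa7100LCalu" 2 2 allows up to four insns in flight,
while a simultaneity of 1 gives the two that the hardware is described as
supporting.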

  > > There is a comment that shifts and memory ops execute in one of the
  > > ALU's but that that can't be modeled.
That was true at one time.  Or more correctly it couldn't be easily modeled.


  > Can someone explain what that means.
Basically one of the ALUs is not complete and can only handle a subset of the
ALU instructions.

So, stuff like adds, subtracts, compares, etc can issue to either ALU, but
a memory load/store or a shift instruction can only issue to the first
alu.  Assuming I remember everything from the LC series correctly.
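
The asymmetry described here can be sketched as a pairing rule.  This is only
a model of the recollection above (shifts and memory ops restricted to the
first ALU), not something taken from HP documentation; the enum and function
names are hypothetical.

```c
#include <stdbool.h>

/* Coarse instruction classes for the integer side of the 7100LC model. */
enum insn_class { CLASS_ALU, CLASS_SHIFT, CLASS_MEM };

/* True if the class can only issue to the first (complete) ALU. */
static bool needs_first_alu(enum insn_class c)
{
    return c == CLASS_SHIFT || c == CLASS_MEM;
}

/* Two insns can issue together unless both need the first ALU. */
bool can_dual_issue(enum insn_class a, enum insn_class b)
{
    return !(needs_first_alu(a) && needs_first_alu(b));
}
```

With this rule an add pairs with a shift or a load, but a shift and a load
compete for the same ALU and cannot pair.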

jeff


* Re: Fwd: Questions on PA machine description?
  1999-03-08 20:48   ` Jeffrey A Law
@ 1999-03-09 11:35     ` Jerry Quinn
       [not found]       ` < 36E577DF.F825BAFB@americasm01.nt.com >
  1999-03-31 23:46       ` Jerry Quinn
  1999-03-31 23:46     ` Jeffrey A Law
  1 sibling, 2 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-09 11:35 UTC (permalink / raw)
  To: law; +Cc: egcs

Jeffrey A Law wrote:
>   > Can someone explain what that means.
> Basically one of the ALUs is not complete and can only handle a subset of the
> ALU instructions.
> 
> So, stuff like adds, subtracts, compares, etc can issue to either ALU, but
> a memory load/store or a shift instruction can only issue to the first
> alu.  Assuming I remember everything from the LC series correctly.

Do you have a pointer to something that describes this better?  I've
been unable to find any info at all on the 7100LC and the 7300 is
described mainly as a 7100LC core.

I wouldn't mind playing with the machine description some.  Also, do you
have any clue if the 8000 also has this same asymmetric ALU situation?

Jerry

-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research


* Re: Fwd: Questions on PA machine description?
       [not found]       ` < 36E577DF.F825BAFB@americasm01.nt.com >
@ 1999-03-09 12:18         ` Jeffrey A Law
  1999-03-16 10:49           ` Jerry Quinn
  1999-03-31 23:46           ` Jeffrey A Law
  0 siblings, 2 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-09 12:18 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

  In message < 36E577DF.F825BAFB@americasm01.nt.com >you write:
  > Do you have a pointer to something that describes this better?  I've
  > been unable to find any info at all on the 7100LC and the 7300 is
  > described mainly as a 7100LC core.
Nope.  Sorry.  I know the non-symmetric nature of the ALUs on the 7100LC
series was discussed in a public forum years ago, but I don't remember
precisely where (MPR would be a good first bet).  The scheduling info in
gcc was derived from that public information.

I wouldn't be at all surprised if the 7200 and 7300 were basically 7100LC cores
with different memory latencies.


  > I wouldn't mind playing with the machine description some.  Also, do you
  > have any clue if the 8000 also has this same asymmetric ALU situation?
The 8000 does not suffer from this problem.  The ALUs are symmetric.  Not
that it matters since from a scheduling standpoint you (mostly) want to
ignore latency and schedule for reorder buffer retirement (2 memops, 2 nonmem
ops per cycle).
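
A minimal sketch of what "schedule for retirement" implies for a block,
assuming only the limit stated above (2 memory ops and 2 non-memory ops
retired per cycle) and ignoring every other constraint.  The function names
are hypothetical.

```c
/* Ceiling of a / b for non-negative a and positive b. */
static int ceil_div(int a, int b)
{
    return (a + b - 1) / b;
}

/* Lower bound on the cycles needed to retire a block, given that at most
   two memory ops and two non-memory ops can retire per cycle. */
int min_retire_cycles(int mem_ops, int nonmem_ops)
{
    int mem_cycles = ceil_div(mem_ops, 2);
    int other_cycles = ceil_div(nonmem_ops, 2);
    return mem_cycles > other_cycles ? mem_cycles : other_cycles;
}
```

A block with 1 memory op and 6 ALU ops is bounded by the ALU side (3 cycles),
which is why balancing the two classes matters more than hiding latency in
this model.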

jeff


* Re: Fwd: Questions on PA machine description?
  1999-03-09 12:18         ` Jeffrey A Law
@ 1999-03-16 10:49           ` Jerry Quinn
       [not found]             ` < 36EEA7B6.D3C894DE@americasm01.nt.com >
  1999-03-31 23:46             ` Jerry Quinn
  1999-03-31 23:46           ` Jeffrey A Law
  1 sibling, 2 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-16 10:49 UTC (permalink / raw)
  To: law; +Cc: egcs

Jeffrey A Law wrote:
> 
>   > I wouldn't mind playing with the machine description some.  Also, do you
>   > have any clue if the 8000 also has this same asymmetric ALU situation?
> The 8000 does not suffer from this problem.  The ALUs are symmetric.  Not
> that it matters since from a scheduling standpoint you (mostly) want to
> ignore latency and schedule for reorder buffer retirement (2 memops, 2 nonmem
> ops per cycle).

I thought I'd go ahead and play with the machine description a little. 
I tried the following after the initial implementation of doubling the
7100lc function units:

(define_function_unit "pa8000memory" 2 0
  (and (eq_attr "type" "load,fpload,store,fpstore")
       (eq_attr "cpu" "8000")) 2 1)
(define_function_unit "pa8000fp_div" 2 1
  (and (eq_attr "type" "fpdivsgl,fpsqrtsgl")
	(eq_attr "cpu" "8000")) 17 17)
(define_function_unit "pa8000fp_div" 2 1
  (and (eq_attr "type" "fpdivdbl,fpsqrtdbl")
	(eq_attr "cpu" "8000")) 31 31)
(define_function_unit "pa8000alu" 2 1
   (and
    (eq_attr "type" "!load,fpload,store,fpstore")
    (eq_attr "cpu" "8000")) 1 1)

Theory being that the memory represents things leaving the load reorder
buffer and alu represents the nonload reorder buffer.  I added the
div/sqrt constraint on the theory that they take long enough to have a
big effect on retirement.  All for nought - it is better by a few
percent on some programs, worse by a few percent on others.  No major
differences that I could see.

Any thoughts?

-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research


* Re: Fwd: Questions on PA machine description?
       [not found]             ` < 36EEA7B6.D3C894DE@americasm01.nt.com >
@ 1999-03-17 20:12               ` Jeffrey A Law
  1999-03-18 14:35                 ` Jerry Quinn
  1999-03-31 23:46                 ` Jeffrey A Law
  1999-03-17 20:14               ` Jeffrey A Law
  1 sibling, 2 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-17 20:12 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

  In message < 36EEA7B6.D3C894DE@americasm01.nt.com >you write:
  > (define_function_unit "pa8000memory" 2 0
  >   (and (eq_attr "type" "load,fpload,store,fpstore")
  >        (eq_attr "cpu" "8000")) 2 1)
I would suggest making the simultaneity 1 and ready delay 1.  The point is
we do not want to expose the load latency, since we're trying to describe
how instructions are retired.

ie, we can retire one instruction from each of the two load-store units
every cycle.

I'd also suggest changing the name to "pa8000lsu" since we're not trying to
describe the memory subsystem, but instead how insns retire out of the load
store unit.




  > (define_function_unit "pa8000fp_div" 2 1
  >   (and (eq_attr "type" "fpdivsgl,fpsqrtsgl")
  > 	(eq_attr "cpu" "8000")) 17 17)
  > (define_function_unit "pa8000fp_div" 2 1
  >   (and (eq_attr "type" "fpdivdbl,fpsqrtdbl")
  > 	(eq_attr "cpu" "8000")) 31 31)
  > (define_function_unit "pa8000alu" 2 1
  >    (and
  >     (eq_attr "type" "!load,fpload,store,fpstore")
  >     (eq_attr "cpu" "8000")) 1 1)
These look reasonable.  



I'd create one additional unit -- fmac for all the other fp computation
insn to show the partial latency as recommended by HP.  Something like this:

(define_function_unit "pa8000fmac" 2 0
  (and
    (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl")
    (eq_attr "cpu" "8000")) 2 1)

ie, there's two fmac units which are fully pipelined.   Results are available
in 2 cycles.
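
To make the numbers concrete, the sketch below tabulates the two units as
suggested in this message (the load-store unit renamed to "pa8000lsu" with
simultaneity 1 and ready delay 1, and the new fmac unit) and derives when a
result is usable and when the unit can accept another insn.  The struct and
helper names are hypothetical, and the table is an illustration rather than
the patch that was installed.

```c
/* Numeric fields of a define_function_unit (illustrative layout). */
struct function_unit {
    const char *name;
    int multiplicity;
    int simultaneity;   /* 0 = no limit */
    int ready_delay;
    int issue_delay;
};

static const struct function_unit pa8000_units[] = {
    { "pa8000lsu",  2, 1, 1, 1 },   /* retire one insn per unit per cycle */
    { "pa8000fmac", 2, 0, 2, 1 },   /* fully pipelined, 2-cycle result */
};

/* Cycle in which the result of an insn issued at ISSUE_CYCLE is usable. */
int result_cycle(const struct function_unit *u, int issue_cycle)
{
    return issue_cycle + u->ready_delay;
}

/* Earliest cycle in which the same unit can accept another insn. */
int next_issue_cycle(const struct function_unit *u, int issue_cycle)
{
    return issue_cycle + u->issue_delay;
}
```

For the fmac entry an insn issued in cycle 0 produces its result in cycle 2,
yet the unit can take a new insn in cycle 1, which is what "fully pipelined"
means here.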


I'm going to make those changes and install the patch.


  > Theory being that the memory represents things leaving the load reorder
  > buffer and alu represents the nonload reorder buffer.  I added the
  > div/sqrt constraint on the theory that they take long enough to have a
  > big effect on retirement.  All for nought - it is better by a few
  > percent on some programs, worse by a few percent on others.  No major
  > differences that I could see.
That's basically what we want to do.  You shouldn't expect much from
instruction scheduling on a PA8000 class machine.  All the folks I've
spoken to about this indicate that it's minor relative to other stuff.

The other thing to think about is how to show that some instructions which
are data dependent can/should issue in the same cycle.  ie, if an alu
operation feeds another alu operation, then we should issue them in the
same cycle.

One thought would be to make the ready delay for alu instructions 0, then
tweak haifa to add dependent instructions to the ready queue immediately
after it issues an insn with a ready delay of zero cycles.


jeff


* Re: Fwd: Questions on PA machine description?
       [not found]             ` < 36EEA7B6.D3C894DE@americasm01.nt.com >
  1999-03-17 20:12               ` Jeffrey A Law
@ 1999-03-17 20:14               ` Jeffrey A Law
  1999-03-31 23:46                 ` Jeffrey A Law
  1 sibling, 1 reply; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-17 20:14 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

BTW -- the other thing you might want to do is have adjust_cost do nothing
when scheduling for a PA8000 class machine.


jeff


* Re: Fwd: Questions on PA machine description?
  1999-03-17 20:12               ` Jeffrey A Law
@ 1999-03-18 14:35                 ` Jerry Quinn
       [not found]                   ` < 36F17F8C.9FD937FA@americasm01.nt.com >
  1999-03-31 23:46                   ` Jerry Quinn
  1999-03-31 23:46                 ` Jeffrey A Law
  1 sibling, 2 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-18 14:35 UTC (permalink / raw)
  To: law; +Cc: egcs

Jeffrey A Law wrote:
> 
> I'd create one additional unit -- fmac for all the other fp computation
> insn to show the partial latency as recommended by HP.  Something like this:
> 
> (define_function_unit "pa8000fmac" 2 0
>   (and
>     (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl")
>     (eq_attr "cpu" "8000")) 2 1)
> 
> ie, there's two fmac units which are fully pipelined.   Results are available
> in 2 cycles.
> 
> I'm going to make those changes and install the patch.

There seems to be some delay going on :-)  I responded to your post on
egcs-patches before realizing you saw my revision of my original
scheduling.

The other message mentioned that eliminating autoincrement/autodecrement
instructions is a good thing.  Do these instructions have the same
problem as fmpyadd, i.e. grabbing multiple reorder slots and function
units?

> The other thing to think about is how to show that some instructions which
> are data dependent can/should issue in the same cycle.  ie, if an alu
> operation feeds another alu operation, then we should issue them in the
> same cycle.
> 
> One thought would be to make the ready delay for alu instructions 0, then
> tweak haifa to add dependent instructions to the ready queue immediately
> after it issues an insn with a ready delay of zero cycles.

What about making pa_adjust_cost set the cost of a data dependency to
0?  The alpha port does this on the ev5.

Why is this a good thing?  Won't an instruction that depends on another
one have to retire later than the other one?

Jerry

-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research


* Re: Fwd: Questions on PA machine description?
       [not found]                   ` < 36F17F8C.9FD937FA@americasm01.nt.com >
@ 1999-03-18 21:00                     ` Jeffrey A Law
       [not found]                       ` < 18375.921819620@hurl.cygnus.com >
  1999-03-31 23:46                       ` Jeffrey A Law
  0 siblings, 2 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-18 21:00 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

  In message < 36F17F8C.9FD937FA@americasm01.nt.com >you write:
  > The other message mentioned that eliminating autoincrement/autodecrement
  > instructions is a good thing.  Do these instructions have the same
  > problem as fmpyadd, i.e. grabbing multiple reorder slots and function
  > units?
Yes, they have the same problem as fmpyadd/fmpysub.  They also have the 
disadvantage that the autoinc addressing mode adds additional data dependencies
which can inhibit the amount of ILP found by the compiler and by the hardware.

  > > One thought would be to make the ready delay for alu instructions 0, then
  > > tweak haifa to add dependent instructions to the ready queue immediately
  > > after it issues an insn with a ready delay of zero cycles.
  > 
  > What about making pa_adjust_cost set the cost of a data dependency to
  > 0?  The alpha port does this on the ev5.
Nope, it won't do what we want.  Look at the loop which issues insns from
the ready list in haifa-sched.c.

It has a structure like:

while (not all insns scheduled)
  add insns with no outstanding dependencies to the ready queue
  sort the ready queue
  while (ready list is not empty && target can issue more insns)
    issue an insn off the ready queue, remove dependencies on the issued insn

So, given insn1 which feeds insn2 we will never issue insn1 & insn2 in the
same cycle.
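
The loop structure above can be modelled in a few lines of C.  This is a toy
list scheduler written for this note, not code from haifa-sched.c: the names
struct dag, schedule and same_cycle are hypothetical, sorting is omitted, and
every dependency is treated as zero-cost when same_cycle is set.  It shows
the baseline behaviour (a dependent insn waits for the next cycle) next to
the proposed tweak (dependents join the ready list immediately).

```c
#define MAX_INSNS 16

/* Dependency graph: dep[i][j] nonzero means insn j depends on insn i.
   The graph is assumed to be acyclic and issue_width at least 1. */
struct dag {
    int n;
    int dep[MAX_INSNS][MAX_INSNS];
};

/* Store in cycle[i] the cycle in which insn i issues.  With same_cycle
   zero, the ready list is only refilled at the start of each cycle; with
   same_cycle nonzero, an insn whose last producer just issued is appended
   to the ready list at once. */
void schedule(const struct dag *g, int issue_width, int same_cycle,
              int cycle[])
{
    int pending[MAX_INSNS];   /* unresolved producers per insn */
    int done[MAX_INSNS];
    int ready[MAX_INSNS];
    int scheduled = 0;

    for (int j = 0; j < g->n; j++) {
        pending[j] = 0;
        done[j] = 0;
        for (int i = 0; i < g->n; i++)
            if (g->dep[i][j])
                pending[j]++;
    }

    for (int clock = 0; scheduled < g->n; clock++) {
        int n_ready = 0, head = 0, issued = 0;

        /* Add insns with no outstanding dependencies to the ready list. */
        for (int j = 0; j < g->n; j++)
            if (!done[j] && pending[j] == 0)
                ready[n_ready++] = j;

        /* Issue from the ready list while the target can issue more. */
        while (head < n_ready && issued < issue_width) {
            int i = ready[head++];
            cycle[i] = clock;
            done[i] = 1;
            scheduled++;
            issued++;
            for (int j = 0; j < g->n; j++)
                if (g->dep[i][j] && --pending[j] == 0 && same_cycle)
                    ready[n_ready++] = j;
        }
    }
}
```

For insn1 feeding insn2 on a 2-wide target, the baseline places them in
cycles 0 and 1, while the same_cycle variant places both in cycle 0.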

  > Why is this a good thing?  Won't an instruction that depends on another
  > one have to retire later than the other one?
No.  They can retire in the same cycle.  This is discussed in one of the
PA8000 optimization papers from HP.  The key is to remember that the PA8000
is an out-of-order execution machine.

jeff


* Re: Fwd: Questions on PA machine description?
       [not found]                       ` < 18375.921819620@hurl.cygnus.com >
@ 1999-03-19 17:32                         ` Richard Henderson
       [not found]                           ` < 19990319173226.C14722@cygnus.com >
  1999-03-31 23:46                           ` Richard Henderson
  0 siblings, 2 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-19 17:32 UTC (permalink / raw)
  To: law, Jerry Quinn; +Cc: egcs

On Thu, Mar 18, 1999 at 10:00:20PM -0700, Jeffrey A Law wrote:
> It has a structure like:
> 
> while (not all insns scheduled)
>   add insns with no outstanding dependencies to the ready queue
>   sort the ready queue
>   while (ready list is not empty && target can issue more insns)
>     issue an insn off the ready queue, remove dependencies on the issued insn
> 
> So, given insn1 which feeds insn2 we will never issue insn1 & insn2 in the
> same cycle.

I noticed this the other day in a different context. 

Does it seem worthwhile to add some sort of target define to control
adding dependent insns to the ready queue in the same cycle?  It's
true that it doesn't matter to the vast majority of the processors
we support, but I'm thinking of VLIW parts that do actually have
write-after-read conflicts within a cycle.


r~


* Re: Fwd: Questions on PA machine description?
       [not found]                           ` < 19990319173226.C14722@cygnus.com >
@ 1999-03-20  2:08                             ` Jeffrey A Law
       [not found]                               ` < 1843.921916006@upchuck >
  1999-03-31 23:46                               ` Jeffrey A Law
  0 siblings, 2 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-20  2:08 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jerry Quinn, egcs

  In message < 19990319173226.C14722@cygnus.com >you write:
  > Does it seem worthwhile to add some sort of target define to control
  > adding dependent insns to the ready queue in the same cycle?  It's
  > true that it doesn't matter to the vast majority of the processors
  > we support, but I'm thinking of VLIW parts that do actually have
  > write-after-read conflicts within a cycle.
That's basically what I've had in mind.  In addition to checking the
target macro, I think we'd want to check the ready delay and only do
this if it's zero.

I hadn't even thought about the VLIW issues, but yea, it seems like we'd
want to have the same kind of capability for VLIW targets.

jeff


* Re: Fwd: Questions on PA machine description?
       [not found]                               ` < 1843.921916006@upchuck >
@ 1999-03-20 10:43                                 ` Richard Henderson
  1999-03-23 14:13                                   ` Jerry Quinn
  1999-03-31 23:46                                   ` Richard Henderson
  0 siblings, 2 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-20 10:43 UTC (permalink / raw)
  To: law; +Cc: Jerry Quinn, egcs

On Sat, Mar 20, 1999 at 12:46:46AM -0700, Jeffrey A Law wrote:
> That's basically what I've had in mind.  In addition to checking the
> target macro, I think we'd want to check the ready delay and only do
> this if it's zero.

Hum.  I suppose that's reasonable.  We would not even need a target
macro then -- just make sure that all sane targets ADJUST_COST
anti-dependencies to zero.
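
A minimal sketch of that idea, assuming a cost hook that is told what kind of
dependency it is looking at.  The enum and function below are hypothetical
stand-ins made up for this note; the real ADJUST_COST macro works on RTL
insns and dependency links rather than on a plain enum.

```c
/* Kinds of dependency between a producer and a consumer insn. */
enum dep_kind { DEP_TRUE, DEP_ANTI, DEP_OUTPUT };

/* Return the scheduling cost of a dependency.  Anti-dependencies
   (write after read) are given zero cost so that, under the proposed
   scheme, the two insns would be allowed in the same cycle; other kinds
   keep the cost they were given. */
int adjust_cost(enum dep_kind kind, int cost)
{
    return kind == DEP_ANTI ? 0 : cost;
}
```

A target with real write-after-read conflicts within a cycle would simply
not zero the cost, which is how the scheme could cover the VLIW case without
a separate target macro.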


r~


* Re: Fwd: Questions on PA machine description?
  1999-03-20 10:43                                 ` Richard Henderson
@ 1999-03-23 14:13                                   ` Jerry Quinn
       [not found]                                     ` < 36F811ED.32912006@americasm01.nt.com >
  1999-03-31 23:46                                     ` Jerry Quinn
  1999-03-31 23:46                                   ` Richard Henderson
  1 sibling, 2 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-23 14:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: law, egcs

Richard Henderson wrote:
> 
> On Sat, Mar 20, 1999 at 12:46:46AM -0700, Jeffrey A Law wrote:
> > That's basically what I've had in mind.  In addition to checking the
> > target macro, I think we'd want to check the ready delay and only do
> > this if it's zero.
> 
> Hum.  I suppose that's reasonable.  We would not even need a target
> macro then -- just make sure that all sane targets ADJUST_COST
> anti-dependencies to zero.

I started trying to look at the haifa-sched.c file.  I can't really see
where insns dependent on the current one in the ready queue are
removed.  I'm looking in the loop on ready[] in schedule_block().  My
guess is that somehow schedule_insn does this, but I'm not sure.

My first thought is that instead of removing dependent instructions from
ready to insn_queue, move them to a temporary queue when the delay
(cost?) is 0.  Then, if the ready queue is empty but can_issue_more is
still live, pull one insn from the temp queue.  Repeat this second loop
until can_issue_more is empty or the temp queue is empty.  Then return
everything else back to the insn_queue.

I may just be babbling.  Does this make any sense?  How is the ready
queue sorted?


-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research


* Re: Fwd: Questions on PA machine description?
       [not found]                                     ` < 36F811ED.32912006@americasm01.nt.com >
@ 1999-03-24  1:30                                       ` Jeffrey A Law
  1999-03-24 14:44                                         ` Jerry Quinn
  1999-03-31 23:46                                         ` Jeffrey A Law
  0 siblings, 2 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-24  1:30 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: Richard Henderson, egcs

  In message < 36F811ED.32912006@americasm01.nt.com >you write:
  > I started trying to look at the haifa-sched.c file.  I can't really see
  > where insns dependent on the current one in the ready queue are
  > removed.  I'm looking in the loop on ready[] in schedule_block().  My
  > guess is that somehow schedule_insn does this, but I'm not sure.
Yes.

schedule_insn walks over the INSN_DEPEND list and decrements the dependency
count for all the instructions which are dependent on INSN.  If the dependency
count goes to zero, then the dependent insn is added to the ready queue.

  for (link = INSN_DEPEND (insn); link != 0; link = XEXP (link, 1))
    {
      rtx next = XEXP (link, 0);
      int cost = insn_cost (insn, link, next);

      INSN_TICK (next) = MAX (INSN_TICK (next), clock + cost);

      if ((INSN_DEP_COUNT (next) -= 1) == 0)
        {
[ ... ]
          /* Adjust the priority of NEXT and either put it on the ready
             list or queue it.  */
          adjust_priority (next);
          if (effective_cost <= 1)
            ready[n_ready++] = next;
          else
            queue_insn (next, effective_cost);
        }
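
The excerpt above can be restated as a small self-contained model.  The
struct and function names are hypothetical, and one detail is an assumption:
the elided part of the excerpt does not show how effective_cost is computed,
so the sketch takes it to be the dependent insn's tick minus the current
clock.  How the issue loop then consumes the ready list is outside this
sketch.

```c
#define MAX_INSNS 16

/* Scheduler bookkeeping for a toy model of the excerpt. */
struct sched_state {
    int dep_count[MAX_INSNS];     /* unresolved producers per insn */
    int tick[MAX_INSNS];          /* earliest cycle each insn may issue */
    int ready[MAX_INSNS];         /* insns placed on the ready list */
    int n_ready;
    int queued_until[MAX_INSNS];  /* wake-up cycle if queued, else 0 */
};

/* Called once per dependent NEXT after its producer issues at CLOCK,
   where COST is the latency of that dependency. */
void resolve_dependent(struct sched_state *s, int next, int clock, int cost)
{
    if (s->tick[next] < clock + cost)
        s->tick[next] = clock + cost;

    if (--s->dep_count[next] == 0) {
        /* Assumption: effective cost is the remaining wait from now. */
        int effective_cost = s->tick[next] - clock;

        if (effective_cost <= 1)
            s->ready[s->n_ready++] = next;
        else
            s->queued_until[next] = clock + effective_cost;
    }
}
```

In this model a dependent with cost 0 or 1 goes straight onto the ready
list, and anything slower is parked until its tick arrives.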

  > My first thought is that instead of removing dependent instructions from
  > ready to insn_queue, move them to a temporary queue when the delay
  > (cost?) is 0.  Then, if the ready queue is empty but can_issue_more is
  > still live, pull one insn from the temp queue.  Repeat this second loop
  > until can_issue_more is empty or the temp queue is empty.  Then return
  > everything else back to the insn_queue.
I think we'd be better off going ahead and adding stuff to the ready queue,
even if there are some insns in the ready queue.  The more insns we expose to
the scheduler as ready, the better.

Consider targets which allow sethi/lo_sum instructions (which have a
dependency) to issue together (hypersparc & PA8000 come to mind).  If we
have such a pair, we want to go ahead and issue them together, even if there
are other insns in the ready queue.  We know they will fire together and it
also minimizes the lifetime of the temporary holding the output from the sethi
instruction (which is important on the PA since there's only one register which
can hold the value from a sethi).


  > I may just be babbling.  Does this make any sense?  How is the ready
  > queue sorted?
SCHED_SORT, then MD_SCHED_REORDER

The generic sorting algorithm is in rank_for_schedule.  Targets can override
the default sort via MD_SCHED_REORDER.
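
As an illustration of a comparator-driven ready list, the sketch below sorts
by a single priority value with the original insn order as a tie-break.
This is an assumed, simplified ranking: the criteria actually used by
rank_for_schedule are not reproduced here, and the struct and function names
are hypothetical.

```c
#include <stdlib.h>

/* One entry of a toy ready list. */
struct ready_insn {
    int uid;        /* position in the original insn order */
    int priority;   /* larger means more urgent */
};

/* qsort comparator: higher priority first, then lower uid first. */
static int rank_insns(const void *a, const void *b)
{
    const struct ready_insn *x = a;
    const struct ready_insn *y = b;

    if (x->priority != y->priority)
        return (x->priority < y->priority) - (x->priority > y->priority);
    return (x->uid > y->uid) - (x->uid < y->uid);
}

/* Sort the ready list so the most urgent insn comes first. */
void sort_ready(struct ready_insn *ready, size_t n)
{
    qsort(ready, n, sizeof *ready, rank_insns);
}
```

A target-specific reorder step would run after this generic sort and could
rearrange the list again, for example to keep a sethi/lo_sum pair adjacent.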

jeff


* Re: Fwd: Questions on PA machine description?
  1999-03-24  1:30                                       ` Jeffrey A Law
@ 1999-03-24 14:44                                         ` Jerry Quinn
       [not found]                                           ` < 36F96A7D.F0DBDF47@americasm01.nt.com >
  1999-03-31 23:46                                           ` Jerry Quinn
  1999-03-31 23:46                                         ` Jeffrey A Law
  1 sibling, 2 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-24 14:44 UTC (permalink / raw)
  To: law; +Cc: Richard Henderson, egcs

Jeffrey A Law wrote:
> 
>   In message < 36F811ED.32912006@americasm01.nt.com >you write:
>   > I started trying to look at the haifa-sched.c file.  I can't really see
>   > where insns dependent on the current one in the ready queue are
>   > removed.  I'm looking in the loop on ready[] in schedule_block().  My
>   > guess is that somehow schedule_insn does this, but I'm not sure.
> Yes.
> 
> schedule_insn walks over the INSN_DEPEND list and decrements the dependency
> count for all the instructions which are dependent on INSN.  If the dependency
> count goes to zero, then the dependent insn is added to the ready queue.
> 
>   for (link = INSN_DEPEND (insn); link != 0; link = XEXP (link, 1))
>     {
>       rtx next = XEXP (link, 0);
>       int cost = insn_cost (insn, link, next);
> 
>       INSN_TICK (next) = MAX (INSN_TICK (next), clock + cost);
> 
>       if ((INSN_DEP_COUNT (next) -= 1) == 0)
>         {
> [ ... ]
>           /* Adjust the priority of NEXT and either put it on the ready
>              list or queue it.  */
>           adjust_priority (next);
>           if (effective_cost <= 1)
>             ready[n_ready++] = next;
>           else
>             queue_insn (next, effective_cost);
>         }

So the ready list for a cycle starts out with insns with no
dependencies.  Then when we pick an insn off the ready list, it's placed
into the scheduled chain and schedule_insn is called.

OK, now I'm confused, because in my head it looks like the code should
already do what you want.  schedule_insn is called as soon as we
schedule an insn from the ready list.  So, if we have a dependent insn,
such as the lo_sum following a sethi, the dependency would now be
reduced to 0, and it becomes eligible for the lower part of the code. 
So as long as effective_cost is OK, it would be added to the ready
queue.  INSN_DEPEND is the list of insns that depend on the one being
scheduled, right?

Since this doesn't happen, I'm obviously missing something.

If the alu has 0 delay, insn_cost would return 0?  Then, INSN_TICK would
be unchanged.  And effective_cost would end up 0, causing the dependent
insn to be placed into ready.
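
The bookkeeping described above can be sketched as a toy model.  Everything
here is an assumption made for illustration: the names (toy_insn, toy_sched,
toy_resolve) are invented, and effective_cost is taken to be the dependent's
tick minus the current clock, which is the part elided from the quoted
fragment.  The threshold parameter lets the same function model "ready if
effective_cost <= 1" (threshold 1) and "ready only if the cost is zero"
(threshold 0).

```c
#include <assert.h>

#define TOY_MAX 8   /* capacity of the toy lists */

struct toy_insn
{
  int dep_count;   /* unresolved dependencies (models INSN_DEP_COUNT) */
  int tick;        /* earliest cycle it may issue (models INSN_TICK) */
};

struct toy_sched
{
  int ready[TOY_MAX];    /* insns that may issue in the current cycle */
  int n_ready;
  int queued[TOY_MAX];   /* insns that have to wait */
  int n_queued;
};

/* The producer has just issued at cycle CLOCK.  NEXT is one of its
   dependents and COST is the latency of that dependency link.  */
static void
toy_resolve (struct toy_sched *s, struct toy_insn *insns, int next,
             int clock, int cost, int threshold)
{
  struct toy_insn *n = &insns[next];
  int effective_cost;

  if (n->tick < clock + cost)
    n->tick = clock + cost;

  /* Other producers are still outstanding: nothing more to do yet.  */
  if (--n->dep_count != 0)
    return;

  effective_cost = n->tick - clock;
  if (effective_cost <= threshold)
    {
      assert (s->n_ready < TOY_MAX);
      s->ready[s->n_ready++] = next;
    }
  else
    {
      assert (s->n_queued < TOY_MAX);
      s->queued[s->n_queued++] = next;
    }
}
```

With a link cost of 0 the dependent lands on the ready list straight away;
with a cost of 2 it is queued, whichever threshold is used.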

I'm babbling again and confused.  Enough for today :-)

Jerry


-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
       [not found]                                           ` < 36F96A7D.F0DBDF47@americasm01.nt.com >
@ 1999-03-25  1:01                                             ` Jeffrey A Law
       [not found]                                               ` < 4324.922351870@upchuck >
  1999-03-31 23:46                                               ` Jeffrey A Law
  0 siblings, 2 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-25  1:01 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: Richard Henderson, egcs

  In message < 36F96A7D.F0DBDF47@americasm01.nt.com >you write:
  > So the ready list for a cycle starts out with insns with no
  > dependencies.
Right.  The only insns on the ready list should have had all their dependencies
resolved already.

  > Then when we pick an insn off the ready list, it's placed
  > into the scheduled chain and schedule_insn is called.
Yes.


  > OK, now I'm confused, because in my head it looks like the code should
  > already do what you want.
Hmmm, you're right.  Hmmm, now I'm not sure why I saw the undesired behavior.

  > If the alu has 0 delay, insn_cost would return 0?  Then, INSN_TICK would
  > be unchanged.  And effective_cost would end up 0, causing the dependent
  > insn to be placed into ready.
Maybe that was the problem -- maybe I had a ready delay of 1 cycle or
something like that.

I agree the code should do what we want.  Maybe we need to tweak the ready
delay to be zero for the cases where we want to issue a dependent insn in the
same cycle.

Anyway, here's the testcase.  Look at the .sched dump and you'll see that
the two insns which compute the address of the global variable are issued
in different cycles.

It doesn't make a difference in this example, but does in some more complex
code I looked at for the PA8000.

int a;

int *
blah ()
{
return &a;
}

;;   ==================== scheduling visualization for block 0
;;   clock     pa8000alu                          pa8000alu                          no-unit
;;   =====     ==============================     ============================
;;   1         7    r95=high(`a')                 ----------------------------
;;   2         8    %r28=r95+low(`a')             ----------------------------
;;   3         ------------------------------     ----------------------------

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
       [not found]                                               ` < 4324.922351870@upchuck >
@ 1999-03-25 13:56                                                 ` Richard Henderson
  1999-03-25 15:10                                                   ` Richard Henderson
  1999-03-31 23:46                                                   ` Richard Henderson
  0 siblings, 2 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-25 13:56 UTC (permalink / raw)
  To: law, Jerry Quinn; +Cc: egcs

On Thu, Mar 25, 1999 at 01:51:10AM -0700, Jeffrey A Law wrote:
> Hmmm, you're right.  Hmmm, now I'm not sure why I saw the undesired behavior.

The loop in schedule_block that calls schedule_insn is careful
not to touch any insns that were put on the ready queue after
we began issuing insns for that cycle.
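
To see the effect in isolation, here is a toy comparison of the two ways of
walking the ready list.  It is only a sketch: the function names are made up,
every insn that is looked at is assumed to issue, and a dependent is assumed
to become ready at zero cost.  The first function mirrors the count-down loop
with the "fill the hole from the end" removal; the second pops from the end
until nothing is left or the issue rate is used up.

```c
#include <assert.h>

#define TOY_MAX 8   /* capacity of the toy ready list */

/* dependent[i] is the insn that becomes ready, at zero cost, as soon as
   insn i issues; -1 means there is none.  Both functions record the insns
   issued this cycle in ISSUED and return how many there were.  */

static int
toy_issue_countdown (int *ready, int n_ready, const int *dependent,
                     int issue_rate, int *issued)
{
  int n_issued = 0;
  int i;

  for (i = n_ready - 1; i >= 0 && n_issued < issue_rate; i--)
    {
      int insn = ready[i];

      issued[n_issued++] = insn;
      if (dependent[insn] >= 0)
        {
          assert (n_ready < TOY_MAX);
          ready[n_ready++] = dependent[insn];
        }
      /* Whatever is moved into slot i (possibly the newcomer) is not
         looked at again this cycle, because i only moves down.  */
      ready[i] = ready[--n_ready];
    }
  return n_issued;
}

static int
toy_issue_pop (int *ready, int n_ready, const int *dependent,
               int issue_rate, int *issued)
{
  int n_issued = 0;

  while (n_ready != 0 && n_issued < issue_rate)
    {
      int insn = ready[--n_ready];

      issued[n_issued++] = insn;
      if (dependent[insn] >= 0)
        {
          assert (n_ready < TOY_MAX);
          ready[n_ready++] = dependent[insn];   /* seen on the next pass */
        }
    }
  return n_issued;
}

/* Usage: insn 0 (think sethi) has a zero-cost dependent, insn 1 (lo_sum),
   and the machine can issue two insns per cycle.  */
static void
toy_issue_example (void)
{
  const int dependent[2] = { 1, -1 };
  int ready_a[TOY_MAX] = { 0 };
  int ready_b[TOY_MAX] = { 0 };
  int issued[2];

  assert (toy_issue_countdown (ready_a, 1, dependent, 2, issued) == 1);
  assert (toy_issue_pop (ready_b, 1, dependent, 2, issued) == 2);
}
```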

The following change seems to do the right thing for Alpha EV5
wrt compare+cmove.

Comments?


r~


	* haifa-sched.c (insn_cost): LINK_COST_FREE means cost 0, not 1.
	(schedule_insn): Only put insns directly on the ready queue that
	have cost zero.
	(schedule_block): Continue processing the ready list until there
	are no more instructions left to issue.

Index: haifa-sched.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/haifa-sched.c,v
retrieving revision 1.85
diff -c -p -d -r1.85 haifa-sched.c
*** haifa-sched.c	1999/03/13 17:38:17	1.85
--- haifa-sched.c	1999/03/25 21:51:29
*************** insn_cost (insn, link, used)
*** 3146,3160 ****
       and LINK_COST_ZERO.  */
  
    if (LINK_COST_FREE (link))
!     cost = 1;
  #ifdef ADJUST_COST
    else if (!LINK_COST_ZERO (link))
      {
        int ncost = cost;
  
        ADJUST_COST (used, link, insn, ncost);
!       if (ncost <= 1)
! 	LINK_COST_FREE (link) = ncost = 1;
        if (cost == ncost)
  	LINK_COST_ZERO (link) = 1;
        cost = ncost;
--- 3146,3163 ----
       and LINK_COST_ZERO.  */
  
    if (LINK_COST_FREE (link))
!     cost = 0;
  #ifdef ADJUST_COST
    else if (!LINK_COST_ZERO (link))
      {
        int ncost = cost;
  
        ADJUST_COST (used, link, insn, ncost);
!       if (ncost < 1)
! 	{
! 	  LINK_COST_FREE (link) = 1;
! 	  ncost = 0;
! 	}
        if (cost == ncost)
  	LINK_COST_ZERO (link) = 1;
        cost = ncost;
*************** schedule_insn (insn, ready, n_ready, clo
*** 4453,4459 ****
  	  /* Adjust the priority of NEXT and either put it on the ready
  	     list or queue it.  */
  	  adjust_priority (next);
! 	  if (effective_cost <= 1)
  	    ready[n_ready++] = next;
  	  else
  	    queue_insn (next, effective_cost);
--- 4456,4462 ----
  	  /* Adjust the priority of NEXT and either put it on the ready
  	     list or queue it.  */
  	  adjust_priority (next);
! 	  if (effective_cost < 1)
  	    ready[n_ready++] = next;
  	  else
  	    queue_insn (next, effective_cost);
*************** schedule_block (bb, rgn_n_insns)
*** 6921,6940 ****
  	  debug_ready_list (ready, n_ready);
  	}
  
!       /* Issue insns from ready list.
!          It is important to count down from n_ready, because n_ready may change
!          as insns are issued.  */
        can_issue_more = issue_rate;
!       for (i = n_ready - 1; i >= 0 && can_issue_more; i--)
  	{
! 	  rtx insn = ready[i];
  	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
  
  	  if (cost > 1)
! 	    {
! 	      queue_insn (insn, cost);
! 	      ready[i] = ready[--n_ready];	/* remove insn from ready list */
! 	    }
  	  else if (cost == 0)
  	    {
  	      /* an interblock motion? */
--- 6924,6939 ----
  	  debug_ready_list (ready, n_ready);
  	}
  
!       /* Issue insns from ready list.  */
        can_issue_more = issue_rate;
!       while (n_ready != 0 && can_issue_more)
  	{
! 	  /* Select and remove the insn from the ready list.  */
! 	  rtx insn = ready[--n_ready];
  	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
  
  	  if (cost > 1)
! 	    queue_insn (insn, cost);
  	  else if (cost == 0)
  	    {
  	      /* an interblock motion? */
*************** schedule_block (bb, rgn_n_insns)
*** 7010,7018 ****
  #endif
  
  	      n_ready = schedule_insn (insn, ready, n_ready, clock_var);
- 
- 	      /* remove insn from ready list */
- 	      ready[i] = ready[--n_ready];
  
  	      /* close this block after scheduling its jump */
  	      if (GET_CODE (last_scheduled_insn) == JUMP_INSN)
--- 7009,7014 ----

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-25 13:56                                                 ` Richard Henderson
@ 1999-03-25 15:10                                                   ` Richard Henderson
  1999-03-26 10:50                                                     ` Jerry Quinn
                                                                       ` (3 more replies)
  1999-03-31 23:46                                                   ` Richard Henderson
  1 sibling, 4 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-25 15:10 UTC (permalink / raw)
  To: law, Jerry Quinn; +Cc: egcs

On Thu, Mar 25, 1999 at 01:56:39PM -0800, Richard Henderson wrote:
> The following change seems to do the right thing for Alpha EV5
> wrt compare+cmove.

The previous patch wouldn't bootstrap the compiler.  I goofed on
stepping through the ready list insns.
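
One plausible reading of the difference between the two versions, inferred
from the diffs rather than stated anywhere in this thread: once the insn is
popped from ready[] up front, an insn whose hazard cost is exactly 1 matches
neither "cost > 1" nor "cost == 0" in the first version, so it is neither
queued nor issued.  The sketch below only classifies a cost value under each
version; the names are hypothetical.

```c
#include <assert.h>

enum toy_action { TOY_ISSUED, TOY_QUEUED, TOY_LOST };

/* First version of the loop body: the insn has already been removed
   from ready[], so a cost of exactly 1 falls through both tests.  */
static enum toy_action
toy_first_patch (int cost)
{
  assert (cost >= 0);   /* toy assumption: hazard costs are non-negative */
  if (cost > 1)
    return TOY_QUEUED;
  else if (cost == 0)
    return TOY_ISSUED;
  return TOY_LOST;
}

/* Second version: anything that cannot issue now is queued.  */
static enum toy_action
toy_second_patch (int cost)
{
  assert (cost >= 0);
  if (cost >= 1)
    return TOY_QUEUED;
  return TOY_ISSUED;
}
```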


r~



Index: haifa-sched.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/haifa-sched.c,v
retrieving revision 1.85
diff -c -p -d -r1.85 haifa-sched.c
*** haifa-sched.c	1999/03/13 17:38:17	1.85
--- haifa-sched.c	1999/03/25 23:08:02
*************** insn_cost (insn, link, used)
*** 3146,3160 ****
       and LINK_COST_ZERO.  */
  
    if (LINK_COST_FREE (link))
!     cost = 1;
  #ifdef ADJUST_COST
    else if (!LINK_COST_ZERO (link))
      {
        int ncost = cost;
  
        ADJUST_COST (used, link, insn, ncost);
!       if (ncost <= 1)
! 	LINK_COST_FREE (link) = ncost = 1;
        if (cost == ncost)
  	LINK_COST_ZERO (link) = 1;
        cost = ncost;
--- 3146,3163 ----
       and LINK_COST_ZERO.  */
  
    if (LINK_COST_FREE (link))
!     cost = 0;
  #ifdef ADJUST_COST
    else if (!LINK_COST_ZERO (link))
      {
        int ncost = cost;
  
        ADJUST_COST (used, link, insn, ncost);
!       if (ncost < 1)
! 	{
! 	  LINK_COST_FREE (link) = 1;
! 	  ncost = 0;
! 	}
        if (cost == ncost)
  	LINK_COST_ZERO (link) = 1;
        cost = ncost;
*************** schedule_insn (insn, ready, n_ready, clo
*** 4444,4450 ****
  	      if (current_nr_blocks > 1 && INSN_BB (next) != target_bb)
  		fprintf (dump, "/b%d ", INSN_BLOCK (next));
  
! 	      if (effective_cost <= 1)
  		fprintf (dump, "into ready\n");
  	      else
  		fprintf (dump, "into queue with cost=%d\n", effective_cost);
--- 4447,4453 ----
  	      if (current_nr_blocks > 1 && INSN_BB (next) != target_bb)
  		fprintf (dump, "/b%d ", INSN_BLOCK (next));
  
! 	      if (effective_cost < 1)
  		fprintf (dump, "into ready\n");
  	      else
  		fprintf (dump, "into queue with cost=%d\n", effective_cost);
*************** schedule_insn (insn, ready, n_ready, clo
*** 4453,4459 ****
  	  /* Adjust the priority of NEXT and either put it on the ready
  	     list or queue it.  */
  	  adjust_priority (next);
! 	  if (effective_cost <= 1)
  	    ready[n_ready++] = next;
  	  else
  	    queue_insn (next, effective_cost);
--- 4456,4462 ----
  	  /* Adjust the priority of NEXT and either put it on the ready
  	     list or queue it.  */
  	  adjust_priority (next);
! 	  if (effective_cost < 1)
  	    ready[n_ready++] = next;
  	  else
  	    queue_insn (next, effective_cost);
*************** schedule_block (bb, rgn_n_insns)
*** 6921,6941 ****
  	  debug_ready_list (ready, n_ready);
  	}
  
!       /* Issue insns from ready list.
!          It is important to count down from n_ready, because n_ready may change
!          as insns are issued.  */
        can_issue_more = issue_rate;
!       for (i = n_ready - 1; i >= 0 && can_issue_more; i--)
  	{
! 	  rtx insn = ready[i];
  	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
  
! 	  if (cost > 1)
! 	    {
! 	      queue_insn (insn, cost);
! 	      ready[i] = ready[--n_ready];	/* remove insn from ready list */
! 	    }
! 	  else if (cost == 0)
  	    {
  	      /* an interblock motion? */
  	      if (INSN_BB (insn) != target_bb)
--- 6924,6940 ----
  	  debug_ready_list (ready, n_ready);
  	}
  
!       /* Issue insns from ready list.  */
        can_issue_more = issue_rate;
!       while (n_ready != 0 && can_issue_more)
  	{
! 	  /* Select and remove the insn from the ready list.  */
! 	  rtx insn = ready[--n_ready];
  	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
  
! 	  if (cost >= 1)
! 	    queue_insn (insn, cost);
! 	  else
  	    {
  	      /* an interblock motion? */
  	      if (INSN_BB (insn) != target_bb)
*************** schedule_block (bb, rgn_n_insns)
*** 7010,7018 ****
  #endif
  
  	      n_ready = schedule_insn (insn, ready, n_ready, clock_var);
- 
- 	      /* remove insn from ready list */
- 	      ready[i] = ready[--n_ready];
  
  	      /* close this block after scheduling its jump */
  	      if (GET_CODE (last_scheduled_insn) == JUMP_INSN)
--- 7009,7014 ----

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-25 15:10                                                   ` Richard Henderson
@ 1999-03-26 10:50                                                     ` Jerry Quinn
  1999-03-26 11:04                                                       ` Richard Henderson
  1999-03-31 23:46                                                       ` Jerry Quinn
  1999-03-26 14:07                                                     ` Jerry Quinn
                                                                       ` (2 subsequent siblings)
  3 siblings, 2 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-26 10:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: law, egcs

At line 6947, there's still a dependency on `i'.  Should this just be
removed, or what?  I know less than nothing about speculative motion:

		      if (!check_live (insn, INSN_BB (insn)))
			{
			  /* speculative motion, live check failed, remove
			     insn from ready list */
			  ready[i] = ready[--n_ready];
			  continue;
			}
		      update_live (insn, INSN_BB (insn));


Jerry

-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-26 10:50                                                     ` Jerry Quinn
@ 1999-03-26 11:04                                                       ` Richard Henderson
  1999-03-31 23:46                                                         ` Richard Henderson
  1999-03-31 23:46                                                       ` Jerry Quinn
  1 sibling, 1 reply; 48+ messages in thread
From: Richard Henderson @ 1999-03-26 11:04 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: law, egcs

On Fri, Mar 26, 1999 at 01:48:56PM -0500, Jerry Quinn wrote:
> At line 6947, there's still a dependency on `i'.  Should this just be
> removed, or what?  I know less than nothing about speculative motion:
> 
> 		      if (!check_live (insn, INSN_BB (insn)))
> 			{
> 			  /* speculative motion, live check failed, remove
> 			     insn from ready list */
> 			  ready[i] = ready[--n_ready];

Yeah, that line should have vanished.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-25 15:10                                                   ` Richard Henderson
  1999-03-26 10:50                                                     ` Jerry Quinn
@ 1999-03-26 14:07                                                     ` Jerry Quinn
  1999-03-27 16:04                                                       ` Jeffrey A Law
  1999-03-31 23:46                                                       ` Jerry Quinn
  1999-03-31 23:46                                                     ` Richard Henderson
  1999-04-02 11:53                                                     ` Jeffrey A Law
  3 siblings, 2 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-26 14:07 UTC (permalink / raw)
  To: Richard Henderson; +Cc: law, egcs

So after applying the adjust_cost to 0 for pa8000, I didn't see any
identifiable change in performance.  I bootstrapped with the change in
place, but no difference in the speed of gcc, or in my code.

Would it be a good idea to try setting load/store delay to 0 as well?

Jerry

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-26 14:07                                                     ` Jerry Quinn
@ 1999-03-27 16:04                                                       ` Jeffrey A Law
  1999-03-31 23:46                                                         ` Jeffrey A Law
  1999-03-31 23:46                                                       ` Jerry Quinn
  1 sibling, 1 reply; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-27 16:04 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: Richard Henderson, egcs

  In message < 36FC0238.64F67851@americasm01.nt.com >you write:
  > So after applying the adjust_cost to 0 for pa8000, I didn't see any
  > identifiable change in performance.  I bootstrapped with the change in
  > place, but no difference in the speed of gcc, or in my code.
I'm hoping to get in a spec run with that patch this weekend.  I'm also
hoping to get in a spec run on an 7100lc with your other patch this weekend.

  > Would it be a good idea to try setting load/store delay to 0 as well?
I'm not sure yet.  The information I've got isn't real clear on how best to
deal with memory latency.  With the lack of solid info on this, we might be
best off with a 100% pragmatic approach -- try both and if one wins, select
it.  (and hope that it's not performance neutral :-)

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-20 10:43                                 ` Richard Henderson
  1999-03-23 14:13                                   ` Jerry Quinn
@ 1999-03-31 23:46                                   ` Richard Henderson
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law; +Cc: Jerry Quinn, egcs

On Sat, Mar 20, 1999 at 12:46:46AM -0700, Jeffrey A Law wrote:
> That's basically what I've had in mind.  In addition to checking the
> target macro, I think we'd want to check the ready delay and only do
> this if it's zero.

Hum.  I suppose that's reasonable.  We would not even need a target
macro then -- just make sure that all sane targets ADJUST_COST 
anti-dependencies to zero.
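
In toy form, the cost adjustment being suggested looks roughly like the
following.  This is a sketch and not any target's real ADJUST_COST: the enum
and the function name are invented, and the dependence kind is passed in
directly instead of being read from the link.

```c
#include <assert.h>

/* Hypothetical classification of a dependence link.  */
enum toy_dep_kind { TOY_DEP_TRUE, TOY_DEP_ANTI, TOY_DEP_OUTPUT };

/* Return the adjusted cost of a link of kind KIND whose default cost is
   COST.  Anti-dependencies are made free; everything else is left alone.  */
static int
toy_adjust_cost (enum toy_dep_kind kind, int cost)
{
  assert (cost >= 0);
  if (kind == TOY_DEP_ANTI)
    return 0;
  return cost;
}
```

With a zero cost on the link, the scheduler is then free to place the two
insns in the same cycle when nothing else prevents it.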


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-24  1:30                                       ` Jeffrey A Law
  1999-03-24 14:44                                         ` Jerry Quinn
@ 1999-03-31 23:46                                         ` Jeffrey A Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: Richard Henderson, egcs

  In message < 36F811ED.32912006@americasm01.nt.com >you write:
  > I started trying to look at the haifa-sched.c file.  I can't really see
  > where insns dependent on the current one in the ready queue are
  > removed.  I'm looking in the loop on ready[] in schedule_block().  My
  > guess is that somehow schedule_insn does this, but I'm not sure.
Yes.

schedule_insn walks over the INSN_DEPEND list and decrements the dependency
count for all the instructions which are dependent on INSN.  If the dependency
count goes to zero, then the dependent insn is added to the ready queue.

  for (link = INSN_DEPEND (insn); link != 0; link = XEXP (link, 1))
    {
      rtx next = XEXP (link, 0);
      int cost = insn_cost (insn, link, next);

      INSN_TICK (next) = MAX (INSN_TICK (next), clock + cost);

      if ((INSN_DEP_COUNT (next) -= 1) == 0)
	{
[ ... ]
          /* Adjust the priority of NEXT and either put it on the ready
             list or queue it.  */
          adjust_priority (next);
          if (effective_cost <= 1)
            ready[n_ready++] = next;
          else
            queue_insn (next, effective_cost);
	}

  > My first thought is that instead of removing dependent instructions from
  > ready to insn_queue, move them to a temporary queue when the delay
  > (cost?) is 0.  Then, if the ready queue is empty but can_issue_more is
  > still live, pull one insn from the temp queue.  Repeat this second loop
  > until can_issue_more is empty or the temp queue is empty.  The return
  > everything else back to the insn_queue.
I think we'd be better off going ahead and adding stuff to the ready queue,
even if there are some insns in the ready queue.  The more insns we expose to
the scheduler as ready, the better.

Consider targets which allow sethi/lo_sum instructions (which have a
dependency) to issue together (hypersparc & PA8000 come to mind).  If we
have such a pair, we want to go ahead and issue them together, even if there
are other insns in the ready queue.  We know they will fire together and it
also minimizes the lifetime of the temporary holding the output from the sethi
instruction (which is important on the PA since there's only one register which
can hold the value from a sethi).


  > I may just be babbling.  Does this make any sense?  How is the ready
  > queue sorted?
SCHED_SORT, then MD_SCHED_REORDER

The generic sorting algorithm is in rank_for_schedule.  Targets can override
the default sort via MD_SCHED_REORDER.

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-08 20:48   ` Jeffrey A Law
  1999-03-09 11:35     ` Jerry Quinn
@ 1999-03-31 23:46     ` Jeffrey A Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

  In message < 36E3E35A.4BB420BE@americasm01.nt.com >you write:
  > > ;; We have two basic ALUs.
  > > (define_function_unit "pa7100LCalu" 2 2
  > >   (and
  > >     (eq_attr "type" "!fpcc,fpalu,fpmulsgl,fpmuldbl,fpdivsgl,fpsqrtsgl,
  > > fpdivdbl,fpsqrtdbl,load,fpload,store,fpstore,shift,nullshift")
  > >    (eq_attr "cpu" "7100LC,7200")) 1 1)
  > > 
  > > It says there are 2 ALU's.  The READY=1 and DELAY=1 appear to me to be a 
  > > gate so that each unit can be issued 1 insn per cycle.
Correct.


  > What is confusing is the SIMULTANEITY of 2.  The documentation claims that
  > this means that each unit can have two active insns issued at a time.  But
  > this isn't true.
Could well be an oversight on my part.  It's been a long time since I wrote
that stuff.


  > > Also, my reading of the definition of SIMULTANEITY seems to indicate
  > > that it should be 1 for the pa7100LCfp_div function unit since the
  > > div/sqrt portion of the FPU can only execute one insn at a time.  Is
  > > this correct or wrong?
I believe this is correct.  The simultaneity should be one.

  > > There is a comment that shifts and memory ops execute in one of the
  > > ALU's but that that can't be modeled.
That was true at one time.  Or more correctly it couldn't be easily modeled.


  > Can someone explain what that means.
Basically one of the ALUs is not complete and can only handle a subset of the
ALU instructions.

So, stuff like adds, subtracts, compares, etc can issue to either ALU, but
a memory load/store or a shift instruction can only issue to the first
alu.  Assuming I remember everything from the LC series correctly.
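
As a rough illustration of that pairing rule (a sketch only, not taken from
the PA documentation or from pa.md), the check below treats shifts, loads and
stores as needing the first ALU.  The names toy_insn_type,
toy_needs_first_alu and toy_can_pair are invented for the example, and data
dependencies between the two insns are ignored.

```c
#include <assert.h>

/* Hypothetical insn classes, just enough to show the asymmetry.  */
enum toy_insn_type
{
  TOY_ADD, TOY_SUB, TOY_COMPARE, TOY_SHIFT, TOY_LOAD, TOY_STORE
};

/* Nonzero if the insn can only issue to the first (complete) ALU.  */
int
toy_needs_first_alu (enum toy_insn_type t)
{
  switch (t)
    {
    case TOY_SHIFT:
    case TOY_LOAD:
    case TOY_STORE:
      return 1;
    case TOY_ADD:
    case TOY_SUB:
    case TOY_COMPARE:
      return 0;
    }
  assert (!"unknown insn type");
  return 0;
}

/* Nonzero if the two insns could issue in the same cycle under this
   simplified rule: at most one of them may need the first ALU.  */
int
toy_can_pair (enum toy_insn_type a, enum toy_insn_type b)
{
  return !(toy_needs_first_alu (a) && toy_needs_first_alu (b));
}
```

Under these assumptions an add pairs with a shift, but a shift does not pair
with a load, which is the case a plain "two identical ALUs" description
cannot express.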

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-17 20:14               ` Jeffrey A Law
@ 1999-03-31 23:46                 ` Jeffrey A Law
  0 siblings, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

BTW -- the other thing you might want to do is have adjust_cost do nothing
when scheduling for a PA8000 class machine.


jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-18 21:00                     ` Jeffrey A Law
       [not found]                       ` < 18375.921819620@hurl.cygnus.com >
@ 1999-03-31 23:46                       ` Jeffrey A Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

  In message < 36F17F8C.9FD937FA@americasm01.nt.com >you write:
  > The other message mentioned that eliminating autoincrement/autodecrement
  > instructions is a good thing.  Do these instructions have the same
  > problem as fmpyadd, i.e. grabbing multiple reorder slots and function
  > units?
Yes, they have the same problem as fmpyadd/fmpysub.  They also have the 
disadvantage that the autoinc addressing mode adds additional data dependencies
which can inhibit the amount of ILP found by the compiler and by the hardware.

  > > One thought would be to make the ready delay for alu instructions 0, then
  > > tweak haifa to add dependent instructions to the ready queue immediately
  > > after it issues an insn with a ready delay of zero cycles.
  > 
  > What about making pa_adjust_cost set the cost of a data dependency to
  > 0?  The alpha port does this on the ev5.
Nope, it won't do what we want.  Look at the loop which issues insns from
the ready list in haifa-sched.c.

It has a structure like:

while (not all insns scheduled)
  add insns with no outstanding dependencies to the ready queue
  sort the ready queue
  while (ready list is not empty && target can issue more insns)
    issue an insn off the ready queue, remove dependencies on the issued insn

So, given insn1 which feeds insn2 we will never issue insn1 & insn2 in the
same cycle.
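
A toy model of that loop structure may make the consequence easier to see.
This is only a sketch, not the haifa-sched.c code: toy_sched, toy_schedule
and the single-producer dep[] array are invented for illustration, sorting of
the ready list is left out, and dependencies are assumed to be acyclic.

```c
#include <assert.h>

#define MAX_INSNS 16

/* Hypothetical model of a block: dep[i] is the index of the insn that
   insn i depends on, or -1 if it has no dependency.  */
struct toy_sched
{
  int n;
  int dep[MAX_INSNS];
  int cycle[MAX_INSNS];		/* issue cycle per insn, 0 = unscheduled */
};

/* The ready list is built once at the start of each cycle, so an insn
   whose producer issues in this cycle has to wait for the next one.  */
void
toy_schedule (struct toy_sched *s, int issue_rate)
{
  int scheduled = 0, clock = 0, i;

  assert (s->n <= MAX_INSNS && issue_rate > 0);
  for (i = 0; i < s->n; i++)
    s->cycle[i] = 0;

  while (scheduled < s->n)
    {
      int ready[MAX_INSNS], n_ready = 0, can_issue_more = issue_rate;

      clock++;

      /* Add insns with no outstanding dependencies to the ready list.  */
      for (i = 0; i < s->n; i++)
	if (s->cycle[i] == 0
	    && (s->dep[i] < 0 || s->cycle[s->dep[i]] != 0))
	  ready[n_ready++] = i;

      /* Issue from that snapshot only.  */
      for (i = 0; i < n_ready && can_issue_more > 0; i++, can_issue_more--)
	{
	  s->cycle[ready[i]] = clock;
	  scheduled++;
	}
    }
}
```

With two insns where the second depends on the first and an issue rate of 2,
this model puts them in cycles 1 and 2, since the dependent insn only becomes
ready when the list is rebuilt for the next cycle.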

  > Why is this a good thing?  Won't an instruction that depends on another
  > one have to retire later than the other one?
No.  They can retire in the same cycle.  This is discussed in one of the
PA8000 optimization papers from HP.   The key is to remember that the PA8000
is an out-of-order execution machine.

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-27 16:04                                                       ` Jeffrey A Law
@ 1999-03-31 23:46                                                         ` Jeffrey A Law
  0 siblings, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: Richard Henderson, egcs

  In message < 36FC0238.64F67851@americasm01.nt.com >you write:
  > So after applying the adjust_cost to 0 for pa8000, I didn't see any
  > identifiable change in performance.  I bootstrapped with the change in
  > place, but no difference in the speed of gcc, or in my code.
I'm hoping to get in a spec run with that patch this weekend.  I'm also
hoping to get in a spec run on an 7100lc with your other patch this weekend.

  > Would it be a good idea to try setting load/store delay to 0 as well?
I'm not sure yet.  The information I've got isn't real clear on how best to
deal with memory latency.  With the lack of solid info on this, we might be
best off with a 100% pragmatic approach -- try both and if one wins, select
it.  (and hope that it's not performance neutral :-)

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-26 11:04                                                       ` Richard Henderson
@ 1999-03-31 23:46                                                         ` Richard Henderson
  0 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: law, egcs

On Fri, Mar 26, 1999 at 01:48:56PM -0500, Jerry Quinn wrote:
> At line 6947, there's still a dependency on `i'.  Should this just be
> removed, or what?  I know less than nothing about speculative motion:
> 
> 		      if (!check_live (insn, INSN_BB (insn)))
> 			{
> 			  /* speculative motion, live check failed, remove
> 			     insn from ready list */
> 			  ready[i] = ready[--n_ready];

Yeah, that line should have vanished.


r~

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-18 14:35                 ` Jerry Quinn
       [not found]                   ` < 36F17F8C.9FD937FA@americasm01.nt.com >
@ 1999-03-31 23:46                   ` Jerry Quinn
  1 sibling, 0 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law; +Cc: egcs

Jeffrey A Law wrote:
> 
> I'd create one additional unit -- fmac for all the other fp computation
> insn to show the partial latency as recommended by HP.  Something like this:
> 
> (define_function_unit "pa8000fmac" 2 0
>   (and
>     (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl")
>     (eq_attr "cpu" "8000")) 2 1)
> 
> ie, there's two fmac units which are fully pipelined.   Results are available
> in 2 cycles.
> 
> I'm going to make those changes and install the patch.

There seems to be some delay going on :-)  I responded to your post on
egcs-patches before realizing you saw my revision of my original
scheduling.

The other message mentioned that eliminating autoincrement/autodecrement
instructions is a good thing.  Do these instructions have the same
problem as fmpyadd, i.e. grabbing multiple reorder slots and function
units?

> The other thing to think about is how to show that some instructions which
> are data dependent can/should issue in the same cycle.  ie, if an alu
> operation feeds another alu operation, then we should issue them in the
> same cycle.
> 
> One thought would be to make the ready delay for alu instructions 0, then
> tweak haifa to add dependent instructions to the ready queue immediately
> after it issues an insn with a ready delay of zero cycles.

What about making pa_adjust_cost set the cost of a data dependency to
0?  The alpha port does this on the ev5.

Why is this a good thing?  Won't an instruction that depends on another
one have to retire later than the other one?

Jerry

-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-09 12:18         ` Jeffrey A Law
  1999-03-16 10:49           ` Jerry Quinn
@ 1999-03-31 23:46           ` Jeffrey A Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

  In message < 36E577DF.F825BAFB@americasm01.nt.com >you write:
  > Do you have a pointer to something that describes this better?  I've
  > been unable to find any info at all on the 7100LC and the 7300 is
  > described mainly as a 7100LC core.
Nope.  Sorry.  I know the non-symmetric nature of the ALUs on the 7100LC
series was discussed in a public forum years ago, but I don't remember
precisely where (MPR would be a good first bet).  The scheduling info in
gcc was derived from that public information.

I wouldn't be at all surprised if the 7200 and 7300 were basically 7100LC cores
with different memory latencies.


  > I wouldn't mind playing with the machine description some.  Also, do you
  > have any clue if the 8000 also has this same asymmetric ALU situation?
The 8000 does not suffer from this problem.  The ALUs are symmetric.  Not
that it matters since from a scheduling standpoint you (mostly) want to
ignore latency and schedule for reorder buffer retirement (2 memops, 2 nonmem
ops per cycle).
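
To put a number on the retirement view, here is a back-of-the-envelope
sketch.  It assumes only the figure given above (2 memops and 2 nonmem ops
retired per cycle) and ignores latency, dependencies and everything else;
toy_retire_cycles is an invented name, not anything in GCC.

```c
#include <assert.h>

/* Lower bound on the cycles needed to retire a block with the given
   numbers of memory and non-memory ops, at 2 of each per cycle.  */
int
toy_retire_cycles (int n_memops, int n_nonmem)
{
  int mem_cycles, nonmem_cycles;

  assert (n_memops >= 0 && n_nonmem >= 0);
  mem_cycles = (n_memops + 1) / 2;	/* round up */
  nonmem_cycles = (n_nonmem + 1) / 2;	/* round up */
  return mem_cycles > nonmem_cycles ? mem_cycles : nonmem_cycles;
}
```

For example, 1 memop and 6 nonmem ops are bounded by the nonmem side at 3
cycles, while 4 of each could in principle retire in 2.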

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-25 15:10                                                   ` Richard Henderson
  1999-03-26 10:50                                                     ` Jerry Quinn
  1999-03-26 14:07                                                     ` Jerry Quinn
@ 1999-03-31 23:46                                                     ` Richard Henderson
  1999-04-02 11:53                                                     ` Jeffrey A Law
  3 siblings, 0 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law, Jerry Quinn; +Cc: egcs

On Thu, Mar 25, 1999 at 01:56:39PM -0800, Richard Henderson wrote:
> The following change seems to do the right thing for Alpha EV5
> wrt compare+cmove.

The previous patch wouldn't bootstrap the compiler.  I goofed on
stepping through the ready list insns.


r~



Index: haifa-sched.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/haifa-sched.c,v
retrieving revision 1.85
diff -c -p -d -r1.85 haifa-sched.c
*** haifa-sched.c	1999/03/13 17:38:17	1.85
--- haifa-sched.c	1999/03/25 23:08:02
*************** insn_cost (insn, link, used)
*** 3146,3160 ****
       and LINK_COST_ZERO.  */
  
    if (LINK_COST_FREE (link))
!     cost = 1;
  #ifdef ADJUST_COST
    else if (!LINK_COST_ZERO (link))
      {
        int ncost = cost;
  
        ADJUST_COST (used, link, insn, ncost);
!       if (ncost <= 1)
! 	LINK_COST_FREE (link) = ncost = 1;
        if (cost == ncost)
  	LINK_COST_ZERO (link) = 1;
        cost = ncost;
--- 3146,3163 ----
       and LINK_COST_ZERO.  */
  
    if (LINK_COST_FREE (link))
!     cost = 0;
  #ifdef ADJUST_COST
    else if (!LINK_COST_ZERO (link))
      {
        int ncost = cost;
  
        ADJUST_COST (used, link, insn, ncost);
!       if (ncost < 1)
! 	{
! 	  LINK_COST_FREE (link) = 1;
! 	  ncost = 0;
! 	}
        if (cost == ncost)
  	LINK_COST_ZERO (link) = 1;
        cost = ncost;
*************** schedule_insn (insn, ready, n_ready, clo
*** 4444,4450 ****
  	      if (current_nr_blocks > 1 && INSN_BB (next) != target_bb)
  		fprintf (dump, "/b%d ", INSN_BLOCK (next));
  
! 	      if (effective_cost <= 1)
  		fprintf (dump, "into ready\n");
  	      else
  		fprintf (dump, "into queue with cost=%d\n", effective_cost);
--- 4447,4453 ----
  	      if (current_nr_blocks > 1 && INSN_BB (next) != target_bb)
  		fprintf (dump, "/b%d ", INSN_BLOCK (next));
  
! 	      if (effective_cost < 1)
  		fprintf (dump, "into ready\n");
  	      else
  		fprintf (dump, "into queue with cost=%d\n", effective_cost);
*************** schedule_insn (insn, ready, n_ready, clo
*** 4453,4459 ****
  	  /* Adjust the priority of NEXT and either put it on the ready
  	     list or queue it.  */
  	  adjust_priority (next);
! 	  if (effective_cost <= 1)
  	    ready[n_ready++] = next;
  	  else
  	    queue_insn (next, effective_cost);
--- 4456,4462 ----
  	  /* Adjust the priority of NEXT and either put it on the ready
  	     list or queue it.  */
  	  adjust_priority (next);
! 	  if (effective_cost < 1)
  	    ready[n_ready++] = next;
  	  else
  	    queue_insn (next, effective_cost);
*************** schedule_block (bb, rgn_n_insns)
*** 6921,6941 ****
  	  debug_ready_list (ready, n_ready);
  	}
  
!       /* Issue insns from ready list.
!          It is important to count down from n_ready, because n_ready may change
!          as insns are issued.  */
        can_issue_more = issue_rate;
!       for (i = n_ready - 1; i >= 0 && can_issue_more; i--)
  	{
! 	  rtx insn = ready[i];
  	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
  
! 	  if (cost > 1)
! 	    {
! 	      queue_insn (insn, cost);
! 	      ready[i] = ready[--n_ready];	/* remove insn from ready list */
! 	    }
! 	  else if (cost == 0)
  	    {
  	      /* an interblock motion? */
  	      if (INSN_BB (insn) != target_bb)
--- 6924,6940 ----
  	  debug_ready_list (ready, n_ready);
  	}
  
!       /* Issue insns from ready list.  */
        can_issue_more = issue_rate;
!       while (n_ready != 0 && can_issue_more)
  	{
! 	  /* Select and remove the insn from the ready list.  */
! 	  rtx insn = ready[--n_ready];
  	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
  
! 	  if (cost >= 1)
! 	    queue_insn (insn, cost);
! 	  else
  	    {
  	      /* an interblock motion? */
  	      if (INSN_BB (insn) != target_bb)
*************** schedule_block (bb, rgn_n_insns)
*** 7010,7018 ****
  #endif
  
  	      n_ready = schedule_insn (insn, ready, n_ready, clock_var);
- 
- 	      /* remove insn from ready list */
- 	      ready[i] = ready[--n_ready];
  
  	      /* close this block after scheduling its jump */
  	      if (GET_CODE (last_scheduled_insn) == JUMP_INSN)
--- 7009,7014 ----

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-23 14:13                                   ` Jerry Quinn
       [not found]                                     ` < 36F811ED.32912006@americasm01.nt.com >
@ 1999-03-31 23:46                                     ` Jerry Quinn
  1 sibling, 0 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: law, egcs

Richard Henderson wrote:
> 
> On Sat, Mar 20, 1999 at 12:46:46AM -0700, Jeffrey A Law wrote:
> > That's basically what I've had in mind.  In addition to checking the
> > target macro, I think we'd want to check the ready delay and only do
> > this if it's zero.
> 
> Hum.  I suppose that's reasonable.  We would not even need a target
> macro then -- just make sure that all sane targets ADJUST_COST
> anti-dependencies to zero.

I started trying to look at the haifa-sched.c file.  I can't really see
where insns dependent on the current one in the ready queue are
removed.  I'm looking in the loop on ready[] in schedule_block().  My
guess is that somehow schedule_insn does this, but I'm not sure.

My first thought is that instead of removing dependent instructions from
ready to insn_queue, move them to a temporary queue when the delay
(cost?) is 0.  Then, if the ready queue is empty but can_issue_more is
still live, pull one insn from the temp queue.  Repeat this second loop
until can_issue_more is empty or the temp queue is empty.  Then return
everything else back to the insn_queue.

I may just be babbling.  Does this make any sense?  How is the ready
queue sorted?


-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-25  1:01                                             ` Jeffrey A Law
       [not found]                                               ` < 4324.922351870@upchuck >
@ 1999-03-31 23:46                                               ` Jeffrey A Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: Richard Henderson, egcs

  In message < 36F96A7D.F0DBDF47@americasm01.nt.com >you write:
  > So the ready list for a cycle starts out with insns with no
  > dependencies.
Right.  The only insns on the ready list should have had all their dependencies
resolved already.

  > Then when we pick an insn off the ready list, it's placed
  > into the scheduled chain and schedule_insn is called.
Yes.


  > OK, now I'm confused, because in my head it looks like the code should
  > already do what you want.
Hmmm, you're right.  Hmmm, now I'm not sure why I saw the undesired behavior.

  > If the alu has 0 delay, insn_cost would return 0?  Then, INSN_TICK would
  > be unchanged.  And effective_cost would end up 0, causing the dependent
  > insn to be placed into ready.
Maybe that was the problem -- maybe I had a ready delay of 1 cycle or
something like that.

I agree the code should do what we want.  Maybe we need to tweak the ready
delay to be zero for the cases where we want to issue a dependent insn in the
same cycle.

Anyway, here's the testcase.  Look at the .sched dump and you'll see that
the two insns which compute the address of the global variable are issued
in different cycles.

It doesn't make a difference in this example, but does in some more complex
code I looked at for the PA8000.

int a;

int *
blah ()
{
  return &a;
}

;;   ==================== scheduling visualization for block 0
;;   clock     pa8000alu                          pa8000alu                          no-unit
;;   =====     ==============================     ============================
;;   1         7    r95=high(`a')                 ----------------------------
;;   2         8    %r28=r95+low(`a')             ----------------------------
;;   3         ------------------------------     ----------------------------

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-17 20:12               ` Jeffrey A Law
  1999-03-18 14:35                 ` Jerry Quinn
@ 1999-03-31 23:46                 ` Jeffrey A Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Jerry Quinn; +Cc: egcs

  In message < 36EEA7B6.D3C894DE@americasm01.nt.com >you write:
  > (define_function_unit "pa8000memory" 2 0
  >   (and (eq_attr "type" "load,fpload,store,fpstore")
  >        (eq_attr "cpu" "8000")) 2 1)
I would suggest making the simultaneity 1 and ready delay 1.  The point is
we do not want to expose the load latency, since we're trying to describe
how instructions are retired.

ie, we can retire one instruction from each of the two load-store units
every cycle.

I'd also suggest changing the name to "pa8000lsu" since we're not trying to
describe the memory subsystem, but instead how insns retire out of the load
store unit.




  > (define_function_unit "pa8000fp_div" 2 1
  >   (and (eq_attr "type" "fpdivsgl,fpsqrtsgl")
  > 	(eq_attr "cpu" "8000")) 17 17)
  > (define_function_unit "pa8000fp_div" 2 1
  >   (and (eq_attr "type" "fpdivdbl,fpsqrtdbl")
  > 	(eq_attr "cpu" "8000")) 31 31)
  > (define_function_unit "pa8000alu" 2 1
  >    (and
  >     (eq_attr "type" "!load,fpload,store,fpstore")
  >     (eq_attr "cpu" "8000")) 1 1)
These look reasonable.  



I'd create one additional unit -- fmac for all the other fp computation
insns to show the partial latency as recommended by HP.  Something like this:

(define_function_unit "pa8000fmac" 2 0
  (and
    (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl")
    (eq_attr "cpu" "8000")) 2 1)

ie, there's two fmac units which are fully pipelined.   Results are available
in 2 cycles.


I'm going to make those changes and install the patch.


  > Theory being that the memory represents things leaving the load reorder
  > buffer and alu represents the nonload reorder buffer.  I added the
  > div/sqrt constraint on the theory that they take long enough to have a
  > big effect on retirement.  All for nought - it is better by a few
  > percent on some programs, worse by a few percent on others.  No major
  > differences that I could see.
That's basically what we want to do.  You shouldn't expect much from
instruction scheduling on a PA8000 class machine.  All the folks I've
spoken to about this indicate that it's minor relative to other stuff.

The other thing to think about is how to show that some instructions which
are data dependent can/should issue in the same cycle.  ie, if an alu
operation feeds another alu operation, then we should issue them in the
same cycle.

One thought would be to make the ready delay for alu instructions 0, then
tweak haifa to add dependent instructions to the ready queue immediately
after it issues an insn with a ready delay of zero cycles.


jeff


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-25 13:56                                                 ` Richard Henderson
  1999-03-25 15:10                                                   ` Richard Henderson
@ 1999-03-31 23:46                                                   ` Richard Henderson
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law, Jerry Quinn; +Cc: egcs

On Thu, Mar 25, 1999 at 01:51:10AM -0700, Jeffrey A Law wrote:
> Hmmm, you're right.  Hmmm, now I'm not sure why I saw the undesired behavior.

The loop in schedule_block that calls schedule_insn is careful
not to touch any insns that were put on the ready queue after
we began issuing insns for that cycle.
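
For comparison, a toy model of the alternative: pop insns off the ready list
until it is empty, and let zero-cost dependents join the list during the
cycle.  This is a sketch under simplifying assumptions, not the
haifa-sched.c code: toy_block, toy_schedule_pop and the dep[]/cost[] arrays
are invented, each insn has at most one producer, and dependencies are
acyclic.

```c
#include <assert.h>

#define MAX_INSNS 16

struct toy_block
{
  int n;
  int dep[MAX_INSNS];	/* producer index, or -1 for none */
  int cost[MAX_INSNS];	/* cost of the dependency on dep[i]; 0 = free */
  int cycle[MAX_INSNS];	/* output: issue cycle per insn */
};

void
toy_schedule_pop (struct toy_block *b, int issue_rate)
{
  int tick[MAX_INSNS];	/* earliest issue cycle; 0 = producer pending */
  int ready[MAX_INSNS];
  int scheduled = 0, clock = 0, i;

  assert (b->n <= MAX_INSNS && issue_rate > 0);
  for (i = 0; i < b->n; i++)
    {
      b->cycle[i] = 0;
      tick[i] = b->dep[i] < 0 ? 1 : 0;
    }

  while (scheduled < b->n)
    {
      int n_ready = 0, can_issue_more = issue_rate;

      clock++;
      for (i = 0; i < b->n; i++)
	if (b->cycle[i] == 0 && tick[i] != 0 && tick[i] <= clock)
	  ready[n_ready++] = i;

      while (n_ready != 0 && can_issue_more)
	{
	  /* Select and remove an insn from the ready list.  */
	  int insn = ready[--n_ready];

	  b->cycle[insn] = clock;
	  scheduled++;
	  can_issue_more--;

	  /* Release consumers; zero-cost ones become ready right away.  */
	  for (i = 0; i < b->n; i++)
	    if (b->dep[i] == insn)
	      {
		tick[i] = clock + b->cost[i];
		if (b->cost[i] < 1)
		  ready[n_ready++] = i;
	      }
	}
    }
}
```

In this model a producer and its zero-cost consumer both land in cycle 1
when the issue rate is 2, while a consumer with cost 1 is held until
cycle 2.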

The following change seems to do the right thing for Alpha EV5
wrt compare+cmove.

Comments?


r~


	* haifa-sched.c (insn_cost): LINK_COST_FREE means cost 0, not 1.
	(schedule_insn): Only put insns directly on the ready queue that
	have cost zero.
	(schedule_block): Continue processing the ready list until there
	are no more instructions left to issue.

Index: haifa-sched.c
===================================================================
RCS file: /egcs/carton/cvsfiles/egcs/gcc/haifa-sched.c,v
retrieving revision 1.85
diff -c -p -d -r1.85 haifa-sched.c
*** haifa-sched.c	1999/03/13 17:38:17	1.85
--- haifa-sched.c	1999/03/25 21:51:29
*************** insn_cost (insn, link, used)
*** 3146,3160 ****
       and LINK_COST_ZERO.  */
  
    if (LINK_COST_FREE (link))
!     cost = 1;
  #ifdef ADJUST_COST
    else if (!LINK_COST_ZERO (link))
      {
        int ncost = cost;
  
        ADJUST_COST (used, link, insn, ncost);
!       if (ncost <= 1)
! 	LINK_COST_FREE (link) = ncost = 1;
        if (cost == ncost)
  	LINK_COST_ZERO (link) = 1;
        cost = ncost;
--- 3146,3163 ----
       and LINK_COST_ZERO.  */
  
    if (LINK_COST_FREE (link))
!     cost = 0;
  #ifdef ADJUST_COST
    else if (!LINK_COST_ZERO (link))
      {
        int ncost = cost;
  
        ADJUST_COST (used, link, insn, ncost);
!       if (ncost < 1)
! 	{
! 	  LINK_COST_FREE (link) = 1;
! 	  ncost = 0;
! 	}
        if (cost == ncost)
  	LINK_COST_ZERO (link) = 1;
        cost = ncost;
*************** schedule_insn (insn, ready, n_ready, clo
*** 4453,4459 ****
  	  /* Adjust the priority of NEXT and either put it on the ready
  	     list or queue it.  */
  	  adjust_priority (next);
! 	  if (effective_cost <= 1)
  	    ready[n_ready++] = next;
  	  else
  	    queue_insn (next, effective_cost);
--- 4456,4462 ----
  	  /* Adjust the priority of NEXT and either put it on the ready
  	     list or queue it.  */
  	  adjust_priority (next);
! 	  if (effective_cost < 1)
  	    ready[n_ready++] = next;
  	  else
  	    queue_insn (next, effective_cost);
*************** schedule_block (bb, rgn_n_insns)
*** 6921,6940 ****
  	  debug_ready_list (ready, n_ready);
  	}
  
!       /* Issue insns from ready list.
!          It is important to count down from n_ready, because n_ready may change
!          as insns are issued.  */
        can_issue_more = issue_rate;
!       for (i = n_ready - 1; i >= 0 && can_issue_more; i--)
  	{
! 	  rtx insn = ready[i];
  	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
  
  	  if (cost > 1)
! 	    {
! 	      queue_insn (insn, cost);
! 	      ready[i] = ready[--n_ready];	/* remove insn from ready list */
! 	    }
  	  else if (cost == 0)
  	    {
  	      /* an interblock motion? */
--- 6924,6939 ----
  	  debug_ready_list (ready, n_ready);
  	}
  
!       /* Issue insns from ready list.  */
        can_issue_more = issue_rate;
!       while (n_ready != 0 && can_issue_more)
  	{
! 	  /* Select and remove the insn from the ready list.  */
! 	  rtx insn = ready[--n_ready];
  	  int cost = actual_hazard (insn_unit (insn), insn, clock_var, 0);
  
  	  if (cost > 1)
! 	    queue_insn (insn, cost);
  	  else if (cost == 0)
  	    {
  	      /* an interblock motion? */
*************** schedule_block (bb, rgn_n_insns)
*** 7010,7018 ****
  #endif
  
  	      n_ready = schedule_insn (insn, ready, n_ready, clock_var);
- 
- 	      /* remove insn from ready list */
- 	      ready[i] = ready[--n_ready];
  
  	      /* close this block after scheduling its jump */
  	      if (GET_CODE (last_scheduled_insn) == JUMP_INSN)
--- 7009,7014 ----

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-16 10:49           ` Jerry Quinn
       [not found]             ` < 36EEA7B6.D3C894DE@americasm01.nt.com >
@ 1999-03-31 23:46             ` Jerry Quinn
  1 sibling, 0 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law; +Cc: egcs

Jeffrey A Law wrote:
> 
>   > I wouldn't mind playing with the machine description some.  Also, do you
>   > have any clue if the 8000 also has this same asymmetric ALU situation?
> The 8000 does not suffer from this problem.  The ALUs are symmetric.  Not
> that it matters since from a scheduling standpoint you (mostly) want to
> ignore latency and schedule for reorder buffer retirement (2 memops, 2 nonmem
> ops per cycle).

I thought I'd go ahead and play with the machine description a little. 
I tried the following after the initial implementation of doubling the
7100lc function units:

(define_function_unit "pa8000memory" 2 0
  (and (eq_attr "type" "load,fpload,store,fpstore")
       (eq_attr "cpu" "8000")) 2 1)
(define_function_unit "pa8000fp_div" 2 1
  (and (eq_attr "type" "fpdivsgl,fpsqrtsgl")
	(eq_attr "cpu" "8000")) 17 17)
(define_function_unit "pa8000fp_div" 2 1
  (and (eq_attr "type" "fpdivdbl,fpsqrtdbl")
	(eq_attr "cpu" "8000")) 31 31)
(define_function_unit "pa8000alu" 2 1
   (and
    (eq_attr "type" "!load,fpload,store,fpstore")
    (eq_attr "cpu" "8000")) 1 1)

Theory being that the memory represents things leaving the load reorder
buffer and alu represents the nonload reorder buffer.  I added the
div/sqrt constraint on the theory that they take long enough to have a
big effect on retirement.  All for nought - it is better by a few
percent on some programs, worse by a few percent on others.  No major
differences that I could see.

Any thoughts?

-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-20  2:08                             ` Jeffrey A Law
       [not found]                               ` < 1843.921916006@upchuck >
@ 1999-03-31 23:46                               ` Jeffrey A Law
  1 sibling, 0 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jerry Quinn, egcs

  In message < 19990319173226.C14722@cygnus.com >you write:
  > Does it seem worthwhile to add some sort of target define to control
  > adding dependent insns to the ready queue in the same cycle?  It's
  > true that it doesn't matter to the vast majority of the processors
  > we support, but I'm thinking of VLIW parts that do actually have
  > write-after-read conflicts within a cycle.
That's basically what I've had in mind.  In addition to checking the
target macro, I think we'd want to check the ready delay and only do
this if it's zero.

I hadn't even thought about the VLIW issues, but yeah, it seems like we'd
want to have the same kind of capability for VLIW targets.

jeff

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-26 14:07                                                     ` Jerry Quinn
  1999-03-27 16:04                                                       ` Jeffrey A Law
@ 1999-03-31 23:46                                                       ` Jerry Quinn
  1 sibling, 0 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-31 23:46 UTC (permalink / raw)
  To: Richard Henderson; +Cc: law, egcs

So after applying the adjust_cost to 0 for pa8000, I didn't see any
identifiable change in performance.  I bootstrapped with the change in
place, but no difference in the speed of gcc, or in my code.

Would it be a good idea to try setting load/store delay to 0 as well?

Jerry

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-09 11:35     ` Jerry Quinn
       [not found]       ` < 36E577DF.F825BAFB@americasm01.nt.com >
@ 1999-03-31 23:46       ` Jerry Quinn
  1 sibling, 0 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law; +Cc: egcs

Jeffrey A Law wrote:
>   > Can someone explain what that means.
> Basically one of the ALUs is not complete and can only handle a subset of the
> ALU instructions.
> 
> So, stuff like adds, subtracts, compares, etc can issue to either ALU, but
> a memory load/store or a shift instruction can only issue to the first
> alu.  Assuming I remember everything from the LC series correctly.

Do you have a pointer to something that describes this better?  I've
been unable to find any info at all on the 7100LC and the 7300 is
described mainly as a 7100LC core.

I wouldn't mind playing with the machine description some.  Also, do you
have any clue if the 8000 also has this same asymmetric ALU situation?

Jerry

-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: Fwd: Questions on PA machine description?
  1999-03-24 14:44                                         ` Jerry Quinn
       [not found]                                           ` < 36F96A7D.F0DBDF47@americasm01.nt.com >
@ 1999-03-31 23:46                                           ` Jerry Quinn
  1 sibling, 0 replies; 48+ messages in thread
From: Jerry Quinn @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law; +Cc: Richard Henderson, egcs

Jeffrey A Law wrote:
> 
>   In message < 36F811ED.32912006@americasm01.nt.com >you write:
>   > I started trying to look at the haifa-sched.c file.  I can't really see
>   > where insns dependent on the current one in the ready queue are
>   > removed.  I'm looking in the loop on ready[] in schedule_block().  My
>   > guess is that somehow schedule_insn does this, but I'm not sure.
> Yes.
> 
> schedule_insn walks over the INSN_DEPEND list and decrements the dependency
> count for all the instructions which are dependent on INSN.  If the dependency
> count goes to zero, then the dependent insn is added to the ready queue.
> 
>   for (link = INSN_DEPEND (insn); link != 0; link = XEXP (link, 1))
>     {
>       rtx next = XEXP (link, 0);
>       int cost = insn_cost (insn, link, next);
> 
>       INSN_TICK (next) = MAX (INSN_TICK (next), clock + cost);
> 
>       if ((INSN_DEP_COUNT (next) -= 1) == 0)
>         {
> [ ... ]
>           /* Adjust the priority of NEXT and either put it on the ready
>              list or queue it.  */
>           adjust_priority (next);
>           if (effective_cost <= 1)
>             ready[n_ready++] = next;
>           else
>             queue_insn (next, effective_cost);
>         }

So the ready list for a cycle starts out with insns with no
dependencies.  Then when we pick an insn off the ready list, it's placed
into the scheduled chain and schedule_insn is called.

OK, now I'm confused, because in my head it looks like the code should
already do what you want.  schedule_insn is called as soon as we
schedule an insn from the ready list.  So, if we have a dependent insn,
such as the lo_sum following a sethi, the dependency would now be
reduced to 0, and it becomes eligible for the lower part of the code. 
So as long as effective_cost is OK, it would be added to the ready
queue.  INSN_DEPEND is the list of insns that depend on the one being
scheduled, right?

Since this doesn't happen, I'm obviously missing something.

If the ALU has 0 delay, would insn_cost return 0?  Then INSN_TICK would
be unchanged, and effective_cost would end up 0, causing the dependent
insn to be placed into ready.
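
[Editorial sketch] To make the reasoning above concrete, here is a
minimal toy model in C of the dependency-resolution step quoted from
schedule_insn.  It is not the haifa-sched.c code: the names toy_insn,
toy_sched and toy_resolve_dep are hypothetical, and because the quoted
excerpt elides how effective_cost is computed, the sketch assumes it is
the number of cycles left until the insn's tick.

```c
#include <assert.h>

#define MAX_INSNS 8

/* Toy stand-in for an insn; field names are illustrative only.  */
struct toy_insn {
  int dep_count;  /* unresolved producers, in the role of INSN_DEP_COUNT */
  int tick;       /* earliest issue cycle, in the role of INSN_TICK */
};

struct toy_sched {
  int clock;              /* current cycle */
  int n_ready;
  int ready[MAX_INSNS];   /* indices of insns placed on the ready list */
  int n_queued;
  int queued[MAX_INSNS];  /* indices of insns deferred to a later cycle */
};

/* Called after a producer has been issued at s->clock.  COST is the
   latency from that producer to consumer NEXT.  Returns 1 if NEXT was
   placed on the ready list, 0 otherwise.  */
static int
toy_resolve_dep (struct toy_sched *s, struct toy_insn *insns,
                 int next, int cost)
{
  struct toy_insn *n = &insns[next];
  int effective_cost;

  assert (n->dep_count > 0);

  /* NEXT cannot issue before the producer's result is available.  */
  if (n->tick < s->clock + cost)
    n->tick = s->clock + cost;

  /* Other producers are still outstanding.  */
  if (--n->dep_count != 0)
    return 0;

  /* ASSUMPTION: cycles remaining until NEXT may issue.  */
  effective_cost = n->tick - s->clock;
  if (effective_cost <= 1)
    {
      s->ready[s->n_ready++] = next;
      return 1;
    }
  s->queued[s->n_queued++] = next;
  return 0;
}
```

In this toy, a producer-to-consumer cost of 0 and a cost of 1 both put
the consumer on the ready list.  The cost alone therefore does not
decide whether the pair can issue in the same cycle; that depends on
whether the issue loop looks at the ready list again before the clock
advances.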

I'm babbling again and confused.  Enough for today :-)

Jerry


-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research


* Re: Fwd: Questions on PA machine description?
  1999-03-19 17:32                         ` Richard Henderson
       [not found]                           ` < 19990319173226.C14722@cygnus.com >
@ 1999-03-31 23:46                           ` Richard Henderson
  1 sibling, 0 replies; 48+ messages in thread
From: Richard Henderson @ 1999-03-31 23:46 UTC (permalink / raw)
  To: law, Jerry Quinn; +Cc: egcs

On Thu, Mar 18, 1999 at 10:00:20PM -0700, Jeffrey A Law wrote:
> It has a structure like:
> 
> while (not all insns scheduled)
>   add insns with no outstanding dependencies to the ready queue
>   sort the ready queue
>   while (ready list is not empty && target can issue more insns)
>     issue an insn off the ready queue, remove dependencies on the issued insn
> 
> So, given insn1 which feeds insn2 we will never issue insn1 & insn2 in the
> same cycle.

I noticed this the other day in a different context. 

Does it seem worthwhile to add some sort of target define to control
adding dependent insns to the ready queue in the same cycle?  It's
true that it doesn't matter to the vast majority of the processors
we support, but I'm thinking of VLIW parts that do actually have
write-after-read conflicts within a cycle.
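
[Editorial sketch] The loop structure quoted above, and the kind of
switch being asked about, can be illustrated with a small toy in C.
Everything here is hypothetical: toy_schedule, ISSUE_WIDTH and the
same_cycle flag are made-up names, the "program" is a fixed two-insn
chain in which insn 1 depends on insn 0, and the sort step is omitted
because there is never more than one candidate to order.

```c
#include <assert.h>

#define N_INSNS 2
#define ISSUE_WIDTH 2

/* Schedule the two-insn chain and record the cycle in which each insn
   issues.  SAME_CYCLE is a hypothetical knob: when nonzero, an insn
   whose last dependency is resolved is appended to the current cycle's
   ready list instead of waiting for the next cycle.  */
static void
toy_schedule (int same_cycle, int issue_cycle[N_INSNS])
{
  int dep_count[N_INSNS] = { 0, 1 };  /* insn 1 waits on insn 0 */
  int done[N_INSNS] = { 0, 0 };
  int n_done = 0;
  int clock;

  for (clock = 0; n_done < N_INSNS; clock++)
    {
      int ready[N_INSNS];
      int n_ready = 0, issued = 0, head, i;

      /* Add insns with no outstanding dependencies to the ready list.  */
      for (i = 0; i < N_INSNS; i++)
        if (!done[i] && dep_count[i] == 0)
          ready[n_ready++] = i;

      /* Issue while the list is non-empty and the target has slots.  */
      for (head = 0; head < n_ready && issued < ISSUE_WIDTH; head++)
        {
          int insn = ready[head];

          done[insn] = 1;
          n_done++;
          issued++;
          issue_cycle[insn] = clock;

          /* Remove dependencies on the issued insn (only 0 -> 1 exists).  */
          if (insn == 0 && --dep_count[1] == 0 && same_cycle)
            {
              assert (n_ready < N_INSNS);
              ready[n_ready++] = 1;
            }
        }
    }
}
```

With same_cycle set to 0 the dependent insn is only picked up when the
ready list is rebuilt, so the pair issues in cycles 0 and 1.  With
same_cycle set to 1 both issue in cycle 0, which is only safe on a
target where the consumer really can see the producer's result within
the cycle.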


r~


* Re: Fwd: Questions on PA machine description?
  1999-03-25 15:10                                                   ` Richard Henderson
                                                                       ` (2 preceding siblings ...)
  1999-03-31 23:46                                                     ` Richard Henderson
@ 1999-04-02 11:53                                                     ` Jeffrey A Law
  1999-04-05 15:50                                                       ` Jerry Quinn
  1999-04-30 23:15                                                       ` Jeffrey A Law
  3 siblings, 2 replies; 48+ messages in thread
From: Jeffrey A Law @ 1999-04-02 11:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Jerry Quinn, egcs

  In message <19990325151052.A28577@cygnus.com>you write:
  > On Thu, Mar 25, 1999 at 01:56:39PM -0800, Richard Henderson wrote:
  > > The following change seems to do the right thing for Alpha EV5
  > > wrt compare+cmove.
  > 
  > The previous patch wouldn't bootstrap the compiler.  I goofed on
  > stepping through the ready list insns.
I played around a little with this (the patch to get the scheduler to
place dependent insns into the ready list as soon as their dependencies
were resolved) and couldn't ever get the trivial example to fire the
PA equivalent of sethi/lo_sum in the same cycle -- even after hacking up
my pa.md to have an issue/ready delay of zero for ALU insns.

Maybe I'm missing something...  

jeff


* Re: Fwd: Questions on PA machine description?
  1999-04-02 11:53                                                     ` Jeffrey A Law
@ 1999-04-05 15:50                                                       ` Jerry Quinn
  1999-04-30 23:15                                                         ` Jerry Quinn
  1999-04-30 23:15                                                       ` Jeffrey A Law
  1 sibling, 1 reply; 48+ messages in thread
From: Jerry Quinn @ 1999-04-05 15:50 UTC (permalink / raw)
  To: law; +Cc: Richard Henderson, egcs

Jeffrey A Law wrote:
> 
>   In message <19990325151052.A28577@cygnus.com>you write:
>   > On Thu, Mar 25, 1999 at 01:56:39PM -0800, Richard Henderson wrote:
>   > > The following change seems to do the right thing for Alpha EV5
>   > > wrt compare+cmove.
>   >
>   > The previous patch wouldn't bootstrap the compiler.  I goofed on
>   > stepping through the ready list insns.
> I played around a little with this (the patch to get the scheduler to
> place dependent insns into the ready list as soon as their dependencies
> were resolved) and couldn't ever get the trivial example to fire the
> PA equivalent of sethi/lo_sum in the same cycle -- even after hacking up
> my pa.md to have an issue/ready delay of zero for ALU insns.
> 
> Maybe I'm missing something...
> 
> jeff

I found the same thing when I tried zeroing issue/ready delay.  However,
if I had pa_adjust_cost return 0, it appears to do the right thing.

Jerry

-- 
Jerry Quinn                             Tel: (514) 761-8737
jquinn@nortelnetworks.com               Fax: (514) 761-8505
Speech Recognition Research


Thread overview: 48+ messages
1999-03-08  6:49 Fwd: Questions on PA machine description? Jerry Quinn
     [not found] ` < 36E3E35A.4BB420BE@americasm01.nt.com >
1999-03-08 20:48   ` Jeffrey A Law
1999-03-09 11:35     ` Jerry Quinn
     [not found]       ` < 36E577DF.F825BAFB@americasm01.nt.com >
1999-03-09 12:18         ` Jeffrey A Law
1999-03-16 10:49           ` Jerry Quinn
     [not found]             ` < 36EEA7B6.D3C894DE@americasm01.nt.com >
1999-03-17 20:12               ` Jeffrey A Law
1999-03-18 14:35                 ` Jerry Quinn
     [not found]                   ` < 36F17F8C.9FD937FA@americasm01.nt.com >
1999-03-18 21:00                     ` Jeffrey A Law
     [not found]                       ` < 18375.921819620@hurl.cygnus.com >
1999-03-19 17:32                         ` Richard Henderson
     [not found]                           ` < 19990319173226.C14722@cygnus.com >
1999-03-20  2:08                             ` Jeffrey A Law
     [not found]                               ` < 1843.921916006@upchuck >
1999-03-20 10:43                                 ` Richard Henderson
1999-03-23 14:13                                   ` Jerry Quinn
     [not found]                                     ` < 36F811ED.32912006@americasm01.nt.com >
1999-03-24  1:30                                       ` Jeffrey A Law
1999-03-24 14:44                                         ` Jerry Quinn
     [not found]                                           ` < 36F96A7D.F0DBDF47@americasm01.nt.com >
1999-03-25  1:01                                             ` Jeffrey A Law
     [not found]                                               ` < 4324.922351870@upchuck >
1999-03-25 13:56                                                 ` Richard Henderson
1999-03-25 15:10                                                   ` Richard Henderson
1999-03-26 10:50                                                     ` Jerry Quinn
1999-03-26 11:04                                                       ` Richard Henderson
1999-03-31 23:46                                                         ` Richard Henderson
1999-03-31 23:46                                                       ` Jerry Quinn
1999-03-26 14:07                                                     ` Jerry Quinn
1999-03-27 16:04                                                       ` Jeffrey A Law
1999-03-31 23:46                                                         ` Jeffrey A Law
1999-03-31 23:46                                                       ` Jerry Quinn
1999-03-31 23:46                                                     ` Richard Henderson
1999-04-02 11:53                                                     ` Jeffrey A Law
1999-04-05 15:50                                                       ` Jerry Quinn
1999-04-30 23:15                                                         ` Jerry Quinn
1999-04-30 23:15                                                       ` Jeffrey A Law
1999-03-31 23:46                                                   ` Richard Henderson
1999-03-31 23:46                                               ` Jeffrey A Law
1999-03-31 23:46                                           ` Jerry Quinn
1999-03-31 23:46                                         ` Jeffrey A Law
1999-03-31 23:46                                     ` Jerry Quinn
1999-03-31 23:46                                   ` Richard Henderson
1999-03-31 23:46                               ` Jeffrey A Law
1999-03-31 23:46                           ` Richard Henderson
1999-03-31 23:46                       ` Jeffrey A Law
1999-03-31 23:46                   ` Jerry Quinn
1999-03-31 23:46                 ` Jeffrey A Law
1999-03-17 20:14               ` Jeffrey A Law
1999-03-31 23:46                 ` Jeffrey A Law
1999-03-31 23:46             ` Jerry Quinn
1999-03-31 23:46           ` Jeffrey A Law
1999-03-31 23:46       ` Jerry Quinn
1999-03-31 23:46     ` Jeffrey A Law
1999-03-31 23:46 ` Jerry Quinn
