From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeffrey A Law
To: "Jerry Quinn"
Cc: egcs@egcs.cygnus.com
Subject: Re: Fwd: Questions on PA machine description?
Date: Wed, 17 Mar 1999 20:12:00 -0000
Message-id: <14489.921730350@hurl.cygnus.com>
In-reply-to: Your message of Tue, 16 Mar 1999 13:49:26 MST. <36EEA7B6.D3C894DE@americasm01.nt.com>
References: <36EEA7B6.D3C894DE@americasm01.nt.com>
X-SW-Source: 1999-03/msg00602.html

In message <36EEA7B6.D3C894DE@americasm01.nt.com> you write:
> (define_function_unit "pa8000memory" 2 0
>   (and (eq_attr "type" "load,fpload,store,fpstore")
>        (eq_attr "cpu" "8000")) 2 1)
I would suggest making the simultaneity 1 and the ready delay 1. The point is that we do not want to expose the load latency, since we're trying to describe how instructions are retired. That is, we can retire one instruction from each of the two load-store units every cycle.

I'd also suggest changing the name to "pa8000lsu", since we're not trying to describe the memory subsystem but rather how insns retire out of the load-store unit.

> (define_function_unit "pa8000fp_div" 2 1
>   (and (eq_attr "type" "fpdivsgl,fpsqrtsgl")
>        (eq_attr "cpu" "8000")) 17 17)
> (define_function_unit "pa8000fp_div" 2 1
>   (and (eq_attr "type" "fpdivdbl,fpsqrtdbl")
>        (eq_attr "cpu" "8000")) 31 31)
> (define_function_unit "pa8000alu" 2 1
>   (and
>     (eq_attr "type" "!load,fpload,store,fpstore")
>     (eq_attr "cpu" "8000")) 1 1)
These look reasonable. I'd create one additional unit, fmac, for all the other fp computation insns to show the partial latency, as recommended by HP. Something like this:

(define_function_unit "pa8000fmac" 2 0
  (and (eq_attr "type" "fpcc,fpalu,fpmulsgl,fpmuldbl")
       (eq_attr "cpu" "8000")) 2 1)

That is, there are two fmac units, which are fully pipelined. Results are available in 2 cycles.

I'm going to make those changes and install the patch.

> Theory being that the memory represents things leaving the load reorder
> buffer and alu represents the nonload reorder buffer.
> I added the div/sqrt constraint on the theory that they take long enough
> to have a big effect on retirement. All for nought - it is better by a few
> percent on some programs, worse by a few percent on others. No major
> differences that I could see.

That's basically what we want to do. You shouldn't expect much from instruction scheduling on a PA8000-class machine. All the folks I've spoken to about this indicate that it's minor relative to other stuff.

The other thing to think about is how to show that some instructions which are data dependent can (and should) issue in the same cycle. That is, if an alu operation feeds another alu operation, then we should issue them in the same cycle.

One thought would be to make the ready delay for alu instructions 0, then tweak haifa to add dependent instructions to the ready queue immediately after it issues an insn with a ready delay of zero cycles.

jeff
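The numeric fields being tuned in this exchange can be illustrated with a small Python toy model. It is a hypothetical, simplified reading of define_function_unit (multiplicity, simultaneity, ready delay, issue delay), not GCC's scheduler code. It assumes an insn holds one of an instance's simultaneity slots until its result is ready, and that an instance accepts a new insn at most once every issue-delay cycles. The names FunctionUnit and schedule_stream are made up for this sketch. The "pa8000lsu" entry applies the suggested simultaneity and ready delay and, as an assumption, keeps the multiplicity and issue delay of the quoted pa8000memory unit.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FunctionUnit:
    """Toy stand-in for the numeric fields of a define_function_unit.

    Simplified, hypothetical semantics (not GCC's implementation):
      multiplicity: number of identical unit instances
      simultaneity: max insns in flight per instance (0 = unlimited)
      ready_delay:  cycles from issue until the result is available
      issue_delay:  an instance accepts a new insn at most every
                    issue_delay cycles
    """
    name: str
    multiplicity: int
    simultaneity: int
    ready_delay: int
    issue_delay: int


def schedule_stream(unit: FunctionUnit, n: int) -> List[Tuple[int, int]]:
    """Greedily issue n independent insns; return (issue, ready) cycle pairs.

    Each instance issues at most one insn per cycle in this model.
    """
    next_issue = [0] * unit.multiplicity  # earliest next issue per instance
    in_flight: List[List[int]] = [[] for _ in range(unit.multiplicity)]
    result: List[Tuple[int, int]] = []
    cycle = 0
    while len(result) < n:
        for i in range(unit.multiplicity):
            if len(result) == n:
                break
            # Drop insns whose results became available by this cycle.
            in_flight[i] = [t for t in in_flight[i] if t > cycle]
            has_room = unit.simultaneity == 0 or len(in_flight[i]) < unit.simultaneity
            if has_room and cycle >= next_issue[i]:
                ready = cycle + unit.ready_delay
                in_flight[i].append(ready)
                next_issue[i] = cycle + unit.issue_delay
                result.append((cycle, ready))
        cycle += 1
    return result


# Quoted pa8000memory values, and the suggested pa8000lsu values
# (multiplicity 2 and issue delay 1 assumed unchanged).
memory = FunctionUnit("pa8000memory", 2, 0, 2, 1)
lsu = FunctionUnit("pa8000lsu", 2, 1, 1, 1)
fp_div_sgl = FunctionUnit("pa8000fp_div", 2, 1, 17, 17)

for unit, count in ((memory, 4), (lsu, 4), (fp_div_sgl, 3)):
    print(unit.name, schedule_stream(unit, count))
# pa8000memory [(0, 2), (0, 2), (1, 3), (1, 3)]
# pa8000lsu [(0, 1), (0, 1), (1, 2), (1, 2)]
# pa8000fp_div [(0, 17), (0, 17), (17, 34)]
```

In this model both load-store settings retire two insns per cycle; the suggested values only lower the latency exposed to dependent insns from 2 cycles to 1, in line with the goal of not exposing load latency. The single-precision divide unit accepts a third insn only at cycle 17, and the proposed pa8000fmac values (2 0 ... 2 1) behave like the quoted pa8000memory unit.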
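The zero-ready-delay idea from the last paragraph of the message can be sketched with a toy cycle-by-cycle list scheduler. This is not the haifa scheduler's code; the function list_schedule, its parameters, and the two-insn issue width are hypothetical. It only shows the effect of rebuilding the ready list right after issuing an insn whose ready delay is 0, and it assumes the dependence graph is acyclic.

```python
from typing import Dict, List


def list_schedule(
    order: List[str],
    deps: Dict[str, List[str]],
    latency: Dict[str, int],
    issue_width: int,
    same_cycle_zero_latency: bool,
) -> Dict[str, int]:
    """Toy list scheduler; returns the issue cycle of each insn.

    order: insn names in priority order (assumed to form an acyclic graph)
    deps: insn -> insns it depends on
    latency: insn -> ready delay in cycles
    If same_cycle_zero_latency is True, the ready list is rebuilt right after
    a zero-latency insn issues, so its dependents can issue in the same cycle.
    """
    issued: Dict[str, int] = {}

    def is_ready(name: str, cycle: int) -> bool:
        return name not in issued and all(
            p in issued and issued[p] + latency[p] <= cycle
            for p in deps.get(name, [])
        )

    cycle = 0
    while len(issued) < len(order):
        slots = issue_width
        # Without the tweak, the ready list is built once per cycle.
        ready = [n for n in order if is_ready(n, cycle)]
        while slots > 0 and ready:
            name = ready.pop(0)
            issued[name] = cycle
            slots -= 1
            if same_cycle_zero_latency and latency[name] == 0:
                # Hypothetical tweak: rescan so newly satisfied dependents join now.
                ready = [n for n in order if is_ready(n, cycle)]
        cycle += 1
    return issued


# A chain of three dependent ALU ops, each given a ready delay of 0.
chain = ["add1", "add2", "add3"]
deps = {"add2": ["add1"], "add3": ["add2"]}
lat = {name: 0 for name in chain}

print(list_schedule(chain, deps, lat, issue_width=2, same_cycle_zero_latency=False))
# {'add1': 0, 'add2': 1, 'add3': 2}
print(list_schedule(chain, deps, lat, issue_width=2, same_cycle_zero_latency=True))
# {'add1': 0, 'add2': 0, 'add3': 1}
```

A zero ready delay alone does not help here, because dependents only enter a ready list that is built at the start of each cycle. With the rescan, add1 and add2 share cycle 0 and only the assumed issue width pushes add3 to cycle 1.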