To: cgen@sources.redhat.com
Subject: exposed pipeline patch (long!)
From: Ben Elliston
Date: Thu, 09 Jan 2003 03:27:00 -0000

I'm posting this patch on behalf of Graydon Hoare, who wrote this
exposed pipeline support last year. It's a more generalised form of
the (delay ...) rtx and has been used for a couple of ports already.

Rather than just commit it, I thought I would post it for review.
Okay to commit?

Ben

2001-06-05  graydon hoare

	* utils.scm (foldl): Define.
	(foldr): Define.
	(union): Define.
	(intersection): Simplify.
	* sid.scm : Set APPLICATION to SID-SIMULATOR.
	(-op-gen-delayed-set-maybe-trace): Define.
( 'gen-set-{quiet,trace}): Delegate to op-gen-delayed-set-quiet etc. Note: this is still a little tangled up and needs cleaning. (-with-parallel?): Hardwire with-parallel to #t. ( 'cxmake-get): Replace with lookahead-aware code * sid-decode.scm: Remove per-insn writeback fns. (-gen-idesc-decls): Redefine sem_fn type. * sid-cpu.scm (gen-write-stack-structure): Replace parexec stuff with write stack stuff. (cgen-write.cxx): Replace per-insn writebacks with single write stack writeback. Add write stack reset function. (-gen-scache-semantic-fn insn): Replace parexec stuff with write stack stuff. * rtl-c.scm (xop): Clone operand into delayed operand if #:delayed estate attribute set. (delay): Set #:delayed attribute to calculated delay, update maximum delay of cpu, check (delay ...) usage. * operand.scm (): Add delayed slot to . * mach.scm (): Add max-delay slot to . * dev.scm (load-sid): Set APPLICATION to SID-SIMULATOR. * doc/rtl.texi (Expressions): Add section on (delay ...). Index: utils.scm =================================================================== RCS file: /cvs/src/src/cgen/utils.scm,v retrieving revision 1.7 diff -u -p -r1.7 utils.scm --- utils.scm 7 Jan 2002 08:23:59 -0000 1.7 +++ utils.scm 9 Jan 2003 03:22:12 -0000 @@ -78,6 +78,10 @@ (define (spaces n) (make-string n #\space)) +; simple list-generators +(define (seq p q) (if (> p q) '() (cons p (seq (+ p 1) q)))) +(define (fill x n) (if (> n 0) (cons x (fill x (- n 1))) '())) + ; Write N spaces to PORT, or the current output port if elided. (define (write-spaces n . port) @@ -471,6 +475,17 @@ (reverse! (list-drop n (reverse l))) ) +;; left fold +(define (foldl kons accum lis) + (if (null? lis) accum + (foldl kons (kons accum (car lis)) (cdr lis)))) + +;; right fold +(define (foldr kons knil lis) + (if (null? lis) knil + (kons (car lis) (foldr kons knil (cdr lis))))) + + ; APL's +\ operation on a vector of numbers. (define (plus-scan l) @@ -540,12 +555,13 @@ ; Return intersection of two lists. 
-(define (intersection l1 l2) - (cond ((null? l1) l1) - ((null? l2) l2) - ((memq (car l1) l2) (cons (car l1) (intersection (cdr l1) l2))) - (else (intersection (cdr l1) l2))) -) +(define (intersection a b) + (foldl (lambda (l e) (if (memq e a) (cons e l) l)) '() b)) + +; Return union of two lists. + +(define (union a b) + (foldl (lambda (l e) (if (memq e l) l (cons e l))) a b)) ; Return a count of the number of elements of list L1 that are in list L2. ; Uses memq. Index: sid.scm =================================================================== RCS file: /cvs/src/src/cgen/sid.scm,v retrieving revision 1.7 diff -u -p -r1.7 sid.scm --- sid.scm 7 Jan 2002 08:23:59 -0000 1.7 +++ sid.scm 9 Jan 2003 03:22:18 -0000 @@ -10,7 +10,7 @@ ; [It still does but that's to be fixed.] ; Specify which application. -(set! APPLICATION 'SIMULATOR) +(set! APPLICATION 'SID-SIMULATOR) ; Misc. state info. @@ -118,7 +118,7 @@ ; While processing operand reading (or writing), parallel execution support ; needs to be turned off, so it is up to the appropriate cgen-foo.c proc to ; set-with-parallel?! appropriately. -(define -with-parallel? #f) +(define -with-parallel? #t) (define (with-parallel?) -with-parallel?) (define (set-with-parallel?! flag) (set! -with-parallel? flag)) @@ -924,43 +924,6 @@ (rtl-c++ INT yes? nil #:rtl-cover-fns? #t))) ) -; For parallel write post-processing, we don't want to defer setting the pc. -; ??? Not sure anymore. -;(method-make! -; 'gen-set-quiet -; (lambda (self estate mode index selector newval) -; (-op-gen-set-quiet self estate mode index selector newval))) -;(method-make! -; 'gen-set-trace -; (lambda (self estate mode index selector newval) -; (-op-gen-set-trace self estate mode index selector newval))) - -; Name of C macro to access parallel execution operand support. - -(define -par-operand-macro "OPRND") - -; Return C code to fetch an operand's value and save it away for the -; semantic handler. 
This is used to handle parallel execution of several -; instructions where all inputs of all insns are read before any outputs are -; written. -; For operands, the word `read' is only used in this context. - -(define (op:read op sfmt) - (let ((estate (estate-make-for-normal-rtl-c++ nil nil))) - (send op 'gen-read estate sfmt -par-operand-macro)) -) - -; Return C code to write an operand's value. -; This is used to handle parallel execution of several instructions where all -; outputs are written to temporary spots first, and then a final -; post-processing pass is run to update cpu state. -; For operands, the word `write' is only used in this context. - -(define (op:write op sfmt) - (let ((estate (estate-make-for-normal-rtl-c++ nil nil))) - (send op 'gen-write estate sfmt -par-operand-macro)) -) - ; Default gen-read method. ; This is used to help support targets with parallel insns. ; Either this or gen-write (but not both) is used. @@ -1010,36 +973,46 @@ (method-make! 'cxmake-get (lambda (self estate mode index selector) - (let ((mode (if (mode:eq? 'DFLT mode) - (send self 'get-mode) - mode)) - (index (if index index (op:index self))) - (selector (if selector selector (op:selector self)))) - ; If the object is marked with the RAW attribute, access the hardware - ; object directly. + (let* ((mode (if (mode:eq? 'DFLT mode) + (send self 'get-mode) + mode)) + (hw (op:type self)) + (index (if index index (op:index self))) + (selector (if selector selector (op:selector self))) + (delayval (op:delay self)) + (md (mode:c-type mode)) + (name (if + (eq? (obj:name hw) 'h-memory) + (string-append md "_memory") + (gen-c-symbol (obj:name hw)))) + (getter (op:getter self)) + (def-val (cond ((obj-has-attr? self 'RAW) + (send hw 'cxmake-get-raw estate mode index selector)) + (getter + (let ((args (car getter)) + (expr (cadr getter))) + (rtl-c-expr mode expr + (if (= (length args) 0) nil + (list (list (car args) 'UINT index))) + #:rtl-cover-fns? 
#t + #:output-language (estate-output-language estate)))) + (else + (send hw 'cxmake-get estate mode index selector))))) + (logit 4 " cxmake-get self=" (obj:name self) " mode=" (obj:name mode) " index=" (obj:name index) " selector=" selector "\n") - (cond ((obj-has-attr? self 'RAW) - (send (op:type self) 'cxmake-get-raw estate mode index selector)) - ; If the instruction could be parallely executed with others and - ; we're doing read pre-processing, the operand has already been - ; fetched, we just have to grab the cached value. - ((with-parallel-read?) - (cx:make-with-atlist mode - (string-append -par-operand-macro - " (" (gen-sym self) ")") - nil)) ; FIXME: want CACHED attr if present - ((op:getter self) - (let ((args (car (op:getter self))) - (expr (cadr (op:getter self)))) - (rtl-c-expr mode expr - (if (= (length args) 0) - nil - (list (list (car args) 'UINT index))) - #:rtl-cover-fns? #t - #:output-language (estate-output-language estate)))) - (else - (send (op:type self) 'cxmake-get estate mode index selector))))) + + (if delayval + (if (derived-operand? self) + (error "delayed derived operands currently unsupported: " self) + (let ((idx (if index (string-append ", " (-gen-hw-index index estate)) ""))) + (cx:make mode (string-append "lookahead (" + (number->string delayval) + ", tick, " + "buf." name "_writes, " + (cx:c def-val) + idx ")")))) + def-val))) ) @@ -1049,16 +1022,9 @@ (send (op:type op) 'gen-set-quiet estate mode index selector newval) ) -(define (-op-gen-set-quiet-parallel op estate mode index selector newval) - (string-append - (if (op-save-index? 
op) - (string-append " " -par-operand-macro " (" (-op-index-name op) ")" - " = " (-gen-hw-index index estate) ";\n") - "") - " " - -par-operand-macro " (" (gen-sym op) ")" - " = " (cx:c newval) ";\n") -) +(define (-op-gen-delayed-set-quiet op estate mode index selector newval) + (-op-gen-delayed-set-maybe-trace op estate mode index selector newval #f)) + (define (-op-gen-set-trace op estate mode index selector newval) (string-append @@ -1079,12 +1045,7 @@ ;else (send (op:type op) 'gen-set-quiet estate mode index selector (cx:make-with-atlist mode "opval" (cx:atlist newval)))) - (if (and (with-profile?) - (op:cond? op)) - (string-append " written |= (1ULL << " - (number->string (op:num op)) - ");\n") - "") + ; TRACE_RESULT_ (cpu, abuf, hwnum, opnum, value); ; For each insn record array of operand numbers [or indices into ; operand instance table]. @@ -1122,21 +1083,41 @@ " }\n") ) -(define (-op-gen-set-trace-parallel op estate mode index selector newval) - (string-append - " {\n" - " " (mode:c-type mode) " opval = " (cx:c newval) ";\n" - (if (op-save-index? op) - (string-append " " -par-operand-macro " (" (-op-index-name op) ")" - " = " (-gen-hw-index index estate) ";\n") - "") - " " -par-operand-macro " (" (gen-sym op) ")" - " = opval;\n" - (if (op:cond? op) - (string-append " written |= (1ULL << " - (number->string (op:num op)) - ");\n") - "") +(define (-op-gen-delayed-set-trace op estate mode index selector newval) + (-op-gen-delayed-set-maybe-trace op estate mode index selector newval #t)) + +(define (-op-gen-delayed-set-maybe-trace op estate mode index selector newval do-trace?) + (let* ((pad " ") + (hw (op:type op)) + (delayval (op:delay op)) + (md (mode:c-type mode)) + (name (if + (eq? (obj:name hw) 'h-memory) + (string-append md "_memory") + (gen-c-symbol (obj:name hw)))) + (val (cx:c newval)) + (idx (if index (-gen-hw-index index estate) "")) + (idx-args (if (equal? 
idx "") "" (string-append ", " idx))) + ) + + (string-append + " {\n" + + (if delayval + + ;; delayed write: push it to the appropriate buffer + (string-append + pad md " opval = " val ";\n" + pad "buf." name "_writes [(tick + " (number->string delayval) + ") % @prefix@::pipe_sz].push (@prefix@::write<" md ">(pc, opval" idx-args "));\n") + + ;; else, uh, we should never have been called! + (error "-op-gen-delayed-set-maybe-trace called on non-delayed operand")) + + + (if do-trace? + + (string-append ; TRACE_RESULT_ (cpu, abuf, hwnum, opnum, value); ; For each insn record array of operand numbers [or indices into ; operand instance table]. @@ -1169,8 +1150,8 @@ "")) "opval << dec << \" \";\n" " }\n") -) - + ;; else no tracing is emitted + "")))) ; Return C code to set the value of an operand. ; NEWVAL is a object of the value to store. @@ -1189,8 +1170,8 @@ (selector (if selector selector (op:selector self)))) (cond ((obj-has-attr? self 'RAW) (send (op:type self) 'gen-set-quiet-raw estate mode index selector newval)) - ((with-parallel-write?) - (-op-gen-set-quiet-parallel self estate mode index selector newval)) + ((op:delay self) + (-op-gen-delayed-set-quiet self estate mode index selector newval)) (else (-op-gen-set-quiet self estate mode index selector newval))))) ) @@ -1212,26 +1193,12 @@ (selector (if selector selector (op:selector self)))) (cond ((obj-has-attr? self 'RAW) (send (op:type self) 'gen-set-quiet-raw estate mode index selector newval)) - ((with-parallel-write?) - (-op-gen-set-trace-parallel self estate mode index selector newval)) + ((op:delay self) + (-op-gen-delayed-set-trace self estate mode index selector newval)) (else (-op-gen-set-trace self estate mode index selector newval))))) ) -; Define and undefine C macros to tuck away details of instruction format used -; in the parallel execution functions. See gen-define-field-macro for a -; similar thing done for extraction/semantic functions. 
- -(define (gen-define-parallel-operand-macro sfmt) - (string-append "#define " -par-operand-macro "(f) " - "par_exec->operands." - (gen-sym sfmt) - ".f\n") -) - -(define (gen-undef-parallel-operand-macro sfmt) - (string-append "#undef " -par-operand-macro "\n") -) ; Operand profiling and parallel execution support. Index: sid-decode.scm =================================================================== RCS file: /cvs/src/src/cgen/sid-decode.scm,v retrieving revision 1.8 diff -u -p -r1.8 sid-decode.scm --- sid-decode.scm 7 Feb 2002 18:46:19 -0000 1.8 +++ sid-decode.scm 9 Jan 2003 03:22:18 -0000 @@ -47,10 +47,7 @@ bool @prefix@_idesc::idesc_table_initial (if pbb? "0, " (string-append (-gen-sem-fn-name insn) ", ")) - "") - (if (with-parallel?) - (string-append (-gen-write-fn-name sfmt) ", ") - "") + "") "\"" (string-upcase name) "\", " (gen-cpu-insn-enum (current-cpu) insn) ", " @@ -131,25 +128,6 @@ bool @prefix@_idesc::idesc_table_initial ) -;; and the same for writeback functions - -(define (-gen-write-fn-name sfmt) - (string-append "@prefix@_write_" (gen-sym sfmt)) -) - - -(define (-gen-write-fn-decls) - (string-write - "// Decls of each writeback fn.\n\n" - "using @cpu@::@prefix@_write_fn;\n" - (string-list-map (lambda (sfmt) - (string-list "extern @prefix@_write_fn " - (-gen-write-fn-name sfmt) - ";\n")) - (current-sfmt-list)) - "\n" - ) -) ; idesc, argbuf, and scache types @@ -164,14 +142,9 @@ struct @cpu@_cpu; struct @prefix@_scache; " (if (with-parallel?) - "struct @prefix@_parexec;\n" "") - (if (with-parallel?) - "typedef void (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec);" + "typedef void (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, int tick, @prefix@::write_stacks &buf);" "typedef sem_status (@prefix@_sem_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem);") "\n" - (if (with-parallel?) 
- "typedef sem_status (@prefix@_write_fn) (@cpu@_cpu* cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec);" - "") "\n" " // Instruction descriptor. @@ -192,12 +165,6 @@ struct @prefix@_idesc { @prefix@_sem_fn* execute;\n\n" "") - (if (with-parallel?) - "\ - // scache write executor for this insn - @prefix@_write_fn* writeback;\n\n" - "") - "\ const char* insn_name; enum @prefix@_insn_type sem_index; @@ -300,15 +267,6 @@ struct @prefix@_scache { // argument buffer @prefix@_sem_fields fields; -" (if (or (with-profile?) (with-parallel-write?)) - (string-append " - // writeback flags - // Only used if profiling or parallel execution support enabled during - // file generation. - unsigned long long written; -") - "") " - // decode given instruction void decode (@cpu@_cpu* current_cpu, PCADDR pc, @prefix@_insn_word base_insn, @prefix@_insn_word entire_insn); }; @@ -718,6 +676,11 @@ void #ifndef @PREFIX@_DECODE_H #define @PREFIX@_DECODE_H +namespace @prefix@ { +// forward declaration of struct in -defs.h +struct write_stacks; +} + namespace @cpu@ { using namespace cgen; @@ -739,10 +702,6 @@ typedef UINT @prefix@_insn_word; ; There's no pressing need for it though. (if (with-scache?) -gen-sem-fn-decls - "") - - (if (with-parallel?) - -gen-write-fn-decls "") "\ Index: sid-cpu.scm =================================================================== RCS file: /cvs/src/src/cgen/sid-cpu.scm,v retrieving revision 1.7 diff -u -p -r1.7 sid-cpu.scm --- sid-cpu.scm 7 Feb 2002 18:46:19 -0000 1.7 +++ sid-cpu.scm 9 Jan 2003 03:22:23 -0000 @@ -199,6 +199,34 @@ namespace @arch@ { (-gen-hardware-struct #f (find hw-need-storage? (current-hw-list)))) ) +(define (-gen-hw-stream-and-destream-fns) + (let* ((sa string-append) + (regs (find hw-need-storage? (current-hw-list))) + (reg-dim (lambda (r) + (let ((dims (-hw-vector-dims r))) + (if (equal? 0 (length dims)) + "0" + (number->string (car dims)))))) + (stream-reg (lambda (r) + (let ((rname (sa "hardware." 
(gen-c-symbol (obj:name r))))) + (if (hw-scalar? r) + (sa " ost << " rname " << ' ';\n") + (sa " for (int i = 0; i < " (reg-dim r) + "; i++)\n ost << " rname "[i] << ' ';\n"))))) + (destream-reg (lambda (r) + (let ((rname (sa "hardware." (gen-c-symbol (obj:name r))))) + (if (hw-scalar? r) + (sa " ist >> " rname ";\n") + (sa " for (int i = 0; i < " (reg-dim r) + "; i++)\n ist >> " rname "[i];\n")))))) + (sa + " void stream_cgen_hardware (std::ostream &ost) const \n {\n" + (string-map stream-reg regs) + " }\n" + " void destream_cgen_hardware (std::istream &ist) \n {\n" + (string-map destream-reg regs) + " }\n"))) + ; Generate -cpu.h (define (cgen-cpu.h) @@ -222,6 +250,8 @@ public: -gen-hardware-types + -gen-hw-stream-and-destream-fns + " // C++ register access function templates\n" "#define current_cpu this\n\n" (lambda () @@ -295,68 +325,161 @@ typedef struct { ) ) -; Utility of gen-parallel-exec-type to generate the definition of one -; structure in PAREXEC. -; SFMT is an object. -(define (gen-parallel-exec-elm sfmt) - (string-append - " struct { /* " (obj:comment sfmt) " */\n" - (let ((sem-ops - ((if (with-parallel-write?) sfmt-out-ops sfmt-in-ops) sfmt))) - (if (null? sem-ops) - " int empty;\n" - (string-map - (lambda (op) - (logit 2 "Processing operand " (obj:name op) " of format " - (obj:name sfmt) " ...\n") - (if (with-parallel-write?) - (let ((index-type (and (op-save-index? op) - (gen-index-type op sfmt)))) - (string-append " " (gen-type op) - " " (gen-sym op) ";\n" - (if index-type - (string-append " " index-type - " " (gen-sym op) "_idx;\n") - ""))) - (string-append " " - (gen-type op) - " " - (gen-sym op) - ";\n"))) - sem-ops))) - " } " (gen-sym sfmt) ";\n" - ) -) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; begin stack-based write schedule +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + +(define useful-mode-names '(BI QI HI SI DI UQI UHI USI UDI SF DF)) + +;(define (-calculated-memory-write-buffer-size) +; (let* ((is-mem? (lambda (op) (eq? 
(hw-sem-name (op:type op)) 'h-memory))) +; (count-mem-writes +; (lambda (sfmt) (length (find is-mem? (sfmt-out-ops sfmt)))))) +; (apply max (append '(0) (map count-mem-writes (current-sfmt-list)))))) + + +;; note: this doesn't really correctly approximate the worst case. user-supplied functions +;; might rewrite the pipeline extensively while it's running. +;(define (-worst-case-number-of-writes-to hw-name) +; (let* ((sfmts (current-sfmt-list)) +; (out-ops (map sfmt-out-ops sfmts)) +; (pred (lambda (op) (equal? hw-name (gen-c-symbol (obj:name (op:type op)))))) +; (filtered-ops (map (lambda (ops) (find pred ops)) out-ops))) +; (apply max (cons 0 (map (lambda (ops) (length ops)) filtered-ops))))) + +(define (-hw-gen-write-stack-decl nm mode) + (let* ( +; for the time being, we're disabling this size-estimation stuff and just +; requiring the user to supply a parameter WRITE_BUF_SZ before they include -defs.h +; (pipe-sz (+ 1 (max-delay (cpu-max-delay (current-cpu))))) +; (sz (* pipe-sz (-worst-case-number-of-writes-to nm)))) + + (mode-pad (spaces (- 4 (string-length mode)))) + (stack-name (string-append nm "_writes"))) + (string-append + " write_stack< write<" mode "> >" mode-pad "\t" stack-name "\t[pipe_sz];\n"))) + + +(define (-hw-gen-write-struct-decl) + (let* ((dims (-worst-case-index-dims)) + (sa string-append) + (ns number->string) + (idxs (seq 0 (- dims 1))) + (ctor (sa "write (PCADDR _pc, MODE _val" + (string-map (lambda (x) (sa ", USI _idx" (ns x) "=0")) idxs) + ") : pc(_pc), val(_val)" + (string-map (lambda (x) (sa ", idx" (ns x) "(_idx" (ns x) ")")) idxs) + " {} \n")) + (idx-fields (string-map (lambda (x) (sa " USI idx" (ns x) ";\n")) idxs))) + (sa + "\n\n" + " template \n" + " struct write\n" + " {\n" + " USI pc;\n" + " MODE val;\n" + idx-fields + " " ctor + " write() {}\n" + " };\n" ))) + +(define (-hw-vector-dims hw) (elm-get (hw-type hw) 'dimensions)) +(define (-worst-case-index-dims) + (apply max + (append '(1) ; for memory accesses + (map (lambda 
(hw) (length (-hw-vector-dims hw))) + (find (lambda (hw) (not (scalar? hw))) (current-hw-list)))))) + +(define (-gen-writestacks) + (let* ((hw (find register? (current-hw-list))) + (modes useful-mode-names) + (hw-pairs (map (lambda (h) (list (gen-c-symbol (obj:name h)) + (obj:name (hw-mode h)))) + hw)) + (mem-pairs (map (lambda (m) (list (string-append m "_memory") m)) + modes)) + (all-pairs (append mem-pairs hw-pairs)) + + (h1 "\n\n// write stacks used in parallel execution\n\n struct write_stacks\n {\n // types of stacks\n\n") + (wb (string-append + "\n\n // unified writeback function (defined in @prefix@-write.cc)" + "\n void writeback (int tick, @cpu@::@cpu@_cpu* current_cpu);" + "\n // unified write-stack clearing function (defined in @prefix@-write.cc)" + "\n void reset ();")) + (zz "\n\n }; // end struct @prefix@::write_stacks \n\n") + (st (string-append + " std::ostream &operator<< (std::ostream &ost, const @prefix@::write_stacks &s);\n" + " std::istream &operator>> (std::istream &ist, @prefix@::write_stacks &s);\n")) + ) + (string-append + (-hw-gen-write-struct-decl) + (foldl (lambda (s pair) (string-append s (apply -hw-gen-write-stack-decl pair))) h1 all-pairs) + wb + zz + st))) + + +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; +;;; end stack-based write schedule +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; + ; Generate the definition of the structure that holds register values, etc. -; for use during parallel execution. When instructions are executed parallelly -; either -; - their inputs are read before their outputs are written. Thus we have to -; fetch the input values of several instructions before executing any of them. -; - or their outputs are queued here first and then written out after all insns -; have executed. -; The fetched/queued values are stored in an array of PAREXEC structs, one -; element per instruction. +; for use during parallel execution. 
-(define (gen-parallel-exec-type) - (logit 2 "Generating PAREXEC type ...\n") - (string-append - (if (with-parallel-write?) - "/* Queued output values of an instruction. */\n" - "/* Fetched input values of an instruction. */\n") - "\ +(define (gen-write-stack-structure) + (let (;(membuf-sz (-calculated-memory-write-buffer-size)) + (max-delay (cpu-max-delay (current-cpu)))) + (logit 2 "Generating write stack structure ...\n") + (string-append + " static const int max_delay = " + (number->string max-delay) ";\n" + " static const int pipe_sz = " + (number->string (+ 1 max-delay)) "; // max_delay + 1\n" -struct @prefix@_parexec { - union {\n" - (string-map gen-parallel-exec-elm (current-sfmt-list)) - "\ - } operands; - /* For conditionally written operands, bitmask of which ones were. */ - unsigned long long written; -};\n\n" - ) -) +" +#ifndef WRITE_BUF_SZ +#define WRITE_BUF_SZ 1 +#endif + + template + struct write_stack + { + int t; + const int sz; + ELT buf[WRITE_BUF_SZ]; + + write_stack () : t(-1), sz(WRITE_BUF_SZ) {} + inline bool empty () { return (t == -1); } + inline void clear () { t = -1; } + inline void pop () { assert (t > -1); t--;} + inline void push (const ELT &e) { assert (t+1 < sz); buf [++t] = e;} + inline ELT &top () { return buf [t>0 ? ( t + inline VAL lookahead (int dist, int base, STKS &st, VAL def, int idx=0) + { + for (; dist > 0; --dist) + { + write_stack &v = st [(base + dist) % pipe_sz]; + for (int i = v.t; i > 0; --i) + if (v.buf [i].idx0 == idx) return v.buf [i]; + } + return def; + } + +" + + (-gen-writestacks) + ))) ; Generate the TRACE_RECORD struct definition. @@ -375,16 +498,26 @@ typedef struct @prefix@_trace_record { ; Generate -defs.h +(define semantics-processed? #f) + (define (cgen-defs.h) (logit 1 "Generating " (gen-cpu-name) " defs.h ...\n") (assert-keep-one) - + ; Turn parallel execution support on if cpu needs it. (set-with-parallel?! (state-parallel-exec?)) ; Initialize rtl->c generation. (rtl-c-config! #:rtl-cover-fns? 
#t) + (sim-analyze-insns!) + + ; ensure semantc analysis has happened, in time + ; for the pipeline size to be calculated + (if (and (with-parallel?) + (not semantics-processed?)) + (error "defs.h must be generated after sem.cxx for parallel-execution type CPUs")) + (string-write (gen-copyright "CPU family header for @cpu@ / @prefix@." copyright-red-hat package-red-hat-simulators) @@ -392,15 +525,26 @@ typedef struct @prefix@_trace_record { #ifndef DEFS_@PREFIX@_H #define DEFS_@PREFIX@_H +#include +#include \"cgen-types.h\" + +// forward declaration\n\n namespace @cpu@ { +struct @cpu@_cpu; +} + +namespace @prefix@ { + +using namespace cgen; + \n" (if (with-parallel?) - gen-parallel-exec-type - "") + gen-write-stack-structure + "// no parallel-execution support\n") "\ -} // end @cpu@ namespace +} // end @prefix@ namespace #endif /* DEFS_@PREFIX@_H */\n" ) @@ -417,47 +561,132 @@ namespace @cpu@ { ; Return C code to fetch and save all output operands to instructions with ; SFMT. -(define (-gen-write-args sfmt) - (string-map (lambda (op) (op:write op sfmt)) - (sfmt-out-ops sfmt)) -) +; Generate -write.cxx. -; Utility of gen-write-fns to generate a writer function for SFMT. -(define (-gen-write-fn sfmt) - (logit 2 "Processing write function for \"" (obj:name sfmt) "\" ...\n") - (string-list - "\nsem_status\n" - (-gen-write-fn-name sfmt) " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec)\n" - "{\n" - (if (with-scache?) - (gen-define-field-macro sfmt) - "") - (gen-define-parallel-operand-macro sfmt) - " @prefix@_scache* abuf = sem;\n" - " unsigned long long written = abuf->written;\n" - " PCADDR pc = abuf->addr;\n" - " PCADDR npc = 0; // dummy value for branches\n" - " sem_status status = SEM_STATUS_NORMAL; // ditto\n" - "\n" - (-gen-write-args sfmt) - "\n" - " return status;\n" - (gen-undef-parallel-operand-macro sfmt) - (if (with-scache?) 
- (gen-undef-field-macro sfmt) - "") - "}\n\n") -) +(define (-gen-register-writer nm mode dims) + (let* ((pad " ") + (sa string-append) + (idx-args (string-map (lambda (x) (sa "w.idx" (number->string x) ", ")) + (seq 0 (- dims 1))))) + (sa pad "while (! " nm "_writes[tick].empty())\n" + pad "{\n" + pad " write<" mode "> &w = " nm "_writes[tick].top();\n" + pad " current_cpu->" nm "_set(" idx-args "w.val);\n" + pad " " nm "_writes[tick].pop();\n" + pad "}\n\n"))) + +(define (-gen-memory-writer nm mode dims) + (let* ((pad " ") + (sa string-append) + (idx-args (string-map (lambda (x) (sa ", w.idx" (number->string x) "")) + (seq 0 (- dims 1))))) + (sa pad "while (! " nm "_writes[tick].empty())\n" + pad "{\n" + pad " write<" mode "> &w = " nm "_writes[tick].top();\n" + pad " current_cpu->SETMEM" mode " (w.pc" idx-args ", w.val);\n" + pad " " nm "_writes[tick].pop();\n" + pad "}\n\n"))) + + +(define (-gen-reset-fn) + (let* ((sa string-append) + (objs (append (map (lambda (h) (gen-c-symbol (obj:name h))) + (find register? (current-hw-list))) + (map (lambda (m) (sa m "_memory")) useful-mode-names))) + (clr (lambda (elt) (sa " clear_stacks (" elt "_writes);\n")))) + (sa + " template \n" + " static void clear_stacks (ST &st)\n" + " {\n" + " for (int i = 0; i < @prefix@::pipe_sz; i++)\n" + " st[i].clear();\n" + " }\n\n" + " void @prefix@::write_stacks::reset ()\n {\n" + (string-map clr objs) + " }"))) + +(define (-gen-unified-write-fn) + (let* ((hw (find register? 
(current-hw-list))) + (modes useful-mode-names) + (hw-triples (map (lambda (h) (list (gen-c-symbol (obj:name h)) + (obj:name (hw-mode h)) + (length (-hw-vector-dims h)))) + hw)) + (mem-triples (map (lambda (m) (list (string-append m "_memory") m 1)) + modes))) -(define (-gen-write-fns) - (logit 2 "Processing writer functions ...\n") - (string-write-map (lambda (sfmt) (-gen-write-fn sfmt)) - (current-sfmt-list)) -) + (logit 2 "Generating writer function ...\n") + (string-append + " + + void @prefix@::write_stacks::writeback (int tick, @cpu@::@cpu@_cpu* current_cpu) + { +" + "\n // register writeback loops\n" + (string-map (lambda (t) (apply -gen-register-writer t)) hw-triples) + "\n // memory writeback loops\n" + (string-map (lambda (t) (apply -gen-memory-writer t)) mem-triples) +" + } +"))) -; Generate -write.cxx. +(define (-gen-stacks-stream-and-destream-fns) + (let* ((sa string-append) + (regs (find hw-need-storage? (current-hw-list))) + (reg-dim (lambda (r) + (let ((dims (-hw-vector-dims r))) + (if (equal? 0 (length dims)) + "0" + (number->string (car dims)))))) + (write-stacks + (map (lambda (n) (sa n "_writes")) + (append (map (lambda (r) (gen-c-symbol (obj:name r))) regs) + (map (lambda (m) (sa m "_memory")) useful-mode-names)))) + (stream-stacks (lambda (s) (sa " stream_stacks ( s." s ", ost);\n"))) + (destream-stacks (lambda (s) (sa " destream_stacks ( s." 
s ", ist);\n"))) + (stack-boilerplate + (sa + " template \n" + " void stream_stacks (const ST &st, std::ostream &ost)\n" + " {\n" + " for (int i = 0; i < @prefix@::pipe_sz; i++)\n" + " {\n" + " ost << st[i].t << ' ';\n" + " for (int j = 0; j <= st[i].t; j++)\n" + " {\n" + " ost << st[i].buf[j].pc << ' ';\n" + " ost << st[i].buf[j].val << ' ';\n" + " ost << st[i].buf[j].idx0 << ' ';\n" + " }\n" + " }\n" + " }\n" + " \n" + " template \n" + " void destream_stacks (ST &st, std::istream &ist)\n" + " {\n" + " for (int i = 0; i < @prefix@::pipe_sz; i++)\n" + " {\n" + " ist >> st[i].t;\n" + " for (int j = 0; j <= st[i].t; j++)\n" + " {\n" + " ist >> st[i].buf[j].pc;\n" + " ist >> st[i].buf[j].val;\n" + " ist >> st[i].buf[j].idx0;\n" + " }\n" + " }\n" + " }\n" + " \n"))) + (sa stack-boilerplate + " std::ostream & @prefix@::operator<< (std::ostream &ost, const @prefix@::write_stacks &s)\n {\n" + (string-map stream-stacks write-stacks) + "\n return ost;\n" + " }\n" + " std::istream & @prefix@::operator>> (std::istream &ist, @prefix@::write_stacks &s)\n {\n" + (string-map destream-stacks write-stacks) + "\n return ist;\n" + " }\n"))) (define (cgen-write.cxx) (logit 1 "Generating " (gen-cpu-name) " write.cxx ...\n") @@ -465,8 +694,8 @@ namespace @cpu@ { (sim-analyze-insns!) - ; Turn parallel execution support off. - (set-with-parallel?! #f) + ; Turn parallel execution support on if needed. + (set-with-parallel?! (state-parallel-exec?)) ; Tell the rtx->c translator we are the simulator. (rtl-c-config! #:rtl-cover-fns? #t) @@ -478,12 +707,18 @@ namespace @cpu@ { "\ #include \"@cpu@.h\" -using namespace @cpu@; - +#include " - -gen-write-fns + (if (with-parallel?) + (string-append + (-gen-reset-fn) + (-gen-unified-write-fn) + (-gen-stacks-stream-and-destream-fns)) + + "// no write-stack functions required\n") ) ) + ; ****************** ; cgen-semantics.cxx @@ -521,19 +756,14 @@ using namespace @cpu@; "sem_status\n") "@prefix@_sem_" (gen-sym insn) (if (with-parallel?) 
- " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, @prefix@_parexec* par_exec)\n" + (string-append " (@cpu@_cpu* current_cpu, @prefix@_scache* sem, const int tick, \n\t" + "@prefix@::write_stacks &buf)\n") " (@cpu@_cpu* current_cpu, @prefix@_scache* sem)\n") "{\n" (gen-define-field-macro (insn-sfmt insn)) - (if (with-parallel?) - (gen-define-parallel-operand-macro (insn-sfmt insn)) - "") " sem_status status = SEM_STATUS_NORMAL;\n" " @prefix@_scache* abuf = sem;\n" - ; Unconditionally written operands are not recorded here. - (if (or (with-profile?) (with-parallel-write?)) - " unsigned long long written = 0;\n" - "") + ; The address of this insn, needed by extraction and semantic code. ; Note that the address recorded in the cpu state struct is not used. ; For faster engines that copy will be out of date. @@ -542,23 +772,12 @@ using namespace @cpu@; "\n" (gen-semantic-code insn) "\n" - ; Only update what's been written if some are conditionally written. - ; Otherwise we know they're all written so there's no point in - ; keeping track. - (if (or (with-profile?) (with-parallel-write?)) - (if (-any-cond-written? (insn-sfmt insn)) - " abuf->written = written;\n" - "") - "") (if cti? " current_cpu->done_cti_insn (npc, status);\n" " current_cpu->done_insn (npc, status);\n") (if (with-parallel?) "" " return status;\n") - (if (with-parallel?) - (gen-undef-parallel-operand-macro (insn-sfmt insn)) - "") (gen-undef-field-macro (insn-sfmt insn)) "}\n\n" )) @@ -576,13 +795,14 @@ using namespace @cpu@; ; Each instruction is implemented in its own function. (define (cgen-semantics.cxx) - (logit 1 "Generating " (gen-cpu-name) " semantics.cxx ...\n") + (logit 1 "Generating " (gen-cpu-name) " semantics.cxx ") (assert-keep-one) (sim-analyze-insns!) ; Turn parallel execution support on if cpu needs it. (set-with-parallel?! (state-parallel-exec?)) + (logit 1 (if (state-parallel-exec?) " (parallel) ...\n" "...\n")) ; Tell the rtx->c translator we are the simulator. (rtl-c-config! 
#:rtl-cover-fns? #t) @@ -590,6 +810,8 @@ using namespace @cpu@; ; Indicate we're currently not generating a pbb engine. (set-current-pbb-engine?! #f) + (set! semantics-processed? #t) + (string-write (gen-copyright "Simulator instruction semantics for @prefix@." copyright-red-hat package-red-hat-simulators) @@ -598,6 +820,7 @@ using namespace @cpu@; #include \"@cpu@.h\" using namespace @cpu@; // FIXME: namespace organization still wip +using namespace @prefix@; // FIXME: namespace organization still wip #define GET_ATTR(name) GET_ATTR_##name () @@ -655,9 +878,6 @@ using namespace @cpu@; // FIXME: namespa (if (with-scache?) (gen-define-field-macro (insn-sfmt insn)) "") - (if parallel? - (gen-define-parallel-operand-macro (insn-sfmt insn)) - "") ; Unconditionally written operands are not recorded here. (if (or (with-profile?) (with-parallel-write?)) " unsigned long long written = 0;\n" @@ -694,9 +914,6 @@ using namespace @cpu@; // FIXME: namespa (string-append " pbb_br_npc = npc;\n" " pbb_br_status = br_status;\n") "") - (if parallel? - (gen-undef-parallel-operand-macro (insn-sfmt insn)) - "") (if (with-scache?) (gen-undef-field-macro (insn-sfmt insn)) "") @@ -950,9 +1167,6 @@ struct @prefix@_pbb_label { " vpc = vpc + 1;\n") "") (gen-define-field-macro (sfrag-sfmt frag)) - (if parallel? - (gen-define-parallel-operand-macro (sfrag-sfmt frag)) - "") ; Unconditionally written operands are not recorded here. (if (or (with-profile?) (with-parallel-write?)) " unsigned long long written = 0;\n" @@ -992,9 +1206,6 @@ struct @prefix@_pbb_label { (sfrag-trailer? frag)) (string-append " pbb_br_npc = npc;\n" " pbb_br_status = br_status;\n") - "") - (if parallel? 
- (gen-undef-parallel-operand-macro (sfrag-sfmt frag)) "") (gen-undef-field-macro (sfrag-sfmt frag)) " }\n" Index: rtl-c.scm =================================================================== RCS file: /cvs/src/src/cgen/rtl-c.scm,v retrieving revision 1.4 diff -u -p -r1.4 rtl-c.scm --- rtl-c.scm 8 Sep 2000 22:18:37 -0000 1.4 +++ rtl-c.scm 9 Jan 2003 03:22:25 -0000 @@ -1304,7 +1304,23 @@ "bad arg to `operand'" object-or-name))) ) -(define-fn xop (estate options mode object) object) +(define-fn xop (estate options mode object) + (let ((delayed (assoc '#:delay (estate-modifiers estate)))) + (if (and delayed + (equal? APPLICATION 'SID-SIMULATOR) + (operand? object)) + ;; if we're looking at an operand inside a (delay ...) rtx, then we + ;; are talking about a _delayed_ operand, which is a different + ;; beast. rather than try to work out what context we were + ;; constructed within, we just clone the operand instance and set + ;; the new one to have a delayed value. the setters and getters + ;; will work it out. + (let ((obj (object-copy object)) + (amount (cadr delayed))) + (op:set-delay! obj amount) + obj) + ;; else return the normal object + object))) (define-fn local (estate options mode object-or-name) (cond ((rtx-temp? object-or-name) @@ -1363,9 +1379,38 @@ (cx:make VOID "; /*clobber*/\n") ) -(define-fn delay (estate options mode n rtx) - (s-sequence (estate-with-modifiers estate '((#:delay))) VOID '() rtx) ; wip! -) + +(define-fn delay (estate options mode num-node rtx) + (case APPLICATION + ((SID-SIMULATOR) + (let* ((n (cadddr num-node)) + (old-delay (let ((old (assoc '#:delay (estate-modifiers estate)))) + (if old (cadr old) 0))) + (new-delay (+ n old-delay))) + (begin + ;; check for proper usage + (if (let* ((hw (case (car rtx) + ((operand) (op:type (rtx-operand-obj rtx))) + ((xop) (op:type (rtx-xop-obj rtx))) + (else #f)))) + (not (and hw (or (pc? hw) (memory? hw) (register? hw))))) + (context-error + (estate-context estate) + (string-append + "(delay ...) 
rtx applied to wrong type of operand '" (car rtx) "'. should be pc, register or memory"))) + ;; signal an error if we're delayed and not in a "parallel-insns" CPU + (if (not (with-parallel?)) + (context-error + (estate-context estate) + "delayed operand in a non-parallel cpu")) + ;; update cpu-global pipeline bound + (cpu-set-max-delay! (current-cpu) (max (cpu-max-delay (current-cpu)) new-delay)) + ;; pass along new delay to embedded rtx + (rtx-eval-with-estate rtx mode (estate-with-modifiers estate `((#:delay ,new-delay))))))) + + ;; not in sid-land + (else (s-sequence (estate-with-modifiers estate '((#:delay))) VOID '() rtx)))) + ; Gets expanded as a macro. ;(define-fn annul (estate yes?) Index: operand.scm =================================================================== RCS file: /cvs/src/src/cgen/operand.scm,v retrieving revision 1.5 diff -u -p -r1.5 operand.scm --- operand.scm 20 Dec 2002 06:39:04 -0000 1.5 +++ operand.scm 9 Jan 2003 03:22:29 -0000 @@ -90,6 +90,9 @@ ; referenced. #f means the operand is always referenced by ; the instruction. (cond? . #f) + + ; whether (and by how much) this instance of the operand is delayed. + (delayed . #f) ) nil) ) @@ -135,6 +138,8 @@ (define op:set-num! (elm-make-setter 'num)) (define op:cond? (elm-make-getter 'cond?)) (define op:set-cond?! (elm-make-setter 'cond?)) +(define op:delay (elm-make-getter 'delayed)) +(define op:set-delay! (elm-make-setter 'delayed)) ; Compute the hardware type lazily. ; FIXME: op:type should be named op:hwtype or some such. Index: mach.scm =================================================================== RCS file: /cvs/src/src/cgen/mach.scm,v retrieving revision 1.2 diff -u -p -r1.2 mach.scm --- mach.scm 12 Jul 2001 02:32:25 -0000 1.2 +++ mach.scm 9 Jan 2003 03:22:31 -0000 @@ -755,8 +755,7 @@ (apply min (cons 65535 (map insn-length (find (lambda (insn) (and (not (has-attr? insn 'ALIAS)) - (eq? (obj-attr-value insn 'ISA) - (obj:name isa)))) + (isa-supports? 
isa insn))) (non-multi-insns (current-insn-list)))))) ) @@ -765,9 +764,8 @@ ; [a language with infinite precision can't have max-reduce-iota-0 :-)] (apply max (cons 0 (map insn-length (find (lambda (insn) - (and (not (has-attr? insn 'ALIAS)) - (eq? (obj-attr-value insn 'ISA) - (obj:name isa)))) + (and (not (has-attr? insn 'ALIAS)) + (isa-supports? isa insn))) (non-multi-insns (current-insn-list)))))) ) @@ -1008,13 +1006,19 @@ ; Allow a cpu family to override the isa parallel-insns spec. ; ??? Concession to the m32r port which can go away, in time. parallel-insns + + ; Computed: maximum number of insns which may pass before + ; an insn writes back its output operands. + max-delay + ) nil) ) ; Accessors. -(define-getters cpu (word-bitsize insn-chunk-bitsize file-transform parallel-insns)) +(define-getters cpu (word-bitsize insn-chunk-bitsize file-transform parallel-insns max-delay)) +(define-setters cpu (max-delay)) ; Return endianness of instructions. @@ -1064,7 +1068,9 @@ word-bitsize insn-chunk-bitsize file-transform - parallel-insns) + parallel-insns + 0 ; default max-delay. will compute correct value + ) (begin (logit 2 "Ignoring " name ".\n") #f))) ; cpu is not to be kept @@ -1284,13 +1290,13 @@ ; Assert only one cpu family has been selected. (assert-keep-one) - (let ((par-insns (map isa-parallel-insns (current-isa-list))) + (let ((false->zero (lambda (x) (if x x 0))) + (par-insns (map isa-parallel-insns (current-isa-list))) (cpu-par-insns (cpu-parallel-insns (current-cpu)))) ; ??? The m32r does have parallel execution, but to keep support for the ; base mach simpler, a cpu family is allowed to override the isa spec. - (or cpu-par-insns - ; FIXME: ensure all have same value. - (car par-insns))) + (max (false->zero cpu-par-insns) + (apply max (map false->zero par-insns)))) ) ; Return boolean indicating if parallel execution support is required. 
Index: dev.scm =================================================================== RCS file: /cvs/src/src/cgen/dev.scm,v retrieving revision 1.5 diff -u -p -r1.5 dev.scm --- dev.scm 21 Dec 2002 22:22:33 -0000 1.5 +++ dev.scm 9 Jan 2003 03:22:31 -0000 @@ -115,7 +115,7 @@ (load "sid-model") (load "sid-decode") (set! verbose-level 3) - (set! APPLICATION 'SIMULATOR) + (set! APPLICATION 'SID-SIMULATOR) ) (define (load-sim) Index: doc/rtl.texi =================================================================== RCS file: /cvs/src/src/cgen/doc/rtl.texi,v retrieving revision 1.17 diff -u -p -r1.17 rtl.texi --- doc/rtl.texi 22 Dec 2002 04:49:26 -0000 1.17 +++ doc/rtl.texi 9 Jan 2003 03:22:34 -0000 @@ -1833,7 +1833,7 @@ This is a character string consisting of Fields are denoted by @code{$operand} or @code{$@{operand@}}@footnote{Support for @code{$@{operand@}} is work-in-progress.}. If a @samp{$} is required in the syntax, it is -specified with @samp{\$}. At most one white-space character may be +specified with @samp{$$}. At most one white-space character may be present and it must be a blank separating the instruction mnemonic from the operands. This doesn't restrict the user's assembler, this is @c Is this reasonable? @@ -2257,10 +2257,39 @@ first argument. Indicate that @samp{object} is written in mode @samp{mode}, without saying how. This could be useful in conjunction with the C escape hooks. -@item (delay mode num expr) -Indicate that there are @samp{num} delay slots in the processing of -@samp{expr}. When using this rtx in instruction semantics, CGEN will -infer that the instruction has the DELAY-SLOT attribute. +@item (delay num expr) +In older "sim" simulators, indicates that there are @samp{num} delay +slots in the processing of @samp{expr}. When using this rtx in instruction +semantics, CGEN will infer that the instruction has the DELAY-SLOT +attribute. 
+ +In newer "sid" simulators, evaluates to the writeback queue for hardware +operand @samp{expr}, at @samp{num} instruction cycles in the +future. @samp{expr} @emph{must} be a hardware operand in this case. + +For example, @code{(set (delay 3 pc) (+ pc 1))} will schedule a write to +the @samp{pc} register in the writeback phase of the 3rd instruction +after the current one. Alternatively, @code{(set gr1 (delay 3 gr2))} will +immediately update the @samp{gr1} register with the @emph{latest write} +to the @samp{gr2} register scheduled between the present and 3 +instructions in the future. @code{(delay 0 ...)} refers to the +writeback phase of the current instruction. + +This effect is modeled with a circular buffer of "write stacks" for each +hardware element (register banks get a single stack). The size of the +circular buffer is calculated from the uses of @code{(delay ...)} +rtxs. When a delayed write occurs, the simulator pushes the write onto +the appropriate write stack in the "future" of the circular buffer for +the written-to hardware element. At the end of each instruction cycle, +the simulator executes all writes in all write stacks for the time slice +just ending. When a delayed read (essentially a pipeline bypass) occurs, +the simulator looks ahead in the circular buffer for any writes +scheduled in the future write stack. If it doesn't find one, it +progressively backs off towards the "current" instruction cycle's write +stack, and if it still finds no scheduled writes then it returns the +current state of the CPU. Thus while delayed writes are fast, delayed +reads are potentially slower in a simulator with long pipelines and very +large register banks. @item (annul yes?) @c FIXME: put annul into the glossary.
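
For anyone reviewing the write-stack scheme described in the rtl.texi hunk
above, here is a minimal C++ sketch of the idea. It is not the code that
sid-cpu.scm generates. The names DelayedRegs, PendingWrite, kNumRegs and
kPipeSz are made up for illustration, it models a single register bank only,
and it assumes the circular buffer has one slot per possible delay value.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sizes, chosen only for this illustration.
constexpr std::size_t kNumRegs = 4;  // registers in the one bank modelled
constexpr std::size_t kPipeSz = 4;   // assumed: max delay + 1 time slices

// One scheduled write: which register, and the value it will receive.
struct PendingWrite {
  std::size_t idx;
  std::uint32_t val;
};

struct DelayedRegs {
  std::array<std::uint32_t, kNumRegs> regs{};            // current CPU state
  std::array<std::vector<PendingWrite>, kPipeSz> slots;  // circular buffer of write stacks
  std::size_t tick = 0;                                  // current instruction cycle

  // Models (set (delay n reg) val): push the write onto the stack that
  // will be drained n instruction cycles from now.
  void delayed_write(std::size_t delay, std::size_t idx, std::uint32_t val) {
    assert(delay < kPipeSz && idx < kNumRegs);
    slots[(tick + delay) % kPipeSz].push_back({idx, val});
  }

  // Models (delay n reg) used as a source: start at the stack n cycles
  // ahead, back off towards the current cycle, and fall back to the
  // register file if no write to this register is scheduled.
  std::uint32_t delayed_read(std::size_t delay, std::size_t idx) const {
    assert(delay < kPipeSz && idx < kNumRegs);
    for (std::size_t d = delay + 1; d-- > 0;) {
      const auto &stack = slots[(tick + d) % kPipeSz];
      // Scan newest-first so the latest write in a slice wins.
      for (auto it = stack.rbegin(); it != stack.rend(); ++it)
        if (it->idx == idx) return it->val;
    }
    return regs[idx];
  }

  // End of an instruction cycle: perform every write scheduled for the
  // slice just ending, empty that stack, and advance the clock.
  void end_of_insn() {
    auto &stack = slots[tick % kPipeSz];
    for (const auto &w : stack) regs[w.idx] = w.val;
    stack.clear();
    ++tick;
  }
};
```

In this sketch a delayed write is a single push, while a delayed read may
scan several stacks before falling back to the register file, which matches
the cost remark at the end of the doc text.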