[PATCH]: GCC Scheduler support for R10000 on MIPS

public inbox for gcc-patches@gcc.gnu.org

* [PATCH]: GCC Scheduler support for R10000 on MIPS
@ 2008-08-01  1:53 Kumba
  2008-08-02  4:29 ` Kumba
  2008-08-02  9:48 ` Richard Sandiford
  0 siblings, 2 replies; 22+ messages in thread
From: Kumba @ 2008-08-01  1:53 UTC
  To: gcc-patches; +Cc: Richard Sandiford, mips


Hi all,

Included is a patch that adds support to GCC for scheduling on the R10000 MIPS 
processor family, including the R12000 & R14000 (which I've tested on for quite 
some time).  R16000 should similarly function quite well with this patch, as 
they're all just die shrinks of the original R10000.

It adds the 'r10000' through 'r16000' params (which all mean the same thing) to 
the -march and -mtune arguments.
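
To illustrate what the new names resolve to, here is a small stand-alone C
sketch.  It is not GCC code: the struct is a cut-down stand-in for the
mips_cpu_info entries the patch adds to mips.c (name, processor enum, ISA
level 4), and lookup_cpu is a hypothetical helper that exists only for this
example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-ins for GCC's processor_type and mips_cpu_info.
   Only the fields needed for this illustration are kept.  */
enum processor_sketch { SK_R10000, SK_R12000, SK_R14000, SK_R16000 };

struct cpu_info_sketch {
  const char *name;            /* value given to -march= or -mtune= */
  enum processor_sketch cpu;   /* processor selected by that name */
  int isa;                     /* ISA level; 4 means MIPS IV */
};

/* Mirrors the four table entries added by the patch.  */
static const struct cpu_info_sketch cpu_table[] = {
  { "r10000", SK_R10000, 4 },
  { "r12000", SK_R12000, 4 },
  { "r14000", SK_R14000, 4 },
  { "r16000", SK_R16000, 4 },
};

/* Hypothetical helper: return the table entry for NAME, or NULL if the
   name is not one of the four R1x000 names.  */
static const struct cpu_info_sketch *
lookup_cpu (const char *name)
{
  size_t i;

  assert (name != NULL);
  for (i = 0; i < sizeof cpu_table / sizeof cpu_table[0]; i++)
    if (strcmp (cpu_table[i].name, name) == 0)
      return &cpu_table[i];
  return NULL;
}
```

For instance, lookup_cpu ("r14000") returns the MIPS IV entry for the R14000,
while a name that is not in the table, such as "r18000", yields NULL.  All
four entries share one scheduler description and one set of cost values in
this patch; only the enum value differs.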

I've sent this up before, but I believe it just got lost in the shuffle.  Is 
there anything else that I can do to assist in getting this into GCC?


Thanks!

Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org



gcc/
	* config/mips/10000.md: Add R10000 scheduler
	* config/mips/mips.c: Add r1x000 params & costs
	* config/mips/mips.h: Add constants
	* config/mips/mips.md: Add r1x000 params & incl 10000.md


diff -Naurp gcc-4.3.1/gcc/config/mips/10000.md gcc-4.3.1.r10k/gcc/config/mips/10000.md
--- gcc-4.3.1/gcc/config/mips/10000.md	1969-12-31 19:00:00.000000000 -0500
+++ gcc-4.3.1.r10k/gcc/config/mips/10000.md	2008-07-28 01:07:41.000000000 -0400
@@ -0,0 +1,246 @@
+;; VR1x000 pipeline description.
+;;   Copyright (C) 2005, 2006 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 2, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING.  If not, write to the
+;; Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+;; MA 02110-1301, USA.
+
+
+;; This file overrides parts of generic.md.  It is derived from the
+;; old define_function_unit description.
+
+
+
+;; R12K/R14K/R16K are derivatives of R10K, thus copy its description
+;; until specific tuning for each is added
+
+
+;; R10000 has int queue, fp queue, address queue
+(define_automaton "r10k_int, r10k_fp, r10k_addr")
+
+;; R10000 has 2 integer ALUs, fp-adder and fp-multiplier, load/store
+(define_cpu_unit "r10k_alu1" "r10k_int")
+(define_cpu_unit "r10k_alu2" "r10k_int")
+(define_cpu_unit "r10k_fpadd" "r10k_fp")
+(define_cpu_unit "r10k_fpmpy" "r10k_fp")
+(define_cpu_unit "r10k_loadstore" "r10k_addr")
+
+;; R10000 has separate fp-div and fp-sqrt units as well and these can
+;; execute in parallel, however their issue & completion logic is shared
+;; by the fp-multiplier
+(define_cpu_unit "r10k_fpdiv" "r10k_fp")
+(define_cpu_unit "r10k_fpsqrt" "r10k_fp")
+
+
+
+
+;; loader
+(define_insn_reservation "r10k_load" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "load,prefetch,prefetchx"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_store" 0
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "store,fpstore,fpidxstore"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_fpload" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fpload,fpidxload"))
+  "r10k_loadstore")
+
+
+
+
+;; Integer add/sub + logic ops, and mf/mt hi/lo can be done by alu1 or alu2
+;; Miscellaneous arith goes here too (this is a guess)
+(define_insn_reservation "r10k_arith" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "arith,mfhilo,mthilo,slt,clz,const,nop,trap"))
+  "r10k_alu1 | r10k_alu2")
+
+
+
+
+;; ALU1 handles shifts, branch eval, and condmove
+;;
+;; Brancher is separate, but part of ALU1, but can only
+;; do one branch per cycle (needs implementing??)
+;;
+;; jump, call - unsure if brancher handles these too (added for now)
+(define_insn_reservation "r10k_shift" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "shift,branch,jump,call"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_int_cmove" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SI,DI")))
+  "r10k_alu1")
+
+
+
+
+;; Coprocessor Moves
+;; mtc1/dmtc1 are handled by ALU1
+;; mfc1/dmfc1 are handled by the fp-multiplier
+(define_insn_reservation "r10k_mt_xfer" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mtc"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_mf_xfer" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mfc"))
+  "r10k_fpmpy")
+
+
+
+
+;; Only ALU2 does int multiplications and divisions
+;; R10K allows an int insn using register Lo to be issued
+;; one cycle earlier than an insn using register Hi for
+;; the insns below, however, we skip on doing this
+;; for now until correct usage of lo_operand() is figured
+;; out.
+;;
+;; Divides keep ALU2 busy, but this isn't expressed here (I think...?)
+(define_insn_reservation "r10k_imul_single" 6
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3,imadd")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 6")
+
+(define_insn_reservation "r10k_imul_double" 10
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3,imadd")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 10")
+
+(define_insn_reservation "r10k_idiv_single" 35
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 35")
+
+(define_insn_reservation "r10k_idiv_double" 67
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 67")
+
+
+
+
+;; FP add/sub, mul, abs value, neg, comp, & moves
+(define_insn_reservation "r10k_fp_miscadd" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fadd,fabs,fneg,fcmp"))
+  "r10k_fpadd")
+
+(define_insn_reservation "r10k_fp_miscmul" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmul,fmove"))
+  "r10k_fpmpy")
+
+(define_insn_reservation "r10k_fp_cmove" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SF,DF")))
+  "r10k_fpmpy")
+
+
+
+
+;; fcvt.s.[wl] has latency 4, repeat 2
+;; All other fcvt have latency 2, repeat 1
+(define_insn_reservation "r10k_fcvt_single" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "I2S")))
+  "r10k_fpadd * 2")
+
+(define_insn_reservation "r10k_fcvt_other" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "!I2S")))
+  "r10k_fpadd")
+
+
+
+
+;; fmadd -  Runs through fp-adder first, then fp-multiplier
+;;
+;; The latency for fmadd is 2 cycles if the result is used
+;; by another fmadd instruction
+(define_insn_reservation "r10k_fmadd" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmadd"))
+  "r10k_fpadd, r10k_fpmpy")
+
+(define_bypass 2 "r10k_fmadd" "r10k_fmadd")
+
+
+
+
+;; fp Divisions & square roots
+(define_insn_reservation "r10k_fdiv_single" 12
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "SF")))
+  "r10k_fpdiv * 14")
+
+(define_insn_reservation "r10k_fdiv_double" 19
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "DF")))
+  "r10k_fpdiv * 21")
+
+(define_insn_reservation "r10k_fsqrt_single" 18
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_fsqrt_double" 33
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+(define_insn_reservation "r10k_frsqrt_single" 30
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_frsqrt_double" 52
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+
+
+
+;; Unknown/multi (this is a guess)
+(define_insn_reservation "r10k_unknown" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "unknown,multi"))
+  "r10k_alu1 + r10k_alu2")
+
diff -Naurp gcc-4.3.1/gcc/config/mips/mips.c gcc-4.3.1.r10k/gcc/config/mips/mips.c
--- gcc-4.3.1/gcc/config/mips/mips.c	2008-06-04 14:29:51.000000000 -0400
+++ gcc-4.3.1.r10k/gcc/config/mips/mips.c	2008-07-28 01:07:42.000000000 -0400
@@ -587,6 +587,10 @@ static const struct mips_cpu_info mips_c

    /* MIPS IV processors. */
    { "r8000", PROCESSOR_R8000, 4, 0 },
+  { "r10000", PROCESSOR_R10000, 4, 0 },
+  { "r12000", PROCESSOR_R12000, 4, 0 },
+  { "r14000", PROCESSOR_R14000, 4, 0 },
+  { "r16000", PROCESSOR_R16000, 4, 0 },
    { "vr5000", PROCESSOR_R5000, 4, 0 },
    { "vr5400", PROCESSOR_R5400, 4, 0 },
    { "vr5500", PROCESSOR_R5500, 4, PTF_AVOID_BRANCHLIKELY },
@@ -975,6 +979,58 @@ static const struct mips_rtx_cost_data m
  		     1,           /* branch_cost */
  		     4            /* memory_latency */
    },
+  { /* R10000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R12000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R14000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R16000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
    { /* SB1 */
      /* These costs are the same as the SB-1A below.  */
      COSTS_N_INSNS (4),            /* fp_add */
@@ -9540,7 +9596,13 @@ mips_issue_rate (void)
  	 but in reality only a maximum of 3 insns can be issued as
  	 floating-point loads and stores also require a slot in the
  	 AGEN pipe.  */
-     return 4;
+    case PROCESSOR_R10000:
+    case PROCESSOR_R12000:
+    case PROCESSOR_R14000:
+    case PROCESSOR_R16000:
+      /* All R10K Processors are quad-issue (being the first MIPS
+         processors to support this feature). */
+      return 4;

      case PROCESSOR_20KC:
      case PROCESSOR_R4130:
diff -Naurp gcc-4.3.1/gcc/config/mips/mips.h gcc-4.3.1.r10k/gcc/config/mips/mips.h
--- gcc-4.3.1/gcc/config/mips/mips.h	2008-01-26 05:22:14.000000000 -0500
+++ gcc-4.3.1.r10k/gcc/config/mips/mips.h	2008-07-28 01:07:42.000000000 -0400
@@ -64,6 +64,10 @@ enum processor_type {
    PROCESSOR_R7000,
    PROCESSOR_R8000,
    PROCESSOR_R9000,
+  PROCESSOR_R10000,
+  PROCESSOR_R12000,
+  PROCESSOR_R14000,
+  PROCESSOR_R16000,
    PROCESSOR_SB1,
    PROCESSOR_SB1A,
    PROCESSOR_SR71000,
@@ -234,6 +238,10 @@ enum mips_code_readable_setting {
  #define TARGET_MIPS5500             (mips_arch == PROCESSOR_R5500)
  #define TARGET_MIPS7000             (mips_arch == PROCESSOR_R7000)
  #define TARGET_MIPS9000             (mips_arch == PROCESSOR_R9000)
+#define TARGET_MIPS10000            (mips_arch == PROCESSOR_R10000)
+#define TARGET_MIPS12000            (mips_arch == PROCESSOR_R12000)
+#define TARGET_MIPS14000            (mips_arch == PROCESSOR_R14000)
+#define TARGET_MIPS16000            (mips_arch == PROCESSOR_R16000)
  #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
  				     || mips_arch == PROCESSOR_SB1A)
  #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
@@ -250,6 +258,10 @@ enum mips_code_readable_setting {
  #define TUNE_MIPS6000               (mips_tune == PROCESSOR_R6000)
  #define TUNE_MIPS7000               (mips_tune == PROCESSOR_R7000)
  #define TUNE_MIPS9000               (mips_tune == PROCESSOR_R9000)
+#define TUNE_MIPS10000              (mips_tune == PROCESSOR_R10000)
+#define TUNE_MIPS12000              (mips_tune == PROCESSOR_R12000)
+#define TUNE_MIPS14000              (mips_tune == PROCESSOR_R14000)
+#define TUNE_MIPS16000              (mips_tune == PROCESSOR_R16000)
  #define TUNE_SB1                    (mips_tune == PROCESSOR_SB1		\
  				     || mips_tune == PROCESSOR_SB1A)
  #define TUNE_24K		    (mips_tune == PROCESSOR_24KC	\
diff -Naurp gcc-4.3.1/gcc/config/mips/mips.md gcc-4.3.1.r10k/gcc/config/mips/mips.md
--- gcc-4.3.1/gcc/config/mips/mips.md	2008-06-04 14:29:51.000000000 -0400
+++ gcc-4.3.1.r10k/gcc/config/mips/mips.md	2008-07-28 01:07:43.000000000 -0400
@@ -411,7 +411,7 @@
  ;; Attribute describing the processor.  This attribute must match exactly
  ;; with the processor_type enumeration in mips.h.
  (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,r10000,r12000,r14000,r16000,sb1,sb1a,sr71000"
    (const (symbol_ref "mips_tune")))

  ;; The type of hardware hazard associated with this instruction.
@@ -718,6 +718,7 @@
  (include "6000.md")
  (include "7000.md")
  (include "9000.md")
+(include "10000.md")
  (include "sb1.md")
  (include "sr71k.md")
  (include "generic.md")
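
For readers checking the mips.c cost entries against the latencies in
10000.md, the following C sketch restates the R10000 row outside of GCC.  It
is an illustration under assumptions: COSTS_N_INSNS is written out as I
understand it to be defined in GCC's rtl.h (N * 4), while struct
rtx_cost_sketch and cost_to_insns are hypothetical names that only exist
here.

```c
#include <assert.h>

/* Assumed to match the definition in GCC's rtl.h.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Cut-down stand-in for mips_rtx_cost_data, same field order as the
   table in the patch.  */
struct rtx_cost_sketch {
  int fp_add, fp_mult_sf, fp_mult_df, fp_div_sf, fp_div_df;
  int int_mult_si, int_mult_di, int_div_si, int_div_di;
  int branch_cost, memory_latency;
};

/* The R10000 row from the patch; the R12000, R14000 and R16000 rows
   are identical.  */
static const struct rtx_cost_sketch r10000_costs_sketch = {
  COSTS_N_INSNS (2),   /* fp_add */
  COSTS_N_INSNS (2),   /* fp_mult_sf */
  COSTS_N_INSNS (2),   /* fp_mult_df */
  COSTS_N_INSNS (12),  /* fp_div_sf */
  COSTS_N_INSNS (19),  /* fp_div_df */
  COSTS_N_INSNS (6),   /* int_mult_si */
  COSTS_N_INSNS (10),  /* int_mult_di */
  COSTS_N_INSNS (35),  /* int_div_si */
  COSTS_N_INSNS (67),  /* int_div_di */
  1,                   /* branch_cost */
  4                    /* memory_latency */
};

/* Hypothetical helper: convert a scaled cost back to a count of
   single-cycle instructions.  */
static int
cost_to_insns (int cost)
{
  assert (cost >= 0);
  return cost / COSTS_N_INSNS (1);
}
```

Converted back this way, each cost equals the latency given to the matching
define_insn_reservation in 10000.md (for example 35 for a 32-bit integer
divide and 19 for a double-precision FP divide).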



* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-01  1:53 [PATCH]: GCC Scheduler support for R10000 on MIPS Kumba
@ 2008-08-02  4:29 ` Kumba
  2008-08-02  9:48 ` Richard Sandiford
  1 sibling, 0 replies; 22+ messages in thread
From: Kumba @ 2008-08-02  4:29 UTC
  To: gcc-patches; +Cc: rsandifo, mips

Kumba wrote:
> 
> Hi all,
> 
> Included is a patch that adds support to GCC for scheduling on the 
> R10000 MIPS processor family, including the R12000 & R14000 (which I've 
> tested on for quite some time).  R16000 should similarly function quite 
> well with this patch, as they're all just die shrinks of the original 
> R10000.
> 
> It adds the 'r10000' through 'r16000' params (which all mean the same 
> thing) to the -march and -mtune arguments.
> 
> I've sent this up before, but I believe it just got lost in the 
> shuffle.  Is there anything else that I can do to assist in getting this 
> into GCC?


My bad, I sent the wrong patch up.  That one was against gcc-4.3.1.  Here's the 
version against gcc-trunk.

gcc/
	* config/mips/10000.md: Add R10000 scheduler
	* config/mips/mips.c: Add r1x000 params & costs
	* config/mips/mips.h: Add constants
	* config/mips/mips.md: Add r1x000 params & incl 10000.md


diff -Naurp gcc.orig/gcc/config/mips/10000.md gcc/gcc/config/mips/10000.md
--- gcc.orig/gcc/config/mips/10000.md	1969-12-31 19:00:00.000000000 -0500
+++ gcc/gcc/config/mips/10000.md	2008-08-01 22:33:46.000000000 -0400
@@ -0,0 +1,246 @@
+;; VR1x000 pipeline description.
+;;   Copyright (C) 2005, 2006 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 2, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING.  If not, write to the
+;; Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
+;; MA 02110-1301, USA.
+
+
+;; This file overrides parts of generic.md.  It is derived from the
+;; old define_function_unit description.
+
+
+
+;; R12K/R14K/R16K are derivatives of R10K, thus copy its description
+;; until specific tuning for each is added
+
+
+;; R10000 has int queue, fp queue, address queue
+(define_automaton "r10k_int, r10k_fp, r10k_addr")
+
+;; R10000 has 2 integer ALUs, fp-adder and fp-multiplier, load/store
+(define_cpu_unit "r10k_alu1" "r10k_int")
+(define_cpu_unit "r10k_alu2" "r10k_int")
+(define_cpu_unit "r10k_fpadd" "r10k_fp")
+(define_cpu_unit "r10k_fpmpy" "r10k_fp")
+(define_cpu_unit "r10k_loadstore" "r10k_addr")
+
+;; R10000 has separate fp-div and fp-sqrt units as well and these can
+;; execute in parallel, however their issue & completion logic is shared
+;; by the fp-multiplier
+(define_cpu_unit "r10k_fpdiv" "r10k_fp")
+(define_cpu_unit "r10k_fpsqrt" "r10k_fp")
+
+
+
+
+;; loader
+(define_insn_reservation "r10k_load" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "load,prefetch,prefetchx"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_store" 0
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "store,fpstore,fpidxstore"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_fpload" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fpload,fpidxload"))
+  "r10k_loadstore")
+
+
+
+
+;; Integer add/sub + logic ops, and mf/mt hi/lo can be done by alu1 or alu2
+;; Miscellaneous arith goes here too (this is a guess)
+(define_insn_reservation "r10k_arith" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "arith,mfhilo,mthilo,slt,clz,const,nop,trap"))
+  "r10k_alu1 | r10k_alu2")
+
+
+
+
+;; ALU1 handles shifts, branch eval, and condmove
+;;
+;; Brancher is separate, but part of ALU1, but can only
+;; do one branch per cycle (needs implementing??)
+;;
+;; jump, call - unsure if brancher handles these too (added for now)
+(define_insn_reservation "r10k_shift" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "shift,branch,jump,call"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_int_cmove" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SI,DI")))
+  "r10k_alu1")
+
+
+
+
+;; Coprocessor Moves
+;; mtc1/dmtc1 are handled by ALU1
+;; mfc1/dmfc1 are handled by the fp-multiplier
+(define_insn_reservation "r10k_mt_xfer" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mtc"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_mf_xfer" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mfc"))
+  "r10k_fpmpy")
+
+
+
+
+;; Only ALU2 does int multiplications and divisions
+;; R10K allows an int insn using register Lo to be issued
+;; one cycle earlier than an insn using register Hi for
+;; the insns below, however, we skip on doing this
+;; for now until correct usage of lo_operand() is figured
+;; out.
+;;
+;; Divides keep ALU2 busy, but this isn't expressed here (I think...?)
+(define_insn_reservation "r10k_imul_single" 6
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3,imadd")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 6")
+
+(define_insn_reservation "r10k_imul_double" 10
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3,imadd")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 10")
+
+(define_insn_reservation "r10k_idiv_single" 35
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 35")
+
+(define_insn_reservation "r10k_idiv_double" 67
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 67")
+
+
+
+
+;; FP add/sub, mul, abs value, neg, comp, & moves
+(define_insn_reservation "r10k_fp_miscadd" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fadd,fabs,fneg,fcmp"))
+  "r10k_fpadd")
+
+(define_insn_reservation "r10k_fp_miscmul" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmul,fmove"))
+  "r10k_fpmpy")
+
+(define_insn_reservation "r10k_fp_cmove" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SF,DF")))
+  "r10k_fpmpy")
+
+
+
+
+;; fcvt.s.[wl] has latency 4, repeat 2
+;; All other fcvt have latency 2, repeat 1
+(define_insn_reservation "r10k_fcvt_single" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "I2S")))
+  "r10k_fpadd * 2")
+
+(define_insn_reservation "r10k_fcvt_other" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "!I2S")))
+  "r10k_fpadd")
+
+
+
+
+;; fmadd -  Runs through fp-adder first, then fp-multiplier
+;;
+;; The latency for fmadd is 2 cycles if the result is used
+;; by another fmadd instruction
+(define_insn_reservation "r10k_fmadd" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmadd"))
+  "r10k_fpadd, r10k_fpmpy")
+
+(define_bypass 2 "r10k_fmadd" "r10k_fmadd")
+
+
+
+
+;; fp Divisions & square roots
+(define_insn_reservation "r10k_fdiv_single" 12
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "SF")))
+  "r10k_fpdiv * 14")
+
+(define_insn_reservation "r10k_fdiv_double" 19
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "DF")))
+  "r10k_fpdiv * 21")
+
+(define_insn_reservation "r10k_fsqrt_single" 18
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_fsqrt_double" 33
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+(define_insn_reservation "r10k_frsqrt_single" 30
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_frsqrt_double" 52
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+
+
+
+;; Unknown/multi (this is a guess)
+(define_insn_reservation "r10k_unknown" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "unknown,multi"))
+  "r10k_alu1 + r10k_alu2")
+
diff -Naurp gcc.orig/gcc/config/mips/mips.c gcc/gcc/config/mips/mips.c
--- gcc.orig/gcc/config/mips/mips.c	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.c	2008-08-01 22:33:46.000000000 -0400
@@ -593,6 +593,10 @@ static const struct mips_cpu_info mips_c

    /* MIPS IV processors. */
    { "r8000", PROCESSOR_R8000, 4, 0 },
+  { "r10000", PROCESSOR_R10000, 4, 0 },
+  { "r12000", PROCESSOR_R12000, 4, 0 },
+  { "r14000", PROCESSOR_R14000, 4, 0 },
+  { "r16000", PROCESSOR_R16000, 4, 0 },
    { "vr5000", PROCESSOR_R5000, 4, 0 },
    { "vr5400", PROCESSOR_R5400, 4, 0 },
    { "vr5500", PROCESSOR_R5500, 4, PTF_AVOID_BRANCHLIKELY },
@@ -988,6 +992,58 @@ static const struct mips_rtx_cost_data m
  		     1,           /* branch_cost */
  		     4            /* memory_latency */
    },
+  { /* R10000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R12000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R14000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R16000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
    { /* SB1 */
      /* These costs are the same as the SB-1A below.  */
      COSTS_N_INSNS (4),            /* fp_add */
@@ -9872,7 +9928,13 @@ mips_issue_rate (void)
  	 but in reality only a maximum of 3 insns can be issued as
  	 floating-point loads and stores also require a slot in the
  	 AGEN pipe.  */
-     return 4;
+    case PROCESSOR_R10000:
+    case PROCESSOR_R12000:
+    case PROCESSOR_R14000:
+    case PROCESSOR_R16000:
+      /* All R10K Processors are quad-issue (being the first MIPS
+         processors to support this feature). */
+      return 4;

      case PROCESSOR_20KC:
      case PROCESSOR_R4130:
diff -Naurp gcc.orig/gcc/config/mips/mips.h gcc/gcc/config/mips/mips.h
--- gcc.orig/gcc/config/mips/mips.h	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.h	2008-08-01 22:33:46.000000000 -0400
@@ -66,6 +66,10 @@ enum processor_type {
    PROCESSOR_R7000,
    PROCESSOR_R8000,
    PROCESSOR_R9000,
+  PROCESSOR_R10000,
+  PROCESSOR_R12000,
+  PROCESSOR_R14000,
+  PROCESSOR_R16000,
    PROCESSOR_SB1,
    PROCESSOR_SB1A,
    PROCESSOR_SR71000,
@@ -241,6 +245,10 @@ enum mips_code_readable_setting {
  #define TARGET_MIPS5500             (mips_arch == PROCESSOR_R5500)
  #define TARGET_MIPS7000             (mips_arch == PROCESSOR_R7000)
  #define TARGET_MIPS9000             (mips_arch == PROCESSOR_R9000)
+#define TARGET_MIPS10000            (mips_arch == PROCESSOR_R10000)
+#define TARGET_MIPS12000            (mips_arch == PROCESSOR_R12000)
+#define TARGET_MIPS14000            (mips_arch == PROCESSOR_R14000)
+#define TARGET_MIPS16000            (mips_arch == PROCESSOR_R16000)
  #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
  				     || mips_arch == PROCESSOR_SB1A)
  #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
@@ -267,6 +275,10 @@ enum mips_code_readable_setting {
  #define TUNE_MIPS6000               (mips_tune == PROCESSOR_R6000)
  #define TUNE_MIPS7000               (mips_tune == PROCESSOR_R7000)
  #define TUNE_MIPS9000               (mips_tune == PROCESSOR_R9000)
+#define TUNE_MIPS10000              (mips_tune == PROCESSOR_R10000)
+#define TUNE_MIPS12000              (mips_tune == PROCESSOR_R12000)
+#define TUNE_MIPS14000              (mips_tune == PROCESSOR_R14000)
+#define TUNE_MIPS16000              (mips_tune == PROCESSOR_R16000)
  #define TUNE_SB1                    (mips_tune == PROCESSOR_SB1		\
  				     || mips_tune == PROCESSOR_SB1A)

diff -Naurp gcc.orig/gcc/config/mips/mips.md gcc/gcc/config/mips/mips.md
--- gcc.orig/gcc/config/mips/mips.md	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.md	2008-08-01 23:05:01.000000000 -0400
@@ -553,7 +553,7 @@
  ;; Attribute describing the processor.  This attribute must match exactly
  ;; with the processor_type enumeration in mips.h.
  (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,r10000,r12000,r14000,r16000,sb1,sb1a,sr71000,xlr"
    (const (symbol_ref "mips_tune")))

  ;; The type of hardware hazard associated with this instruction.
@@ -903,6 +903,7 @@
  (include "6000.md")
  (include "7000.md")
  (include "9000.md")
+(include "10000.md")
  (include "sb1.md")
  (include "sr71k.md")
  (include "xlr.md")
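
A note on reading the reservations in 10000.md: the number after the
reservation name is the latency seen by a dependent instruction, while the
"unit * N" string says how many cycles the unit stays busy.  The C sketch
below models just that distinction for two back-to-back instructions.  It is
a simplification made up for illustration (struct resv and earliest_issue
are hypothetical names, and issue width, the instruction queues and bypasses
are all ignored), not how GCC's DFA scheduler is implemented.

```c
#include <assert.h>

/* One reservation, reduced to the two numbers discussed above.  */
struct resv {
  int latency;      /* cycles until a dependent insn may issue */
  int busy_cycles;  /* cycles the functional unit is reserved */
};

/* Earliest cycle at which a second instruction could issue when the
   first one issued at cycle 0.  DEPENDS_ON_FIRST is nonzero if the
   second insn reads the first one's result; SAME_UNIT is nonzero if
   both need the same functional unit.  */
static int
earliest_issue (struct resv first, int depends_on_first, int same_unit)
{
  int cycle = 0;

  assert (first.latency >= 0 && first.busy_cycles >= 0);
  if (depends_on_first && first.latency > cycle)
    cycle = first.latency;
  if (same_unit && first.busy_cycles > cycle)
    cycle = first.busy_cycles;
  return cycle;
}

/* Values taken from "r10k_fdiv_single": latency 12, "r10k_fpdiv * 14".  */
static const struct resv r10k_fdiv_single_sketch = { 12, 14 };
```

With these values, an instruction on another unit that consumes the quotient
could issue at cycle 12, whereas a second, independent single-precision
divide would have to wait until cycle 14 for r10k_fpdiv to free up.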







-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org

"The past tempts us, the present confuses us, the future frightens us.  And our 
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-01  1:53 [PATCH]: GCC Scheduler support for R10000 on MIPS Kumba
  2008-08-02  4:29 ` Kumba
@ 2008-08-02  9:48 ` Richard Sandiford
  2008-08-03  3:37   ` Kumba
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Sandiford @ 2008-08-02  9:48 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips

Kumba <kumba@gentoo.org> writes:
> I've sent this up before, but I believe it just got lost in the shuffle.  Is 
> there anything else that I can do to assist in getting this into GCC?

Hmm.  Out of interest, when did you last submit it?  The last patch
I could see was:

    http://article.gmane.org/gmane.comp.gcc.patches/109344

which seemed like work in progress rather than a submission.  You said
you'd submit an updated patch:

    http://article.gmane.org/gmane.comp.gcc.patches/109444

but I don't remember seeing one, and I can't find any record of it
in the archives.

Sorry if you've been frustrated by the lack of action here.

Have you run the patch through the gcc testsuites?

> +;; VR1x000 pipeline description.
> +;;   Copyright (C) 2005, 2006 Free Software Foundation, Inc.

You had to update some of the types for 4.4, so I assume this should
include at least 2008.

> +;; This file is part of GCC.
> +
> +;; GCC is free software; you can redistribute it and/or modify it
> +;; under the terms of the GNU General Public License as published
> +;; by the Free Software Foundation; either version 2, or (at your
> +;; option) any later version.
> +
> +;; GCC is distributed in the hope that it will be useful, but WITHOUT
> +;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
> +;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
> +;; License for more details.
> +
> +;; You should have received a copy of the GNU General Public License
> +;; along with GCC; see the file COPYING.  If not, write to the
> +;; Free Software Foundation, 51 Franklin Street, Fifth Floor, Boston,
> +;; MA 02110-1301, USA.

Needs to be GPLv3.

> +;; This file overrides parts of generic.md.  It is derived from the
> +;; old define_function_unit description.

I don't think this is useful, given that there was no R10k
define_function_unit description in FSF sources.

Minor nit, but you've sometimes used many blank lines to separate things.
Seems a bit excessive: two should be enough.

Coding convention nit: all comments should start with a capital letter
and end with ".", even if they aren't real sentences.

> +;; R12K/R14K/R16K are derivatives of R10K, thus copy its description
> +;; until specific tuning for each is added
> +
> +
> +;; R10000 has int queue, fp queue, address queue
> +(define_automaton "r10k_int, r10k_fp, r10k_addr")
> +
> +;; R10000 has 2 integer ALUs, fp-adder and fp-multiplier, load/store
> +(define_cpu_unit "r10k_alu1" "r10k_int")
> +(define_cpu_unit "r10k_alu2" "r10k_int")
> +(define_cpu_unit "r10k_fpadd" "r10k_fp")
> +(define_cpu_unit "r10k_fpmpy" "r10k_fp")
> +(define_cpu_unit "r10k_loadstore" "r10k_addr")
> +
> +;; R10000 has separate fp-div and fp-sqrt units as well and these can
> +;; execute in parallel, however their issue & completion logic is shared
> +;; by the fp-multiplier
> +(define_cpu_unit "r10k_fpdiv" "r10k_fp")
> +(define_cpu_unit "r10k_fpsqrt" "r10k_fp")

Do you actually model the relationship with the multiplier?  If not,
it might be worth a comment saying so.

Given that both r10k_fpdiv and r10k_fpsqrt are long-latency insns,
and that they can be used independently, we might get smaller
automata if we split r10k_fp into three: r10k_fp, r10k_fpdiv and
r10k_fpsqrt.  Could you try this and see how it affects the total
number of states?
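
A rough way to see why the split might help: while both units share
one automaton, that automaton has to track every pairing of "cycles
left on the divider" and "cycles left on the square-root unit",
whereas separate automata track each unit on its own.  The sketch
below is only a back-of-the-envelope model and not what genautomata
actually computes.  It ignores the fp adder and multiplier that also
sit in r10k_fp, the function names are made up for illustration, and
the cycle counts one would feed it (21 and 35) are the longest
reservations in the revised 10000.md later in this thread.

```c
#include <assert.h>

/* Toy model: a unit that can be reserved for up to N cycles is either
   idle or has 1..N cycles left, so it contributes N + 1 states.  */
static unsigned long unit_states(unsigned long max_busy_cycles)
{
    /* A reservation has to last at least one cycle.  */
    assert(max_busy_cycles > 0);
    return max_busy_cycles + 1;
}

/* Two independent units in ONE automaton: every pairing is a state.  */
static unsigned long combined_states(unsigned long a, unsigned long b)
{
    return unit_states(a) * unit_states(b);
}

/* The same two units in SEPARATE automata: the counts simply add.  */
static unsigned long split_states(unsigned long a, unsigned long b)
{
    return unit_states(a) + unit_states(b);
}
```

Under this model, combined_states(21, 35) is 22 * 36 = 792 while
split_states(21, 35) is 22 + 36 = 58.  The real numbers will differ,
which is why building and checking the reported state counts is still
the thing to do.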

> +;; Integer add/sub + logic ops, and mf/mt hi/lo can be done by alu1 or alu2
> +;; Miscellaneous arith goes here too (this is a guess)
> +(define_insn_reservation "r10k_arith" 1
> +  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
> +       (eq_attr "type" "arith,mfhilo,mthilo,slt,clz,const,nop,trap"))
> +  "r10k_alu1 | r10k_alu2")

"logical" seems to be missing from the .md file.  Should it be in
this reservation?

> +;; Only ALU2 does int multiplications and divisions
> +;; R10K allows an int insn using register Lo to be issued
> +;; one cycle earlier than an insn using register Hi for
> +;; the insns below, however, we skip on doing this
> +;; for now until correct usage of lo_operand() is figured
> +;; out.

The way to implement that sort of thing is define_bypass.  E.g.
make the default latency work for LO, then add a define_bypass that is
conditional on the int insn using HI.  (I assume that way round would be
better because using LO is the common case.)  You'd have to use a custom
predicate, along the lines of mips_store_data_bypass_p.

No need to do that if you don't want, but I think the last sentence is a
little misleading as it stands.
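
To make the define_bypass idea concrete, here is a minimal,
self-contained C model of the lookup it implies: the producer has a
default latency, and a bypass replaces it only when a guard predicate
accepts the (producer, consumer) pair.  None of these names exist in
GCC; toy_insn, reads_hi_p and effective_latency are hypothetical.  A
real guard would inspect RTL, along the lines of
mips_store_data_bypass_p.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an insn: just the facts the guard needs.  */
struct toy_insn {
    bool writes_hilo;   /* Producer: a multiply or divide.  */
    bool reads_hi;      /* Consumer: reads HI (e.g. mfhi).  */
};

/* Guard predicate: true when the bypass should apply.  */
static bool reads_hi_p(const struct toy_insn *producer,
                       const struct toy_insn *consumer)
{
    return producer->writes_hilo && consumer->reads_hi;
}

/* The default latency covers the common LO case; the bypass latency
   overrides it for consumers that read HI.  */
static int effective_latency(int default_latency, int bypass_latency,
                             const struct toy_insn *producer,
                             const struct toy_insn *consumer)
{
    assert(producer != NULL && consumer != NULL);
    return reads_hi_p(producer, consumer) ? bypass_latency
                                          : default_latency;
}
```

For a 32-bit multiply one might call this with 5 and 6, assuming the
patch's 6-cycle figure describes the HI case and LO is one cycle
faster.  The thread does not confirm which way round that is.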

> @@ -587,6 +587,10 @@ static const struct mips_cpu_info mips_c
>
>     /* MIPS IV processors. */
>     { "r8000", PROCESSOR_R8000, 4, 0 },
> +  { "r10000", PROCESSOR_R10000, 4, 0 },
> +  { "r12000", PROCESSOR_R12000, 4, 0 },
> +  { "r14000", PROCESSOR_R14000, 4, 0 },
> +  { "r16000", PROCESSOR_R16000, 4, 0 },
>     { "vr5000", PROCESSOR_R5000, 4, 0 },
>     { "vr5400", PROCESSOR_R5400, 4, 0 },
>     { "vr5500", PROCESSOR_R5500, 4, PTF_AVOID_BRANCHLIKELY },

The formatting here looks odd, but maybe it's just mailer mangling.

I'd prefer to map r12000, r14000 and r16000 to PROCESSOR_R10000,
rather than have four PROCESSOR_* values that do the same thing.
I take your point about leaving processor-specific tuning as
future work, but I think the split should be introduced as part
of that work rather than here.
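
As a sketch of what that mapping amounts to: several user-visible CPU
names can share one internal processor value, so only the name table
grows.  The code below is a cut-down, self-contained illustration and
not the real mips.c table; the toy_* names and the TOY_UNKNOWN
fallback are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down version of the processor enumeration.  */
enum toy_processor { TOY_R8000, TOY_R10000, TOY_UNKNOWN };

struct toy_cpu_info {
    const char *name;
    enum toy_processor cpu;
};

/* Four user-visible names, one internal processor value.  */
static const struct toy_cpu_info toy_cpu_table[] = {
    { "r8000",  TOY_R8000  },
    { "r10000", TOY_R10000 },
    { "r12000", TOY_R10000 },
    { "r14000", TOY_R10000 },
    { "r16000", TOY_R10000 },
};

/* Map a -march=/-mtune= style name to its processor value.  */
static enum toy_processor toy_lookup_cpu(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0;
         i < sizeof toy_cpu_table / sizeof toy_cpu_table[0]; i++)
        if (strcmp(toy_cpu_table[i].name, name) == 0)
            return toy_cpu_table[i].cpu;
    return TOY_UNKNOWN;
}
```

With this layout, -march=r14000 style names keep working for users,
and a later split only means adding an enum value and changing one
table entry.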

> @@ -234,6 +238,10 @@ enum mips_code_readable_setting {
>   #define TARGET_MIPS5500             (mips_arch == PROCESSOR_R5500)
>   #define TARGET_MIPS7000             (mips_arch == PROCESSOR_R7000)
>   #define TARGET_MIPS9000             (mips_arch == PROCESSOR_R9000)
> +#define TARGET_MIPS10000            (mips_arch == PROCESSOR_R10000)
> +#define TARGET_MIPS12000            (mips_arch == PROCESSOR_R12000)
> +#define TARGET_MIPS14000            (mips_arch == PROCESSOR_R14000)
> +#define TARGET_MIPS16000            (mips_arch == PROCESSOR_R16000)
>   #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
>   				     || mips_arch == PROCESSOR_SB1A)
>   #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
> @@ -250,6 +258,10 @@ enum mips_code_readable_setting {
>   #define TUNE_MIPS6000               (mips_tune == PROCESSOR_R6000)
>   #define TUNE_MIPS7000               (mips_tune == PROCESSOR_R7000)
>   #define TUNE_MIPS9000               (mips_tune == PROCESSOR_R9000)
> +#define TUNE_MIPS10000              (mips_tune == PROCESSOR_R10000)
> +#define TUNE_MIPS12000              (mips_tune == PROCESSOR_R12000)
> +#define TUNE_MIPS14000              (mips_tune == PROCESSOR_R14000)
> +#define TUNE_MIPS16000              (mips_tune == PROCESSOR_R16000)
>   #define TUNE_SB1                    (mips_tune == PROCESSOR_SB1		\
>   				     || mips_tune == PROCESSOR_SB1A)
>   #define TUNE_24K		    (mips_tune == PROCESSOR_24KC	\

I generally try to avoid having TARGET_* and TUNE_* macros that
aren't used.  (Again, it's a case of "add it when you need it".)
There are some old macros that are unused, but still.

Looks good otherwise, thanks.

Richard


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-02  9:48 ` Richard Sandiford
@ 2008-08-03  3:37   ` Kumba
  2008-08-03  7:20     ` Ralf Wildenhues
  2008-08-03 10:40     ` Richard Sandiford
  0 siblings, 2 replies; 22+ messages in thread
From: Kumba @ 2008-08-03  3:37 UTC (permalink / raw)
  To: Kumba, gcc-patches, mips, rdsandiford

Richard Sandiford wrote:
> Kumba <kumba@gentoo.org> writes:
>> I've sent this up before, but I believe it just got lost in the shuffle.  Is 
>> there anything else that I can do to assist in getting this into GCC?
> 
> Hmm.  Out of interest, when did you last submit it?  The last patch
> I could see was:
> 
>     http://article.gmane.org/gmane.comp.gcc.patches/109344
> 
> which seemed like work in progress rather than a submission.  You said
> you'd submit an updated patch:

Yeah, I was going to do more work in trying to further optimize R10K stuff, 
based on how the R10K manual described it.  Further review pointed out that a 
lot of the further optimizations would offer minimal gain, though.  The patch 
as-is then was a big step in helping GCC schedule code for this processor way 
better than it was originally.

Other things got in the way, though, and it fell by the wayside.  I've got some 
free time now, so I'm aiming to get it cleaned for submission and knock it out 
of the stadium.  I figured, once it's in gcc, it might get some better looking 
at by others and perhaps someone with deeper knowledge of this processor can 
fine-tune it.



>     http://article.gmane.org/gmane.comp.gcc.patches/109444
> 
> but I don't remember seeing one, and I can't find any record of it
> in the archives.
> 
> Sorry if you've been frustrated by the lack of action here.

Nah, no frustrations.  Just been a lack of time on my part :)


> Have you run the patch through the gcc testsuites?

Not yet.  Are there any quick guides on doing so?  Is that as simple as 
completing a compile, then running 'make test'?


> You had to update some of the types for 4.4, so I assume this should
> include at least 2008.

Updated.

> Needs to be GPLv3.

Ditto.


> I don't think this is useful, given that there was no R10k
> define_function_unit description in FSF sources.

Yeah, I think I had that blurb there because generic.md was what I modeled off 
of.  The very original patch that started all of this was actually submitted by 
someone else for gcc-3.0 several years ago, and I've pretty much kept forward 
porting it for my own use since.  When DFA was introduced and generic.md got 
split up, that's where this blurb originated from.


> Minor nit, but you've sometimes used many blank lines to separate things.
> Seems a bit excessive: two should be enough.
> 
> Coding convention nit: all comments should start with a capital letter
> and end with ".", even if they aren't real sentences.

Cleaned these up -- how does it look now?


> Do you actually model the relationship with the multiplier?  If not,
> it might be worth a comment saying so.

Not real sure.  It's been awhile since I looked at the details, but I know the 
R10K had a rather complex multiplier, and I wasn't real sure how to properly 
model it.  It was also my first stab at DFA and pipeline descriptors in general, 
so there's no telling how far off I was.


> Given that both r10k_fpdiv and r10k_fpsqrt are long-latency insns,
> and that they can be used independently, we might get smaller
> automata if we split r10k_fp into three: r10k_fp, r10k_fpdiv and
> r10k_fpsqrt.  Could you try this and see how it affects the total
> number of states?

I can give this a shot, but memory wants to recall I did have several messages 
to gcc-patches in which I was concerned about the disproportionately high number 
of states that were being generated for one of the automatons (I forget which, I 
think it was the multiplier).  I also wasn't sure of the purpose of automata 
completely, and figured they represented at best, the physically different 
sections of the R10K.  Hence that's how I set them up.

I'll try adding those two additional automata, and tweaking the cpu_units for 
them.  As for the number of states, is there a quick way to have gcc calculate 
the number of states per automaton without actually running the full build?

My Octane, which does the building, has a pretty decent 550MHz R14000, but the 
new fractional data types in gcc make a full build take almost 8 hours, which is 
why I ask (unless I can just dump fractional data types)


> "logical" seems to be missing from the .md file.  Should it be in
> this reservation?

I haven't kept track of the newer insns, so I went on ahead and added it here. 
How about signext?  I see that one in the 7000.md file.  Good fit here too?  I 
matched the insns that I could recognize.  'logical', does this refer to 
conditional statements like if clauses and such?


> The way to implement that sort of thing is define_bypass.  E.g.
> make the default latency work for LO, then add a define_bypass that is
> conditional on the int insn using HI.  (I assume that way round would be
> better because using LO is the common case.)  You'd have to use a custom
> predicate, along the lines of mips_store_data_bypass_p.
> 
> No need to do that if you don't want, but I think the last sentence is a
> little misleading as it stands.


Hmm, sounds like something to try.  I admit, I know nothing about defining 
custom predicates, though.  Any tips on that?  And does it add much optimization 
to be worth implementing?  The way I was thinking of checking was to use 
lo_operand in one of the define_insn_reservation lines as one of the tests, 
generating one latency if the insn was LO and the other latency if the insn was 
HI in a different define_insn_reservation.  But I wasn't able to find docs that 
explained lo_operand() and what its return values are.  I don't even know if 
it's still around anymore.

So, that comment basically restates what the R10K manual says, and explains that 
we weren't attempting that level of optimization because I didn't understand 
lo_operand().  I figured the processor probably won't fuss too much about having 
the same latency on LO or HI, and at worst, it's a minor deficiency.


> The formatting here looks odd, but maybe it's just mailer mangling.

I'll check the source file to make sure I'm not putting a tab where there are 
plain spaces.  But I am using Thunderbird, if that offers any insight into the 
mangling.  I can attach the file too, if preferred.


> I'd prefer to map r12000, r14000 and r16000 to PROCESSOR_R10000,
> rather than have four PROCESSOR_* values that do the same thing.
> I take your point about leaving processor-specific tuning as
> future work, but I think the split should be introduced as part
> of that work rather than here.

I changed these all to R10000, and removed the R12000-R16000 macros from mips.h. 
Do I need to remove the separate insn costs from each as well (except for R10000)?

Also, how does one calculate insn costs?  I never found much detail on that, so 
I think my defaults are copied from something else (been awhile).


> I generally try to avoid having TARGET_* and TUNE_* macros that
> aren't used.  (Again, it's a case of "add it when you need it".)
> There are some old macros that are unused, but still.

I added these mostly so those with an R12000 or R14000 processor can just use 
-march=r12000 and get the same code as an R10K without having to look up the 
info that they need to use -march=r10000.

Should I instead define -march=r1x000?  And later on, when someone can find 
errata for R12K/R14K/R16K, add the processor-specific bits.  Actually, I have an 
R10K manual with R12K notes/errata in it, but none relate to latencies and costs 
as far as I can tell, so I think it's safe for those two processors to be 
replicas of each other.

The R14K's, though, there's no errata documents anywhere that I know of.  SGI's 
likely dumped whatever they have into a black hole somewheres.  The Octane is 
the only Linux-capable box that can run these CPUs anyways, and I'm probably one 
of the few with such a machine.  Mapping it to R10K has worked fine so far.

R16K is on the Tezro's and Origin 3000's only.  Neither has a Linux port yet, 
and that's probably years off.  I only included it for completion, since R16000 
is the last of that line (SGI shelved the R18000).  It's a die shrink of R14000, 
so it probably functions the same, but that's a blind assumption.



Thanks for the feedback!  With the attached patch, I've added most of your 
recommendations except for the LO/HI section.  I've got gcc rebuilding w/ the 
automata changes and I'll see what those state counts are, and see if they match 
up with other mips automata.


Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org


gcc/
     * config/mips/10000.md: Add R10000 scheduler
     * config/mips/mips.c: Add r1x000 params & costs
     * config/mips/mips.h: Add constants
     * config/mips/mips.md: Add r1x000 params & incl 10000.md


diff -Naurp gcc.orig/gcc/config/mips/10000.md gcc/gcc/config/mips/10000.md
--- gcc.orig/gcc/config/mips/10000.md	1969-12-31 19:00:00.000000000 -0500
+++ gcc/gcc/config/mips/10000.md	2008-08-02 13:11:13.000000000 -0400
@@ -0,0 +1,222 @@
+;; DFA-based pipeline description for the VR1x000.
+;;   Copyright (C) 2005, 2006, 2008 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; R12K/R14K/R16K are derivatives of R10K, thus copy its description
+;; until specific tuning for each is added.
+
+;; R10000 has int queue, fp queue, address queue.
+;; We split the fp queue into standard fp, fp division, and
+;; fp square root to further optimize the automata, though.
+(define_automaton "r10k_int, r10k_fp, r10k_fpdivision,
+                   r10k_fpsqroot, r10k_addr")
+
+;; R10000 has 2 integer ALUs, fp-adder and fp-multiplier, load/store.
+(define_cpu_unit "r10k_alu1" "r10k_int")
+(define_cpu_unit "r10k_alu2" "r10k_int")
+(define_cpu_unit "r10k_fpadd" "r10k_fp")
+(define_cpu_unit "r10k_fpmpy" "r10k_fp")
+(define_cpu_unit "r10k_loadstore" "r10k_addr")
+
+;; R10000 has separate fp-div and fp-sqrt units as well and these can
+;; execute in parallel, however their issue & completion logic is shared
+;; by the fp-multiplier.
+(define_cpu_unit "r10k_fpdiv" "r10k_fpdivision")
+(define_cpu_unit "r10k_fpsqrt" "r10k_fpsqroot")
+
+
+;; R10k Loader.
+(define_insn_reservation "r10k_load" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "load,prefetch,prefetchx"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_store" 0
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "store,fpstore,fpidxstore"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_fpload" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fpload,fpidxload"))
+  "r10k_loadstore")
+
+
+;; Integer add/sub + logic ops, and mf/mt hi/lo can be done by alu1 or alu2.
+;; Miscellaneous arith goes here too (this is a guess).
+(define_insn_reservation "r10k_arith" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "arith,mfhilo,mthilo,slt,clz,const,nop,trap,logical"))
+  "r10k_alu1 | r10k_alu2")
+
+
+;; ALU1 handles shifts, branch eval, and condmove.
+;;
+;; Brancher is separate, but part of ALU1, but can only
+;; do one branch per cycle (needs implementing?).
+;;
+;; Unsure if the brancher handles jumps and calls as well, but since
+;; they're related, we'll add them here for now.
+(define_insn_reservation "r10k_shift" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "shift,branch,jump,call"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_int_cmove" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SI,DI")))
+  "r10k_alu1")
+
+
+;; Coprocessor Moves.
+;; mtc1/dmtc1 are handled by ALU1.
+;; mfc1/dmfc1 are handled by the fp-multiplier.
+(define_insn_reservation "r10k_mt_xfer" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mtc"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_mf_xfer" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mfc"))
+  "r10k_fpmpy")
+
+
+;; Only ALU2 does int multiplications and divisions.
+;; R10K allows an int insn using register Lo to be issued
+;; one cycle earlier than an insn using register Hi for
+;; the insns below, however, we skip on doing this
+;; for now until correct usage of lo_operand() is figured
+;; out.
+;;
+;; Divides keep ALU2 busy, but this isn't expressed here (I think?).
+(define_insn_reservation "r10k_imul_single" 6
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3,imadd")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 6")
+
+(define_insn_reservation "r10k_imul_double" 10
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3,imadd")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 10")
+
+(define_insn_reservation "r10k_idiv_single" 35
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 35")
+
+(define_insn_reservation "r10k_idiv_double" 67
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 67")
+
+
+;; Floating point add/sub, mul, abs value, neg, comp, & moves.
+(define_insn_reservation "r10k_fp_miscadd" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fadd,fabs,fneg,fcmp"))
+  "r10k_fpadd")
+
+(define_insn_reservation "r10k_fp_miscmul" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmul,fmove"))
+  "r10k_fpmpy")
+
+(define_insn_reservation "r10k_fp_cmove" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SF,DF")))
+  "r10k_fpmpy")
+
+
+;; The fcvt.s.[wl] insn has latency 4, repeat 2.
+;; All other fcvt have latency 2, repeat 1.
+(define_insn_reservation "r10k_fcvt_single" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "I2S")))
+  "r10k_fpadd * 2")
+
+(define_insn_reservation "r10k_fcvt_other" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "!I2S")))
+  "r10k_fpadd")
+
+
+;; Run the fmadd insn through fp-adder first, then fp-multiplier.
+;;
+;; The latency for fmadd is 2 cycles if the result is used
+;; by another fmadd instruction.
+(define_insn_reservation "r10k_fmadd" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmadd"))
+  "r10k_fpadd, r10k_fpmpy")
+
+(define_bypass 2 "r10k_fmadd" "r10k_fmadd")
+
+
+;; Floating point Divisions & square roots.
+(define_insn_reservation "r10k_fdiv_single" 12
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "SF")))
+  "r10k_fpdiv * 14")
+
+(define_insn_reservation "r10k_fdiv_double" 19
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "DF")))
+  "r10k_fpdiv * 21")
+
+(define_insn_reservation "r10k_fsqrt_single" 18
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_fsqrt_double" 33
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+(define_insn_reservation "r10k_frsqrt_single" 30
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_frsqrt_double" 52
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+
+;; Handle unknown/multi insns here (this is a guess).
+(define_insn_reservation "r10k_unknown" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "unknown,multi"))
+  "r10k_alu1 + r10k_alu2")
diff -Naurp gcc.orig/gcc/config/mips/mips.c gcc/gcc/config/mips/mips.c
--- gcc.orig/gcc/config/mips/mips.c	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.c	2008-08-02 12:13:48.000000000 -0400
@@ -593,6 +593,10 @@ static const struct mips_cpu_info mips_c

    /* MIPS IV processors. */
    { "r8000", PROCESSOR_R8000, 4, 0 },
+  { "r10000", PROCESSOR_R10000, 4, 0 },
+  { "r12000", PROCESSOR_R10000, 4, 0 },
+  { "r14000", PROCESSOR_R10000, 4, 0 },
+  { "r16000", PROCESSOR_R10000, 4, 0 },
    { "vr5000", PROCESSOR_R5000, 4, 0 },
    { "vr5400", PROCESSOR_R5400, 4, 0 },
    { "vr5500", PROCESSOR_R5500, 4, PTF_AVOID_BRANCHLIKELY },
@@ -988,6 +992,58 @@ static const struct mips_rtx_cost_data m
  		     1,           /* branch_cost */
  		     4            /* memory_latency */
    },
+  { /* R10000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R12000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R14000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
+  { /* R16000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (6),            /* int_mult_si */
+    COSTS_N_INSNS (10),           /* int_mult_di */
+    COSTS_N_INSNS (35),           /* int_div_si */
+    COSTS_N_INSNS (67),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
    { /* SB1 */
      /* These costs are the same as the SB-1A below.  */
      COSTS_N_INSNS (4),            /* fp_add */
@@ -9872,7 +9928,10 @@ mips_issue_rate (void)
  	 but in reality only a maximum of 3 insns can be issued as
  	 floating-point loads and stores also require a slot in the
  	 AGEN pipe.  */
-     return 4;
+    case PROCESSOR_R10000:
+      /* All R10K Processors are quad-issue (being the first MIPS
+         processors to support this feature). */
+      return 4;

      case PROCESSOR_20KC:
      case PROCESSOR_R4130:
diff -Naurp gcc.orig/gcc/config/mips/mips.h gcc/gcc/config/mips/mips.h
--- gcc.orig/gcc/config/mips/mips.h	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.h	2008-08-02 12:14:38.000000000 -0400
@@ -66,6 +66,7 @@ enum processor_type {
    PROCESSOR_R7000,
    PROCESSOR_R8000,
    PROCESSOR_R9000,
+  PROCESSOR_R10000,
    PROCESSOR_SB1,
    PROCESSOR_SB1A,
    PROCESSOR_SR71000,
@@ -241,6 +242,10 @@ enum mips_code_readable_setting {
  #define TARGET_MIPS5500             (mips_arch == PROCESSOR_R5500)
  #define TARGET_MIPS7000             (mips_arch == PROCESSOR_R7000)
  #define TARGET_MIPS9000             (mips_arch == PROCESSOR_R9000)
+#define TARGET_MIPS10000            (mips_arch == PROCESSOR_R10000)
+#define TARGET_MIPS12000            (mips_arch == PROCESSOR_R10000)
+#define TARGET_MIPS14000            (mips_arch == PROCESSOR_R10000)
+#define TARGET_MIPS16000            (mips_arch == PROCESSOR_R10000)
  #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
  				     || mips_arch == PROCESSOR_SB1A)
  #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
@@ -267,6 +272,10 @@ enum mips_code_readable_setting {
  #define TUNE_MIPS6000               (mips_tune == PROCESSOR_R6000)
  #define TUNE_MIPS7000               (mips_tune == PROCESSOR_R7000)
  #define TUNE_MIPS9000               (mips_tune == PROCESSOR_R9000)
+#define TUNE_MIPS10000              (mips_tune == PROCESSOR_R10000)
+#define TUNE_MIPS12000              (mips_tune == PROCESSOR_R10000)
+#define TUNE_MIPS14000              (mips_tune == PROCESSOR_R10000)
+#define TUNE_MIPS16000              (mips_tune == PROCESSOR_R10000)
  #define TUNE_SB1                    (mips_tune == PROCESSOR_SB1		\
  				     || mips_tune == PROCESSOR_SB1A)

diff -Naurp gcc.orig/gcc/config/mips/mips.md gcc/gcc/config/mips/mips.md
--- gcc.orig/gcc/config/mips/mips.md	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.md	2008-08-01 23:05:01.000000000 -0400
@@ -553,7 +553,7 @@
  ;; Attribute describing the processor.  This attribute must match exactly
  ;; with the processor_type enumeration in mips.h.
  (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,r10000,r12000,r14000,r16000,sb1,sb1a,sr71000,xlr"
    (const (symbol_ref "mips_tune")))

  ;; The type of hardware hazard associated with this instruction.
@@ -903,6 +903,7 @@
  (include "6000.md")
  (include "7000.md")
  (include "9000.md")
+(include "10000.md")
  (include "sb1.md")
  (include "sr71k.md")
  (include "xlr.md")


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-03  3:37   ` Kumba
@ 2008-08-03  7:20     ` Ralf Wildenhues
  2008-08-03 10:40     ` Richard Sandiford
  1 sibling, 0 replies; 22+ messages in thread
From: Ralf Wildenhues @ 2008-08-03  7:20 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips, rdsandiford

Hello,

* Kumba wrote on Sun, Aug 03, 2008 at 05:36:55AM CEST:
> Richard Sandiford wrote:
>> Have you run the patch through the gcc testsuites?
>
> Not yet.  Are there any quick guides on doing so?  Is that as simple as  
> completing a compile, then running 'make test'?

Yes, and yes: <http://gcc.gnu.org/contribute.html> has all the details.

Cheers,
Ralf


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-03  3:37   ` Kumba
  2008-08-03  7:20     ` Ralf Wildenhues
@ 2008-08-03 10:40     ` Richard Sandiford
  2008-08-04  7:20       ` Kumba
  1 sibling, 1 reply; 22+ messages in thread
From: Richard Sandiford @ 2008-08-03 10:40 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips

Kumba <kumba@gentoo.org> writes:
>> Have you run the patch through the gcc testsuites?
>
> Not yet.  Are there any quick guides on doing so?  Is that as simple as
> completing a compile, then running 'make test'?

I see Ralf's already answered this.

>> Minor nit, but you've sometimes used many blank lines to separate things.
>> Seems a bit excessive: two should be enough.
>> 
>> Coding convention nit: all comments should start with a capital letter
>> and end with ".", even if they aren't real sentences.
>
> Cleaned these up -- how does it look now?

A lot better, thanks.

>> Do you actually model the relationship with the multiplier?  If not,
>> it might be worth a comment saying so.
>
> Not real sure.  It's been awhile since I looked at the details, but I
> know the R10K had a rather complex multiplier, and I wasn't real sure
> how to properly model it.  It was also my first stab at DFA and
> pipeline descriptors in general, so there's no telling how far off I
> was.

There doesn't seem to be anything in the description linking the
FP multiplier cpu_unit with the division and square root cpu_units,
so I'm pretty sure it isn't modelled.  Which is fine.  I just think
you should add something like "We don't model this at present."
to the end of the comment.

(There's no shame in that.  It's common to omit some details from the
DFA description, and only mention them in the comments.  The aim after
all is to get good code, not to describe the pipeline with complete accuracy.
Sometimes omitting details gives better code.)

>> Given that both r10k_fpdiv and r10k_fpsqrt are long-latency insns,
>> and that they can be used independently, we might get smaller
>> automata if we split r10k_fp into three: r10k_fp, r10k_fpdiv and
>> r10k_fpsqrt.  Could you try this and see how it affects the total
>> number of states?
> [...]
> I'll try adding those two additional automata, and tweaking the
> cpu_units for them.

Thanks.

> As far as the number of states, is there a quick
> way to have gcc calc the number of states per automaton versus actually
> running the build?

Add:

    (automata_option "v")

to one of the .md files and do "make insn-automata.c".  This will
create a file called "mips.dfa" in the build directory.  At the end
of that file is a summary of the automata.  The interesting thing is
the number of DFA states and DFA arcs in the r10k_* automata.

(Hadn't realised it was so hard to get at this information these days.
It used to be printed on stderr.  There's also support for adding "-v"
to the genautomata command line, but it seems to have bitrotted and
no longer works.)

>> "logical" seems to be missing from the .md file.  Should it be in
>> this reservation?
>
> I haven't kept track of the newer insns, so I went on ahead and added
> it here.  How about signext?  I see that one in the 7000.md file.
> Good fit here too?  I matched the insns that I could recognize.
> 'logical', does this refer to conditional statements like if clauses
> and such?

"logical" means things like AND, OR, XOR and NOR.  These insns used
to be lumped into "arith", but were split out for the benefit of a
pipeline that doesn't issue all old-"arith" insns in the same way.

(That's the general model.  We split "type" attributes up on an
as-needed basis, rather than trying to predict in advance what
would be the finest useful granularity.)

"signext" is the new MIPS32r2/MIPS64r2 SEB and SEH instructions,
which the R10K doesn't have.

>> The way to implement that sort of thing is define_bypass.  E.g.
>> make the default latency work for LO, then add a define_bypass that is
>> conditional on the int insn using HI.  (I assume that way round would be
>> better because using LO is the common case.)  You'd have to use a custom
>> predicate, along the lines of mips_store_data_bypass_p.
>> 
>> No need to do that if you don't want, but I think the last sentence is a
>> little misleading as it stands.
>
> Hmm, sounds like something to try.  I admit, I know nothing about
> defining custom predicates, though.  Any tips on that?

Look at mips_store_data_bypass_p and mips_linked_madd_p for examples.

> And does it add much optimization to be worth implementing?

TBH, the only way to know is to try it and measure the result.

And like I say, there's absolutely no need to try it.  I was just trying
to say that the comment should mention bypasses instead of lo_operand.

> The way I was thinking of checking was to use lo_operand in one of the
> define_insn_reservation lines as one of the tests, generating one
> latency if the insn was LO and the other latency if the insn was HI in
> a different define_insn_reservation.  But I wasn't able to find docs
> that explained lo_operand() and what its return values are.  I don't
> even know if it's still around anymore.

It is still around.  But the point is that the latency of a
define_insn_reservation is determined solely by the insn it applies to.
And that insn is the definition side of the dependency, not the use.
In other words, the insns matched by these insn_reservations are the
multiplications and divisions themselves.  Those insns clobber both
HI _and_ LO.

Thus you can't use define_insn_reservation tests to tell between a
use of HI and a use of LO when setting the latency of a multiplication
or division.  That's what bypasses are for.
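As a rough illustration of the point above (a toy model, not GCC code): the default latency is a property of the producing insn alone, while a bypass guard is handed both the producer and the consumer, so it can select a different latency when the consumer reads HI.  The type toy_insn, the function names and the 5/6-cycle values (borrowed from the SI-mode multiply numbers discussed in this thread) are assumptions made purely for the sketch; a real guard would take two rtx insns, as mips_store_data_bypass_p does.

```c
#include <stdbool.h>

/* Toy stand-ins for insns; real GCC guards receive two rtx insns.  */
enum toy_reg { TOY_NONE, TOY_LO, TOY_HI };

struct toy_insn {
  bool is_mult;         /* Producer side: clobbers both HI and LO.  */
  enum toy_reg reads;   /* Consumer side: which of HI/LO it reads.  */
};

/* Hypothetical guard in the style of a define_bypass guard: nonzero
   if the bypass latency should apply to this producer/consumer pair.  */
static int
toy_mult_hi_bypass_p (const struct toy_insn *out_insn,
                      const struct toy_insn *in_insn)
{
  return out_insn->is_mult && in_insn->reads == TOY_HI;
}

/* Latency of the dependency: the reservation's default (LO case)
   unless the guard says the bypass applies (HI case).  */
static int
toy_latency (const struct toy_insn *out_insn,
             const struct toy_insn *in_insn)
{
  const int default_latency = 5;  /* Assumed latency when LO is read.  */
  const int bypass_latency = 6;   /* Assumed latency when HI is read.  */

  if (toy_mult_hi_bypass_p (out_insn, in_insn))
    return bypass_latency;
  return default_latency;
}
```

The producer alone cannot tell the two cases apart, because it writes both registers; only the pairwise check can.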

As to documentation, lo_operand is an operand predicate.
See the Predicates section of the internals documentation
for more info about those.  (Just if you're curious; like I say,
it won't really help here.)

>> I'd prefer to map r12000, r14000 and r16000 to PROCESSOR_R10000,
>> rather than have four PROCESSOR_* values that do the same thing.
>> I take your point about leaving processor-specific tuning as
>> future work, but I think the split should be introduced as part
>> of that work rather than here.
>
> I changed these all to R10000, and removed the R12000-R16000 macros
> from mips.h.

Thanks.

> Do I need to remove the separate insn costs from each as well (Except
> for R10000)?

Yes.  The costs array is indexed by "enum processor_type".

You also need to remove the r12000, r14000 and r16000 "cpu" attributes,
because "cpu" must be a carbon copy of "enum processor_type".
(It's a nasty wart of the infrastructure that we need to define both.)
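A simplified sketch of why the extra entries have to go, assuming a cut-down version of the tables involved (all toy_* names and the cost numbers are invented for illustration and are not the real mips.c contents): several option names can share one processor value, but the cost table needs exactly one entry per enum value, in enum order.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for enum processor_type.  */
enum toy_processor { TOY_R8000, TOY_R10000, TOY_SB1, TOY_PROCESSOR_MAX };

struct toy_cpu_info {
  const char *name;
  enum toy_processor cpu;
};

/* Several -march/-mtune names may map to the same processor value.  */
static const struct toy_cpu_info toy_cpu_table[] = {
  { "r8000", TOY_R8000 },
  { "r10000", TOY_R10000 },
  { "r12000", TOY_R10000 },
  { "r14000", TOY_R10000 },
  { "r16000", TOY_R10000 },
  { "sb1", TOY_SB1 },
};

/* One cost entry per enum value (arbitrary numbers).  An extra entry
   here would shift every later processor onto the wrong costs.  */
static const int toy_int_mult_cost[TOY_PROCESSOR_MAX] = {
  [TOY_R8000] = 10,
  [TOY_R10000] = 20,
  [TOY_SB1] = 30,
};

/* Look up the cost for an option name, or -1 if the name is unknown.  */
static int
toy_lookup_int_mult_cost (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof toy_cpu_table / sizeof toy_cpu_table[0]; i++)
    if (strcmp (toy_cpu_table[i].name, name) == 0)
      return toy_int_mult_cost[toy_cpu_table[i].cpu];
  return -1;
}
```

With this layout, "r14000" and "r10000" resolve to the same cost entry, and "sb1" still lands on its own.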

> Also, how does one calculate insn costs?  I never found much detail on
> that, so I think my defaults are copied from something else (been
> awhile).

Experimentation, basically.  Costs are used to choose between
two equivalent implementations of an operation.  E.g. multiplication
by a constant can be done using a single multiplication insn or by
a sequence of shifts and adds.

The target-independent code calculates the cost of a sequence of
insns simply by adding them up.  It doesn't take into account how
the pipeline might issue them, or what the repeat rates are.

So COSTS_N_INSNS (latency) is a good start, but is often too high on
superscalar pipelines, where breaking a monolithic operation into
smaller operations can exploit the parallelism better.  For example,
if multiplication takes 5 cycles on a dual-issue target, a multiplication
is often (but not always!) more expensive than 5 single-cycle insns.

The costs are just heuristics, and you have to accept that any given
choice of values is going to make some things better and some things
worse.  When I've done scheduling work in the past, I simply tried
various values and ran the result through (commercial) benchmarks.
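To make the additive model concrete, here is a small sketch (the toy_* names are made up; the factor of 4 mirrors how COSTS_N_INSNS scales its argument in GCC, though the scale does not change the comparison).  It charges every insn of a shift/add sequence in full and compares the sum against a single multiply.

```c
/* One simple insn costs 4 units in this sketch; the exact scale
   does not affect the comparison below.  */
#define TOY_COSTS_N_INSNS(n) ((n) * 4)

/* Cost of a sequence under a purely additive model: every insn is
   charged in full, regardless of how the pipeline could issue them.  */
static int
toy_sequence_cost (int n_single_cycle_insns)
{
  return n_single_cycle_insns * TOY_COSTS_N_INSNS (1);
}

/* Nonzero if the shift/add expansion looks cheaper than one multiply
   whose cost is TOY_COSTS_N_INSNS (mult_latency).  */
static int
toy_prefer_shift_add (int n_single_cycle_insns, int mult_latency)
{
  return toy_sequence_cost (n_single_cycle_insns)
         < TOY_COSTS_N_INSNS (mult_latency);
}
```

For example, x * 10 written as (x << 3) + (x << 1) is three single-cycle insns, so with a 5-cycle multiply the sequence wins.  A six-insn sequence would lose under this model even if a superscalar pipeline could overlap some of it, which is why a cost equal to the raw latency can need adjusting by experiment.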

>> I generally try to avoid having TARGET_* and TUNE_* macros that
>> aren't used.  (Again, it's a case of "add it when you need it".)
>> There are some old macros that are unused, but still.
>
> I added these mostly so those with an R12000 or R14000 processor can just use 
> -march=r12000 and get the same code as an R10K without having to look up the 
> info that they need to use -march=r10000.
>
> Should I instead define -march=r1x000?  And later on, when someone can
> find errata for R12K/R14K/R16K, add the processor-specific bits.
> [...]

I think you've misunderstood what I meant.  I was simply saying
that you shouldn't define those new TARGET_* and TUNE_* macros.
They're not used anywhere in your patch, so they're just dead code.

TARGET_FOO should only be defined if some code tests TARGET_FOO.
Likewise TUNE_FOO.

I certainly wasn't talking about changing the -march options.
Please keep them all, but map them to PROCESSOR_R10000, which is
exactly what your revised patch did.

You need to add the new options to doc/invoke.texi.

With those issues fixed, the patch should be ready to go in.

Richard

* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-03 10:40     ` Richard Sandiford
@ 2008-08-04  7:20       ` Kumba
  2008-08-04 19:23         ` Richard Sandiford
  0 siblings, 1 reply; 22+ messages in thread
From: Kumba @ 2008-08-04  7:20 UTC (permalink / raw)
  To: gcc-patches, mips, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 22485 bytes --]

Richard Sandiford wrote:
> 
> I see Ralf's already answered this.

Yup, and I ran it earlier.  Took quite a bit to finish, but I figured it wasn't 
going to be a quick endeavor.  There's a couple of failures, but I'm guessing 
some failures are expected.  Not sure what counts and what doesn't, so I've 
attached it.  It was run from a fully-compiled gcc, not bootstrap, so I'm unsure 
if that affects the output any.

> There doesn't seem to be anything in the description linking the
> FP multiplier cpu_unit with the division and square root cpu_units,
> so I'm pretty sure it isn't modelled.  Which is fine.  I just think
> you should add something like "We don't model this at present."
> to the end of the comment.
> 
> (There's no shame in that.  It's common to omit some details from the
> DFA description, and only mention them in the comments.  The aim after
> all is to get good code, not to describe the pipeline with complete accuracy.
> Sometimes omitting details gives better code.)

Ah!  That's probably because I wasn't sure how to link the division and
square-root units to the multiplier.  I knew that they had to be linked, because 
as the R10K manual stated, they're separate/parallel units, but their issue &
completion logic is shared by the multiplier.  So I know that if the multiplier
is busy in either of these two stages, it'll cause a delay for these other two 
units, right?

That I think is why I had only three automata, and was funneling square root and
division into the r10k_fp automata.  I figured this represented "linking" the
multiplier and these two units.  I suppose that wasn't accurate, though?

Can we even model the issue and completion stages of a cpu unit?

> Add:
> 
>     (automata_option "v")
> 
> to one of the .md files and do "make insn-automata.c".  This will
> create a file called "mips.dfa" in the build directory.  At the end
> of that file is a summary of the automata.  The interesting thing is
> the number of DFA states and DFA arcs in the r10k_* automata.
> 
> (Hadn't realised it was so hard to get at this information these days.
> It used to be printed on stderr.  There's also support for adding "-v"
> to the genautomata command line, but it seems to have bitrotted and
> no longer works.)

Thanks, this worked great.  I captured the entire build output looking for this
verbosity, but didn't see it;  I guess it was hidden at some point.

Breaking out those two units into their own automata changes things quite a bit.
The resulting mips.dfa file is only about 300,000 lines long, and the r10k_fp
automaton now has only 8 states (division has 22 and square root has 36 states).
Originally, the r10k_fp automaton had 6336 states (and the mips.dfa file was 
500,000+ lines long).  So this seems to bring the state numbers down to look 
more like the other mips cpu automatons.  I can pass those along if you're 
interested.
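A quick arithmetic check on the state counts above.  This is an inference from the figures, not something read out of genautomata: if one automaton has to track three independent groups of units at once, its state count can grow like the product of the individual counts, whereas separate automata only cost their sum.  The function names are made up for the sketch.

```c
/* Product of per-automaton state counts: the size a single combined
   automaton can reach when the units are independent.  */
static unsigned long
toy_combined_states (const unsigned long *states, int n)
{
  unsigned long product = 1;
  int i;

  for (i = 0; i < n; i++)
    product *= states[i];
  return product;
}

/* Sum of per-automaton state counts: the total when the units are
   kept in separate automata.  */
static unsigned long
toy_split_states (const unsigned long *states, int n)
{
  unsigned long sum = 0;
  int i;

  for (i = 0; i < n; i++)
    sum += states[i];
  return sum;
}
```

With the counts reported here (8, 22 and 36), the product is 6336, which matches the size of the original single r10k_fp automaton, while the split version needs only 66 states in total.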

And yeah, I looked at the option parsing bit in genautomata.c.  It looks like it
should work, but that if-then-else construct it's got going just seems to fail
somehow.

> "logical" means things like AND, OR, XOR and NOR.  These insns used
> to be lumped into "arith", but were split out for the benefit of a
> pipeline that doesn't issue all old-"arith" insns in the same way.
> 
> (That's the general model.  We split "type" attributes up on an
> as-needed basis, rather than trying to predict in advance what
> would be the finest useful granularity.)

Ah, so R10K is probably in the older class of just handling arith & logical as
one, thus it's safe to lump them into the same insn reservation.

> TBH, the only way to know is to try it and measure the result.
> 
> And like I say, there's absolutely no need to try it.  I was just trying
> to say that the comment should mention bypasses instead of lo_operand.

Curiosity demands I at least look it up :)

I wrote that comment based on what I thought was the way to check -- I wasn't
aware that multiplications and divisions clobbered both HI and LO, so I can see
why bypasses are the way to go.

Speaking of predicates, I get what to do now.  Define a custom predicate in 
mips.c (I guess, "mips_check_insn_hi_p" ?).  Here's what I think so far by 
looking at those two predicates that you mentioned:

mips_check_insn_hi_p(rtx insn)
{
   return IS_INSN_HI(insn)
}

Then in 10000.md, something like:

(define_bypass 6 "r10k_imul_single" "mips_check_insn_hi_p")

And I use IS_INSN_HI as a placeholder because I have no idea what function/macro 
in the gcc internals checks an insn to see if it's a HI or LO one.  Is there 
such a check?  I perused mips.c and poked into ia64.c looking for something that 
checks for HI or LO, but nothing stood out to me really.  Probably cause I'm not 
real sure what I need to be looking for.

But I assume that would be a decent predicate definition and usage, right? 
Assuming there's a basic "Is this HI?" mechanism that returns true if yes and 
false if no, it makes sense to assign that straight to that predicate's return 
value, right?  Then define_bypass knows to use latency 6 for that insn if it 
knows that it's on the HI side of things.

FYI, I tweaked the 10000.md file to use the LO latencies by default, per your 
earlier mention that using LO is the common case.  I also re-wrote that comment 
in case I can't figure out a working predicate to check for HI.

And do you know what the attr type is for MULTU or DMULTU?  imul, imul3, and 
imadd don't seem to fit (I am assuming MULTU/DMULTU and friends are for 
unsigned?).  The R10K manual has different latencies for those, and it looks 
like I don't have insn reservations defined for those.  Or is this another 
define_bypass + custom predicate to check for signed/unsigned?

> Yes.  The costs array is indexed by "enum processor_type".
> 
> You also need to remove the r12000, r14000 and r16000 "cpu" attributes,
> because "cpu" must be a carbon copy of "enum processor_type".
> (It's a nasty wart of the infrastructure that we need to define both.)

Done.

> Experimentation, basically.  Costs are used to choose between
> two equivalent implementations of an operation.  E.g. multiplication
> by a constant can be done using a single multiplication insn or by
> a sequence of shifts and adds.
> 
> The target-independent code calculates the cost of a sequence of
> insns simply by adding them up.  It doesn't take into account how
> the pipeline might issue them, or what the repeat rates are.
> 
> So COSTS_N_INSNS (latency) is a good start, but is often too high on
> superscalar pipelines, where breaking a monolithic operation into
> smaller operations can exploit the parallelism better.  For example,
> if multiplication takes 5 cycles on a dual-issue target, a multiplication
> is often (but not always!) more expensive than 5 single-cycle insns.
> 
> The costs are just heuristics, and you have to accept that any given
> choice of values is going to make some things better and some things
> worse.  When I've done scheduling work in the past, I simply tried
> various values and ran the result through (commercial) benchmarks.

Well, I know of no commercial benchmarking tools for Linux/Mips on SGI systems, 
and since it sounds like it's mostly guesswork to begin with, I guess using the 
same values as the latencies should be kosher.  I don't suppose there's any rule 
of thumb involving superscalar pipelines out there that might say, slice a 
couple digits off these default latencies?

> I think you've misunderstood what I meant.  I was simply saying
> that you shouldn't define those new TARGET_* and TUNE_* macros.
> They're not used anywhere in your patch, so they're just dead code.
> 
> TARGET_FOO should only be defined if some code tests TARGET_FOO.
> Likewise TUNE_FOO.
> 
> I certainly wasn't talking about changing the -march options.
> Please keep them all, but map them to PROCESSOR_R10000, which is
> exactly what your revised patch did.

Yup, I see what you were getting at.  I pulled the unneeded TARGET_R1[246]000 
and TUNE_1[246]000 options, as well as the three insn costs enums related to them.

> You need to add the new options to doc/invoke.texi.

Done.

Attached is round three.  Other changes not mentioned above include adding 
frdiv1/2 and frsqrt1/2 insns to the existing reservations.  No idea if R10K 
supports these, but better safe than sorry.  I also added the 'move' insn, even 
though the manual makes no explicit mention of an unconditional integer register 
move operand (only integer condmove).

Also, the R10K manual doesn't seem to differentiate between fmadd and imadd.  In 
the latency table, it simply states "MADD" -- might I infer from this that 
R10K itself doesn't distinguish between imadd and fmadd, treats them the same, and 
so I need to follow suit? (I've got imadd set to run on ALU2, whereas fmadd runs 
on the fp multiplier).

And do you happen to know what kind of insns LWC1/LDC1/LWXC1/LDXC1 match? 
fpload/fpidxload by chance?  Referenced in the manual, they look like loads, but 
they have a different latency (which is how I coded them, but wanted to double 
check).

Thanks for the feedback!

gcc/
     * config/mips/10000.md: Add R10000 scheduler
     * config/mips/mips.c: Add r10000 params & costs
     * config/mips/mips.h: Add R10k constant
     * config/mips/mips.md: Add r10000 params & incl 10000.md

diff -Naurp gcc.orig/gcc/config/mips/10000.md gcc/gcc/config/mips/10000.md
--- gcc.orig/gcc/config/mips/10000.md	1969-12-31 19:00:00.000000000 -0500
+++ gcc/gcc/config/mips/10000.md	2008-08-04 02:37:13.000000000 -0400
@@ -0,0 +1,223 @@
+;; DFA-based pipeline description for the VR1x000.
+;;   Copyright (C) 2005, 2006, 2008 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; R12K/R14K/R16K are derivatives of R10K, thus copy its description
+;; until specific tuning for each is added.
+
+;; R10000 has int queue, fp queue, address queue.
+;; We split the fp queue into standard fp, fp division, and
+;; fp square root to further optimize the automata, though.
+(define_automaton "r10k_int, r10k_fp, r10k_fpdivision,
+                   r10k_fpsqroot, r10k_addr")
+
+;; R10000 has 2 integer ALUs, fp-adder and fp-multiplier, load/store.
+(define_cpu_unit "r10k_alu1" "r10k_int")
+(define_cpu_unit "r10k_alu2" "r10k_int")
+(define_cpu_unit "r10k_fpadd" "r10k_fp")
+(define_cpu_unit "r10k_fpmpy" "r10k_fp")
+(define_cpu_unit "r10k_loadstore" "r10k_addr")
+
+;; R10000 has separate fp-div and fp-sqrt units as well and these can
+;; execute in parallel, however their issue & completion logic is shared
+;; by the fp-multiplier.
+(define_cpu_unit "r10k_fpdiv" "r10k_fpdivision")
+(define_cpu_unit "r10k_fpsqrt" "r10k_fpsqroot")
+
+
+;; R10k Loader.
+(define_insn_reservation "r10k_load" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "load,prefetch,prefetchx"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_store" 0
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "store,fpstore,fpidxstore"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_fpload" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fpload,fpidxload"))
+  "r10k_loadstore")
+
+
+;; Integer add/sub + logic ops, and mf/mt hi/lo can be done by alu1 or alu2.
+;; Miscellaneous arith goes here too (this is a guess).
+(define_insn_reservation "r10k_arith" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "arith,mfhilo,mthilo,slt,clz,const,nop,trap,logical"))
+  "r10k_alu1 | r10k_alu2")
+
+
+;; ALU1 handles shifts, branch eval, and condmove.
+;;
+;; Brancher is separate, but part of ALU1, but can only
+;; do one branch per cycle (needs implementing?).
+;;
+;; Unsure if the brancher handles jumps and calls as well, but since
+;; they're related, we'll add them here for now.
+(define_insn_reservation "r10k_shift" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "shift,branch,jump,call,move"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_int_cmove" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SI,DI")))
+  "r10k_alu1")
+
+
+;; Coprocessor Moves.
+;; mtc1/dmtc1 are handled by ALU1.
+;; mfc1/dmfc1 are handled by the fp-multiplier.
+(define_insn_reservation "r10k_mt_xfer" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mtc"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_mf_xfer" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mfc"))
+  "r10k_fpmpy")
+
+
+;; Only ALU2 does int multiplications and divisions.
+;;
+;; According to the Vr10000 series user manual,
+;; integer mult and div insns can be issued one
+;; cycle earlier if using register Lo, but this is
+;; not modeled here.  We use the latency for the
+;; Lo register, however, as this is the common case.
+;;
+;; Divides keep ALU2 busy, but this isn't expressed here (I think?).
+(define_insn_reservation "r10k_imul_single" 5
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3,imadd")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 6")
+
+(define_insn_reservation "r10k_imul_double" 9
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3,imadd")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 10")
+
+(define_insn_reservation "r10k_idiv_single" 34
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 35")
+
+(define_insn_reservation "r10k_idiv_double" 66
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 67")
+
+
+;; Floating point add/sub, mul, abs value, neg, comp, & moves.
+(define_insn_reservation "r10k_fp_miscadd" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fadd,fabs,fneg,fcmp"))
+  "r10k_fpadd")
+
+(define_insn_reservation "r10k_fp_miscmul" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmul,fmove"))
+  "r10k_fpmpy")
+
+(define_insn_reservation "r10k_fp_cmove" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SF,DF")))
+  "r10k_fpmpy")
+
+
+;; The fcvt.s.[wl] insn has latency 4, repeat 2.
+;; All other fcvt have latency 2, repeat 1.
+(define_insn_reservation "r10k_fcvt_single" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "I2S")))
+  "r10k_fpadd * 2")
+
+(define_insn_reservation "r10k_fcvt_other" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "!I2S")))
+  "r10k_fpadd")
+
+
+;; Run the fmadd insn through fp-adder first, then fp-multiplier.
+;;
+;; The latency for fmadd is 2 cycles if the result is used
+;; by another fmadd instruction.
+(define_insn_reservation "r10k_fmadd" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmadd"))
+  "r10k_fpadd, r10k_fpmpy")
+
+(define_bypass 2 "r10k_fmadd" "r10k_fmadd")
+
+
+;; Floating point Divisions & square roots.
+(define_insn_reservation "r10k_fdiv_single" 12
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv,frdiv1,frdiv2")
+            (eq_attr "mode" "SF")))
+  "r10k_fpdiv * 14")
+
+(define_insn_reservation "r10k_fdiv_double" 19
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv,frdiv1,frdiv2")
+            (eq_attr "mode" "DF")))
+  "r10k_fpdiv * 21")
+
+(define_insn_reservation "r10k_fsqrt_single" 18
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_fsqrt_double" 33
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+(define_insn_reservation "r10k_frsqrt_single" 30
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt,frsqrt1,frsqrt2")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_frsqrt_double" 52
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt,frsqrt1,frsqrt2")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+
+;; Handle unknown/multi insns here (this is a guess).
+(define_insn_reservation "r10k_unknown" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "unknown,multi"))
+  "r10k_alu1 + r10k_alu2")
diff -Naurp gcc.orig/gcc/config/mips/mips.c gcc/gcc/config/mips/mips.c
--- gcc.orig/gcc/config/mips/mips.c	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.c	2008-08-04 01:26:49.000000000 -0400
@@ -593,6 +593,10 @@ static const struct mips_cpu_info mips_c

    /* MIPS IV processors. */
    { "r8000", PROCESSOR_R8000, 4, 0 },
+  { "r10000", PROCESSOR_R10000, 4, 0 },
+  { "r12000", PROCESSOR_R10000, 4, 0 },
+  { "r14000", PROCESSOR_R10000, 4, 0 },
+  { "r16000", PROCESSOR_R10000, 4, 0 },
    { "vr5000", PROCESSOR_R5000, 4, 0 },
    { "vr5400", PROCESSOR_R5400, 4, 0 },
    { "vr5500", PROCESSOR_R5500, 4, PTF_AVOID_BRANCHLIKELY },
@@ -988,6 +992,19 @@ static const struct mips_rtx_cost_data m
  		     1,           /* branch_cost */
  		     4            /* memory_latency */
    },
+  { /* R1x000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (5),            /* int_mult_si */
+    COSTS_N_INSNS (9),            /* int_mult_di */
+    COSTS_N_INSNS (34),           /* int_div_si */
+    COSTS_N_INSNS (66),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
    { /* SB1 */
      /* These costs are the same as the SB-1A below.  */
      COSTS_N_INSNS (4),            /* fp_add */
@@ -9872,7 +9889,10 @@ mips_issue_rate (void)
  	 but in reality only a maximum of 3 insns can be issued as
  	 floating-point loads and stores also require a slot in the
  	 AGEN pipe.  */
-     return 4;
+    case PROCESSOR_R10000:
+      /* All R10K Processors are quad-issue (being the first MIPS
+         processors to support this feature). */
+      return 4;

      case PROCESSOR_20KC:
      case PROCESSOR_R4130:
diff -Naurp gcc.orig/gcc/config/mips/mips.h gcc/gcc/config/mips/mips.h
--- gcc.orig/gcc/config/mips/mips.h	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.h	2008-08-04 00:05:27.000000000 -0400
@@ -66,6 +66,7 @@ enum processor_type {
    PROCESSOR_R7000,
    PROCESSOR_R8000,
    PROCESSOR_R9000,
+  PROCESSOR_R10000,
    PROCESSOR_SB1,
    PROCESSOR_SB1A,
    PROCESSOR_SR71000,
@@ -241,6 +242,7 @@ enum mips_code_readable_setting {
  #define TARGET_MIPS5500             (mips_arch == PROCESSOR_R5500)
  #define TARGET_MIPS7000             (mips_arch == PROCESSOR_R7000)
  #define TARGET_MIPS9000             (mips_arch == PROCESSOR_R9000)
+#define TARGET_MIPS10000            (mips_arch == PROCESSOR_R10000)
  #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
  				     || mips_arch == PROCESSOR_SB1A)
  #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
@@ -267,6 +269,7 @@ enum mips_code_readable_setting {
  #define TUNE_MIPS6000               (mips_tune == PROCESSOR_R6000)
  #define TUNE_MIPS7000               (mips_tune == PROCESSOR_R7000)
  #define TUNE_MIPS9000               (mips_tune == PROCESSOR_R9000)
+#define TUNE_MIPS10000              (mips_tune == PROCESSOR_R10000)
  #define TUNE_SB1                    (mips_tune == PROCESSOR_SB1		\
  				     || mips_tune == PROCESSOR_SB1A)

diff -Naurp gcc.orig/gcc/config/mips/mips.md gcc/gcc/config/mips/mips.md
--- gcc.orig/gcc/config/mips/mips.md	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.md	2008-08-01 23:05:01.000000000 -0400
@@ -553,7 +553,7 @@
  ;; Attribute describing the processor.  This attribute must match exactly
  ;; with the processor_type enumeration in mips.h.
  (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,r10000,r12000,r14000,r16000,sb1,sb1a,sr71000,xlr"
    (const (symbol_ref "mips_tune")))

  ;; The type of hardware hazard associated with this instruction.
@@ -903,6 +903,7 @@
  (include "6000.md")
  (include "7000.md")
  (include "9000.md")
+(include "10000.md")
  (include "sb1.md")
  (include "sr71k.md")
  (include "xlr.md")
diff -Naurp gcc.orig/gcc/doc/invoke.texi gcc/gcc/doc/invoke.texi
--- gcc.orig/gcc/doc/invoke.texi	2008-08-01 21:51:46.000000000 -0400
+++ gcc/gcc/doc/invoke.texi	2008-08-04 00:09:12.000000000 -0400
@@ -11980,6 +11980,7 @@ The processor names are:
  @samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
  @samp{r4600}, @samp{r4650}, @samp{r6000}, @samp{r8000},
  @samp{rm7000}, @samp{rm9000},
+@samp{r10000}, @samp{r12000}, @samp{r14000}, @samp{r16000},
  @samp{sb1},
  @samp{sr71000},
  @samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300},

[-- Attachment #2: gcc-tests-20080803.txt --]
[-- Type: text/plain, Size: 35971 bytes --]

 make -k check
make[1]: Entering directory `/usr/cvsroot/gcc'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/fixincludes'
autogen -T ../.././fixincludes/check.tpl ../.././fixincludes/inclhack.def
make[2]: autogen: Command not found
make[2]: *** [check] Error 127
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/fixincludes'
make[1]: *** [check-fixincludes] Error 2
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc'
test -d testsuite || mkdir testsuite
test -d testsuite/gcc || mkdir testsuite/gcc
(rootme=`${PWDCMD-pwd}`; export rootme; \
        srcdir=`cd ../.././gcc; ${PWDCMD-pwd}` ; export srcdir ; \
        cd testsuite/gcc; \
        rm -f tmp-site.exp; \
        sed '/set tmpdir/ s|testsuite|testsuite/gcc|' \
                < ../../site.exp > tmp-site.exp; \
        /bin/sh ${srcdir}/../move-if-change tmp-site.exp site.exp; \
        EXPECT=expect ; export EXPECT ; \
        if [ -f ${rootme}/../expect/expect ] ; then  \
           TCL_LIBRARY=`cd .. ; cd ${srcdir}/../tcl/library ; ${PWDCMD-pwd}` ; \
            export TCL_LIBRARY ; fi ; \
        GCC_EXEC_PREFIX="/usr/lib/gcc/" ; export GCC_EXEC_PREFIX ; \
        runtest --tool gcc )
WARNING: Couldn't find the global config file.
Test Run By root on Sun Aug  3 13:48:24 2008
Native configuration is mips-unknown-linux-gnu

                === gcc tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/compile/compile.exp ...

Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/execute/ieee/ieee.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/unsorted/unsorted.exp ...

Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/autopar/autopar.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/charset/charset.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/compat/compat.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/compat/struct-layout-1.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/cpp/cpp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/cpp/trad/trad.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/debug/debug.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp ...
FAIL: gcc.dg/debug/dwarf2/dwarf-die3.c scan-assembler-not DW_AT_inline
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/dfp/dfp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/dg.exp ...
WARNING: program timed out.
FAIL: gcc.dg/20020425-1.c (test for excess errors)
FAIL: gcc.dg/pr35729.c scan-rtl-dump-times loop2_invariant "Decided to move invariant" 0
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/fixed-point/fixed-point.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/format/format.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/gomp/gomp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/ipa/ipa.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/matrix/matrix.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/noncompile/noncompile.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/pch/pch.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/special/mips-abi.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/special/special.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/struct/struct-reorg.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/tls/tls.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/torture/stackalign/stackalign.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/tree-ssa/tree-ssa.exp ...
XPASS: gcc.dg/tree-ssa/data-dep-1.c scan-tree-dump-times ltrans "4, \+, 1" 0
XPASS: gcc.dg/tree-ssa/ltrans-3.c scan-tree-dump-times ltrans "transformed loop" 1
XPASS: gcc.dg/tree-ssa/ssa-fre-13.c scan-tree-dump fre "Inserted .* &a"
XPASS: gcc.dg/tree-ssa/ssa-fre-13.c scan-tree-dump fre "Replaced tmp1_.\(D\)->data"
XPASS: gcc.dg/tree-ssa/ssa-fre-14.c scan-tree-dump fre "Inserted .* &a"
XPASS: gcc.dg/tree-ssa/ssa-fre-14.c scan-tree-dump fre "Replaced tmp1.data"
XPASS: gcc.dg/tree-ssa/ssa-fre-17.c scan-tree-dump fre "Replaced f.doms\[0\].dom with i_"
FAIL: gcc.dg/tree-ssa/ssa-store-ccp-3.c scan-tree-dump-times optimized "conststaticvariable" 1
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vmx/vmx.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vxworks/vxworks.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/weak/weak.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/acker1.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/arm-isr.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/bprob.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/dectest.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/dhry.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/gcov.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/linkage.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/matrix1.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/mg-2.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/mg.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/options.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/sieve.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/sort2.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/alpha/alpha.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/arm/arm.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/arm/neon/neon.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/avr/avr.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/avr/torture/avr-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/bfin/bfin.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/cris/cris.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/cris/torture/cris-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/frv/frv.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/i386/i386.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/i386/math-torture/math-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/i386/stackalign/stackalign.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/ia64/ia64.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/m68k/m68k.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/mips/inter/mips16-inter.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/mips/mips.exp ...
FAIL: gcc.target/mips/ext-1.c scan-assembler \tdext\t
FAIL: gcc.target/mips/ext-1.c scan-assembler-not and
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/s390/s390.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/sh/sh.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/sparc/sparc.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/spu/spu.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/x86_64/abi/abi-x86_64.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/xstormy16/xstormy16.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.test-framework/test-framework.exp ...
skipping test framework tests, CHECK_TEST_FRAMEWORK is not defined

                === gcc Summary ===

# of expected passes            48731
# of unexpected failures        6
# of unexpected successes       7
# of expected failures          127
# of unsupported tests          488
/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/xgcc  version 4.4.0 20080802 (experimental) (GCC)

make[2]: [check-gcc] Error 1 (ignored)
test -d testsuite || mkdir testsuite
test -d testsuite/g++ || mkdir testsuite/g++
(rootme=`${PWDCMD-pwd}`; export rootme; \
        srcdir=`cd ../.././gcc; ${PWDCMD-pwd}` ; export srcdir ; \
        cd testsuite/g++; \
        rm -f tmp-site.exp; \
        sed '/set tmpdir/ s|testsuite|testsuite/g++|' \
                < ../../site.exp > tmp-site.exp; \
        /bin/sh ${srcdir}/../move-if-change tmp-site.exp site.exp; \
        EXPECT=expect ; export EXPECT ; \
        if [ -f ${rootme}/../expect/expect ] ; then  \
           TCL_LIBRARY=`cd .. ; cd ${srcdir}/../tcl/library ; ${PWDCMD-pwd}` ; \
            export TCL_LIBRARY ; fi ; \
        GCC_EXEC_PREFIX="/usr/lib/gcc/" ; export GCC_EXEC_PREFIX ; \
        runtest --tool g++ )
WARNING: Couldn't find the global config file.
Test Run By root on Sun Aug  3 17:24:18 2008
Native configuration is mips-unknown-linux-gnu

                === g++ tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/bprob/bprob.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/charset/charset.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/compat/compat.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/compat/struct-layout-1.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/debug/debug.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/debug/dwarf2/dwarf2.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/dg.exp ...
FAIL: g++.dg/ipa/iinline-1.C scan-ipa-dump inline "String::funcOne[^\n]*inline copy in int main"
FAIL: g++.dg/lookup/crash7.C  (test for errors, line 8)
FAIL: g++.dg/lookup/crash7.C (test for excess errors)
FAIL: g++.dg/other/PR23205.C scan-assembler .stabs.*foobar:c=i
FAIL: g++.dg/other/error25.C  (test for errors, line 4)
FAIL: g++.dg/other/error25.C (test for excess errors)
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/gcov/gcov.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/gomp/gomp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/pch/pch.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/special/ecos.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/tls/tls.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/torture/dg-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/torture/stackalign/stackalign.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/vect/vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.old-deja/old-deja.exp ...

                === g++ Summary ===

# of expected passes            17903
# of unexpected failures        6
# of expected failures          81
# of unsupported tests          143
/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/testsuite/g++/../../g++  version 4.4.0 20080802 (experimental) (GCC)

make[2]: [check-g++] Error 1 (ignored)
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/intl'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/intl'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libcpp'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libcpp'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libdecnumber'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libdecnumber'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libiberty'
make[3]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libiberty/testsuite'
mips-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -I.. -I../../.././libiberty/testsuite/../../include  -o test-demangle \
                ../../.././libiberty/testsuite/test-demangle.c ../libiberty.a
./test-demangle < ../../.././libiberty/testsuite/demangle-expected
./test-demangle: 770 tests, 0 failures
mips-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -I.. -I../../.././libiberty/testsuite/../../include  -DHAVE_CONFIG_H -I.. -o test-pexecute \
                ../../.././libiberty/testsuite/test-pexecute.c ../libiberty.a
./test-pexecute
mips-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -I.. -I../../.././libiberty/testsuite/../../include  -DHAVE_CONFIG_H -I.. -o test-expandargv \
                ../../.././libiberty/testsuite/test-expandargv.c ../libiberty.a
./test-expandargv
PASS: test-expandargv-0.
PASS: test-expandargv-1.
PASS: test-expandargv-2.
PASS: test-expandargv-3.
make[3]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libiberty/testsuite'
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libiberty'
make[1]: Target `check-host' not remade because of errors.
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3'
Making check in include
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/include'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/include'
Making check in libsupc++
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/libsupc++'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/libsupc++'
Making check in libmath
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/libmath'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/libmath'
Making check in doc
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/doc'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/doc'
Making check in src
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/src'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/src'
Making check in po
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/po'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/po'
Making check in testsuite
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/testsuite'
make  check-DEJAGNU
make[4]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/testsuite'
Making a new site.exp file...
srcdir=`CDPATH="${ZSH_VERSION+.}:" && cd ../../.././libstdc++-v3/testsuite && pwd`; export srcdir; \
        EXPECT=expect; export EXPECT; \
        runtest=runtest; \
        if /bin/sh -c "$runtest --version" > /dev/null 2>&1; then \
          l='libstdc++'; for tool in $l; do \
            $runtest  --tool $tool --srcdir $srcdir ; \
          done; \
        else echo "WARNING: could not find \`runtest'" 1>&2; :;\
        fi
WARNING: Couldn't find the global config file.
Test Run By root on Sun Aug  3 18:50:10 2008
Native configuration is mips-unknown-linux-gnu

                === libstdc++ tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/libstdc++-v3/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp.
ERROR: could not compile testsuite_allocator.cc
    while executing
"error "could not compile $f""
    (procedure "v3-build_support" line 61)
    invoked from within
"v3-build_support"
    (file "/usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp" line 22)
    invoked from within
"source /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp.
ERROR: could not compile testsuite_allocator.cc
    while executing
"error "could not compile $f""
    (procedure "v3-build_support" line 61)
    invoked from within
"v3-build_support"
    (file "/usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp" line 25)
    invoked from within
"source /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""

                === libstdc++ Summary ===

make[4]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/testsuite'
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/testsuite'
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3'
true "AR_FLAGS=rc" "CC_FOR_BUILD=mips-unknown-linux-gnu-gcc" "CC_FOR_TARGET=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/xgcc -B/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/ -B/usr/mips-unknown-linux-gnu/bin/ -B/usr/mips-unknown-linux-gnu/lib/ -isystem /usr/mips-unknown-linux-gnu/include -isystem /usr/mips-unknown-linux-gnu/sys-include" "CFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb  " "CXXFLAGS=-g -O2   -D_GNU_SOURCE  " "CFLAGS_FOR_BUILD=-O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb" "CFLAGS_FOR_TARGET=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb" "INSTALL=/usr/bin/install -c" "INSTALL_DATA=/usr/bin/install -c -m 644" "INSTALL_PROGRAM=/usr/bin/install -c" "INSTALL_SCRIPT=/usr/bin/install -c" "LDFLAGS=" "LIBCFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb  " "LIBCFLAGS_FOR_TARGET=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb" "MAKE=make" "MAKEINFO=makeinfo --split-size=5000000 --split-size=5000000  " "PICFLAG=" "PICFLAG_FOR_TARGET=" "SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/usr" "infodir=/usr/share/gcc-data/mips-unknown-linux-gnu/gcc-trunk/info" "libdir=/usr/lib" "includedir=/usr/lib/gcc/mips-unknown-linux-gnu/gcc-trunk/include" "prefix=/usr" "tooldir=/usr/mips-unknown-linux-gnu" "gxx_include_dir=/usr/lib/gcc/mips-unknown-linux-gnu/gcc-trunk/include/g++-v4" "AR=/usr/mips-unknown-linux-gnu/bin/ar" "AS=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/as" "LD=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/collect-ld" "RANLIB=/usr/mips-unknown-linux-gnu/bin/ranlib" "NM=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/nm" "NM_FOR_BUILD=" "NM_FOR_TARGET=/usr/mips-unknown-linux-gnu/bin/nm" "DESTDIR=" "WERROR=" DO=all multi-do # make
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3'
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3'
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap'
Making check in testsuite
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap/testsuite'
make  check-DEJAGNU
make[4]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap/testsuite'
Making a new site.exp file...
srcdir=`CDPATH="${ZSH_VERSION+.}:" && cd ../../.././libmudflap/testsuite && pwd`; export srcdir; \
        EXPECT=`if [ -f ../../expect/expect ] ; then echo ../../expect/expect ; else echo expect ; fi`; export EXPECT; \
        runtest=`if [ -f ../../.././libmudflap/testsuite/../../dejagnu/runtest ] ; then echo ../../.././libmudflap/testsuite/../../dejagnu/runtest ; else echo runtest ;  fi`; \
        if /bin/sh -c "$runtest --version" > /dev/null 2>&1; then \
          l='libmudflap'; for tool in $l; do \
            $runtest  --tool $tool --srcdir $srcdir ; \
          done; \
        else echo "WARNING: could not find \`runtest'" 1>&2; :;\
        fi
WARNING: Couldn't find the global config file.
Test Run By root on Sun Aug  3 18:51:00 2008
Native configuration is mips-unknown-linux-gnu

                === libmudflap tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/libmudflap/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c++"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c++"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""

                === libmudflap Summary ===

make[4]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap/testsuite'
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap/testsuite'
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap'
true "AR_FLAGS=rc" "CC_FOR_BUILD=mips-unknown-linux-gnu-gcc" "CFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb  " "CXXFLAGS=-g -O2   -D_GNU_SOURCE  " "CFLAGS_FOR_BUILD=-O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb" "CFLAGS_FOR_TARGET=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb" "INSTALL=/usr/bin/install -c" "INSTALL_DATA=/usr/bin/install -c -m 644" "INSTALL_PROGRAM=/usr/bin/install -c" "INSTALL_SCRIPT=/usr/bin/install -c" "JC1FLAGS=" "LDFLAGS=" "LIBCFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb  " "LIBCFLAGS_FOR_TARGET=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb" "MAKE=make" "MAKEINFO=makeinfo --split-size=5000000 --split-size=5000000  " "PICFLAG=" "PICFLAG_FOR_TARGET=" "SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/usr" "infodir=/usr/share/gcc-data/mips-unknown-linux-gnu/gcc-trunk/info" "libdir=/usr/lib" "prefix=/usr" "includedir=/usr/lib/gcc/mips-unknown-linux-gnu/gcc-trunk/include" "AR=/usr/mips-unknown-linux-gnu/bin/ar" "AS=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/as" "CC=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/xgcc -B/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/ -B/usr/mips-unknown-linux-gnu/bin/ -B/usr/mips-unknown-linux-gnu/lib/ -isystem /usr/mips-unknown-linux-gnu/include -isystem /usr/mips-unknown-linux-gnu/sys-include" "CXX=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/g++ -B/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/ -nostdinc++ -nostdinc++ -I/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/include/mips-unknown-linux-gnu -I/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/include -I/usr/cvsroot/gcc/libstdc++-v3/libsupc++ -I/usr/cvsroot/gcc/libstdc++-v3/include/backward -I/usr/cvsroot/gcc/libstdc++-v3/testsuite/util -L/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/src -L/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/src/.libs -B/usr/mips-unknown-linux-gnu/bin/ -B/usr/mips-unknown-linux-gnu/lib/ -isystem /usr/mips-unknown-linux-gnu/include -isystem /usr/mips-unknown-linux-gnu/sys-include" "LD=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/collect-ld" "LIBCFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb  " "NM=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/nm" "PICFLAG=" "RANLIB=/usr/mips-unknown-linux-gnu/bin/ranlib" "DESTDIR=" DO=all multi-do # make
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap'
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap'
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libiberty'
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libiberty/testsuite'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libiberty/testsuite'
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libiberty'
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp'
Making check in testsuite
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp/testsuite'
make  check-DEJAGNU
make[4]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp/testsuite'
Making a new site.exp file...
srcdir=`CDPATH="${ZSH_VERSION+.}:" && cd ../../.././libgomp/testsuite && pwd`; export srcdir; \
        EXPECT=expect; export EXPECT; \
        runtest=runtest; \
        if /bin/sh -c "$runtest --version" > /dev/null 2>&1; then \
          l='libgomp'; for tool in $l; do \
            $runtest  --tool $tool --srcdir $srcdir ; \
          done; \
        else echo "WARNING: could not find \`runtest'" 1>&2; :;\
        fi
WARNING: Couldn't find the global config file.
Test Run By root on Sun Aug  3 18:51:06 2008
Native configuration is mips-unknown-linux-gnu

                === libgomp tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/libgomp/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/libgomp/testsuite/libgomp.c/c.exp ...
WARNING: program timed out.
FAIL: libgomp.c/appendix-a/a.18.1.c execution test
FAIL: libgomp.c/barrier-1.c (test for excess errors)
WARNING: libgomp.c/barrier-1.c compilation failed to produce executable
FAIL: libgomp.c/collapse-1.c (test for excess errors)
WARNING: libgomp.c/collapse-1.c compilation failed to produce executable
FAIL: libgomp.c/collapse-2.c (test for excess errors)
WARNING: libgomp.c/collapse-2.c compilation failed to produce executable
FAIL: libgomp.c/collapse-3.c (test for excess errors)
WARNING: libgomp.c/collapse-3.c compilation failed to produce executable
FAIL: libgomp.c/critical-1.c (test for excess errors)
WARNING: libgomp.c/critical-1.c compilation failed to produce executable
FAIL: libgomp.c/debug-1.c (internal compiler error)
FAIL: libgomp.c/debug-1.c (test for excess errors)
WARNING: libgomp.c/debug-1.c compilation failed to produce executable
FAIL: libgomp.c/icv-1.c execution test
FAIL: libgomp.c/lib-2.c (test for excess errors)
WARNING: libgomp.c/lib-2.c compilation failed to produce executable
FAIL: libgomp.c/lock-1.c execution test
FAIL: libgomp.c/lock-2.c execution test
FAIL: libgomp.c/loop-1.c (test for excess errors)
WARNING: libgomp.c/loop-1.c compilation failed to produce executable
FAIL: libgomp.c/loop-10.c execution test
FAIL: libgomp.c/loop-2.c (test for excess errors)
WARNING: libgomp.c/loop-2.c compilation failed to produce executable
FAIL: libgomp.c/loop-3.c execution test
FAIL: libgomp.c/loop-5.c (test for excess errors)
WARNING: libgomp.c/loop-5.c compilation failed to produce executable
FAIL: libgomp.c/loop-6.c (test for excess errors)
WARNING: libgomp.c/loop-6.c compilation failed to produce executable
FAIL: libgomp.c/loop-7.c (test for excess errors)
WARNING: libgomp.c/loop-7.c compilation failed to produce executable
FAIL: libgomp.c/loop-8.c (test for excess errors)
WARNING: libgomp.c/loop-8.c compilation failed to produce executable
FAIL: libgomp.c/loop-9.c (test for excess errors)
WARNING: libgomp.c/loop-9.c compilation failed to produce executable
FAIL: libgomp.c/nested-3.c (test for excess errors)
WARNING: libgomp.c/nested-3.c compilation failed to produce executable
FAIL: libgomp.c/nestedfn-6.c (internal compiler error)
FAIL: libgomp.c/nestedfn-6.c (test for excess errors)
WARNING: libgomp.c/nestedfn-6.c compilation failed to produce executable
FAIL: libgomp.c/omp_workshare3.c  (test for errors, line 33)
FAIL: libgomp.c/omp_workshare3.c (test for excess errors)
FAIL: libgomp.c/ordered-1.c (test for excess errors)
WARNING: libgomp.c/ordered-1.c compilation failed to produce executable
FAIL: libgomp.c/ordered-2.c (test for excess errors)
WARNING: libgomp.c/ordered-2.c compilation failed to produce executable
FAIL: libgomp.c/parallel-1.c (test for excess errors)
WARNING: libgomp.c/parallel-1.c compilation failed to produce executable
FAIL: libgomp.c/pr26943-2.c  (test for warnings, line 23)
FAIL: libgomp.c/pr26943-2.c  (test for warnings, line 34)
FAIL: libgomp.c/pr26943-3.c  (test for warnings, line 29)
FAIL: libgomp.c/pr26943-3.c  (test for warnings, line 40)
FAIL: libgomp.c/pr26943-4.c  (test for warnings, line 30)
FAIL: libgomp.c/pr26943-4.c  (test for warnings, line 41)
FAIL: libgomp.c/reduction-5.c (internal compiler error)
FAIL: libgomp.c/reduction-5.c (test for excess errors)
WARNING: libgomp.c/reduction-5.c compilation failed to produce executable
FAIL: libgomp.c/sections-1.c (test for excess errors)
WARNING: libgomp.c/sections-1.c compilation failed to produce executable
FAIL: libgomp.c/single-1.c (test for excess errors)
WARNING: libgomp.c/single-1.c compilation failed to produce executable
FAIL: libgomp.c/task-1.c execution test
Running /usr/cvsroot/gcc/libgomp/testsuite/libgomp.c++/c++.exp ...
No libstdc++ library found, will not execute c++ tests
Running /usr/cvsroot/gcc/libgomp/testsuite/libgomp.fortran/fortran.exp ...

                === libgomp Summary ===

# of expected passes            182
# of unexpected failures        40
make[4]: *** [check-DEJAGNU] Error 1
make[4]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp/testsuite'
make[3]: *** [check-am] Error 2
make[3]: Target `check' not remade because of errors.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp/testsuite'
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp'
true  DO=all multi-do # make
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp'
make[2]: *** [check-recursive] Error 1
make[2]: Target `check' not remade because of errors.
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp'
make[1]: *** [check-target-libgomp] Error 2
make[1]: Target `check-target' not remade because of errors.
make[1]: Leaving directory `/usr/cvsroot/gcc'
make: *** [do-check] Error 2
make: Target `check' not remade because of errors.


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-04  7:20       ` Kumba
@ 2008-08-04 19:23         ` Richard Sandiford
  2008-08-04 19:30           ` contribute.html: compare pre/post patch testresults (was: [PATCH]: GCC Scheduler support for R10000 on MIPS) Ralf Wildenhues
  2008-08-05  2:48           ` [PATCH]: GCC Scheduler support for R10000 on MIPS Kumba
  0 siblings, 2 replies; 22+ messages in thread
From: Richard Sandiford @ 2008-08-04 19:23 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips

Kumba <kumba@gentoo.org> writes:
> Richard Sandiford wrote:
>> 
>> I see Ralf's already answered this.
>
> Yup, and I ran it earlier.  Took quite a bit to finish, but I figured
> it wasn't going to be a quick endeavor.  There's a couple of failures,
> but I'm guessing some failures are expected.  Not sure what counts and
> what doesn't, so I've attached it.

In general, the thing to do is compare the post-patch results with the
pre-patch results.

I certainly wouldn't have expected so many libgomp failures,
and it's a bad sign if the libstdc++ testsuite can't run.  But these
are probably system-specific issues rather than problems with the patch.
The results for the "gcc" subtests look OK.

> It was run from a fully-compiled gcc, not bootstrap, so I'm unsure if
> that affects the output any.

Shouldn't matter.

>> There doesn't seem to be anything in the description linking the
>> FP multiplier cpu_unit with the division and square root cpu_units,
>> so I'm pretty sure it isn't modelled.  Which is fine.  I just think
>> you should add something like "We don't model this at present."
>> to the end of the comment.
>> 
>> (There's no shame in that.  It's common to omit some details from the
>> DFA description, and only mention them in the comments.  The aim after
>> all is to get good code, not to describe the pipeline with complete accuracy.
>> Sometimes omitting details gives better code.)
>
> Ah!  That's probably because I wasn't sure how to link the division
> and square-root units to the multiplier.  I knew that they had to be
> linked, because as the R10K manual stated, they're separate/parallel
> units, but their issue & completion logic is shared by the multiplier.
> So I know that if the multiplier is busy in either of these two
> stages, it'll cause a delay for these other two units, right?
>
> That I think is why I had only three automata, and was funneling
> squareroot and division into the r10k_fp automata.  I figured this
> represented "linking" the multiplier and these two units.  I suppose
> that wasn't accurate, though?

No.

> Can we even model the issue and completion stages of a cpu unit?

Sure.  Just create issue and completion cpu_units and add them to
the insn reservations.  If you're interested in doing this, have a look
at other scheduler descriptions for inspiration.  (Not just MIPS ones.)

>> Add:
>> 
>>     (automata_option "v")
>> 
>> to one of the .md files and do "make insn-automata.c".  This will
>> create a file called "mips.dfa" in the build directory.  At the end
>> of that file is a summary of the automata.  The interesting thing is
>> the number of DFA states and DFA arcs in the r10k_* automata.
>> 
>> (Hadn't realised it was so hard to get at this information these days.
>> It used to be printed on stderr.  There's also support for adding "-v"
>> to the genautomata command line, but it seems to have bitrotted and
>> no longer works.)
>
> Thanks, this worked great.  I captured the entire build output looking
> for this verbosity, but didn't see it; I guess it was hidden at some
> point.
>
> Breaking out those two units into their own automata changes things
> quite a bit.  The resulting mips.dfa file is only about 300,000 lines
> long, and the r10k_fp automaton now has only 8 states (division has 22
> and square root has 36 states).  Originally, the r10k_fp automaton had
> 6336 states (and the mips.dfa file was 500,000+ lines long).  So this
> seems to bring the state numbers down to look more like the other mips
> cpu automatons.  I can pass those along if you're interested.

FWIW, the size of mips.dfa doesn't really count for much.
The reduction in states is very big though, so yeah, this is
something we should keep.
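
For what it's worth, the numbers quoted above fit a simple picture:
8 * 22 * 36 = 6336, which is what one would expect if the old single
automaton had to track every combination of the three independent unit
groups.  A minimal sketch of that arithmetic in C follows; the helper
names are made up for illustration and are not part of genautomata:

```c
#include <stddef.h>

/* Illustration only; these helpers are hypothetical, not genautomata code.  */

/* States needed when independent unit groups share one automaton:
   every combination of per-group states is a distinct state.  */
unsigned long
combined_states (const unsigned long *per_group, size_t n)
{
  unsigned long product = 1;
  for (size_t i = 0; i < n; i++)
    product *= per_group[i];
  return product;
}

/* Total states when each group gets its own automaton.  */
unsigned long
split_states (const unsigned long *per_group, size_t n)
{
  unsigned long sum = 0;
  for (size_t i = 0; i < n; i++)
    sum += per_group[i];
  return sum;
}
```

With the per-automaton counts reported above, { 8, 22, 36 },
combined_states gives 6336 and split_states gives 66.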

> And yeah, I looked at the option parsing bit in genautomata.c.  It
> looks like it should work, but that if-then-else construct it's got
> going just seems to fail somehow.

The loop isn't even reached in the problem cases.  init_md_reader
expects to understand all arguments.

The right fix would be to call init_md_reader_args_cb instead, with a
custom callback.

>> TBH, the only way to know is to try it and measure the result.
>> 
>> And like I say, there's absolutely no need to try it.  I was just trying
>> to say that the comment should mention bypasses instead of lo_operand.
>
> Curiosity demands I at least look it up :)
>
> I wrote that comment based on what I thought was the way to check -- I
> wasn't aware that multiplications and divisions clobbered both HI and
> LO, so I can see why bypasses are the way to go.
>
> Speaking of predicates, I get what to do now.  Define a custom predicate in 
> mips.c (I guess, "mips_check_insn_hi_p" ?).  Here's what I think so far by 
> looking at those two predicates that you mentioned:
>
> bool mips_check_insn_hi_p(rtx insn)
> {
>    return IS_INSN_HI(insn);
> }
>
> Then in 10000.md, something like:
>
> (define_bypass 6 "r10k_imul_single" "mips_check_insn_hi_p")

You need to add the names of the target insn reservations too.
Probably the ones for mfhilo.

In fact, if you split the mfhilo reservation into two, one for mfhi and
one for mflo, you could avoid the predicate.  (And yes, you'd be using
lo_operand to do the split; see sb1.md for how.  But it's the bypass
that's the key.)

> And do you know what the attr type is for MULTU or DMULTU?  imul, imul3, and 
> imadd don't seem to fit (I am assuming MULTU/DMULTU and friends are for 
> unsigned?).  The R10K manual has different latencies for those, and it looks 
> like I don't have insn reservations defined for those.  Or is this another 
> define_bypass + custom predicate to check for signed/unsigned?

MULTU and DMULTU are classed as IMUL.  If you want to split it into
signed and unsigned -- a reasonable thing to do -- you need to add
a new value to the "type" attribute.  You'd then need to update all
uses of IMUL in config/mips to check for both the signed and unsigned
versions.

If you do this, please do the split as a separate patch.  It's easier
to review that way.

>> Experimentation, basically.  Costs are used to choose between
>> two equivalent implementations of an operation.  E.g. multiplication
>> by a constant can be done using a single multiplication insn or by
>> a sequence of shifts and adds.
>> 
>> The target-independent code calculates the cost of a sequence of
>> insns simply by adding them up.  It doesn't take into account how
>> the pipeline might issue them, or what the repeat rates are.
>> 
>> So COSTS_N_INSNS (latency) is a good start, but is often too high on
>> superscalar pipelines, where breaking a monolithic operation into
>> smaller operations can exploit the parallelism better.  For example,
>> if multiplication takes 5 cycles on a dual-issue target, a multiplication
>> is often (but not always!) more expensive than 5 single-cycle insns.
>> 
>> The costs are just heuristics, and you have to accept that any given
>> choice of values is going to make some things better and some things
>> worse.  When I've done scheduling work in the past, I simply tried
>> various values and run the result through (commercial) benchmarks.
>
> Well, I know of no commercial benchmarking tools for Linux/Mips on SGI
> systems, and since it sounds like it's mostly guesswork to begin with,
> I guess using the same values as the latencies should be kosher.

Doesn't have to be commercial, and most benchmarks aren't hugely
system-specific.  As always with these things: pick something
(a program or a benchmark) that matters to you.  Maybe an
example of your typical workload, or whatever.

Again, I'm not saying you should run any performance tests.
But it is certainly possible to run them on this system.

> I don't suppose there's any rule of thumb involving superscalar
> pipelines out there that might say, slice a couple digits off these
> default latencies?

No, it's too target-specific.  Like I say, it really is a case of
trying it and seeing.
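
To make the earlier point about costs concrete, here is a small C
sketch of the "just add them up" arithmetic.  It assumes GCC's usual
COSTS_N_INSNS scaling (redefined locally so the example stands alone);
the two helper functions are hypothetical and only model the
comparison, they are not GCC code:

```c
/* Sketch only: models how insn costs are summed, not GCC's actual code.  */

/* GCC expresses costs in units of one fast instruction, scaled by 4.
   Redefined here so the example is self-contained.  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Hypothetical helper: cost of a shift/add sequence of N_INSNS
   single-cycle insns, computed by simple addition.  Issue width and
   repeat rates play no part in the sum.  */
int
sequence_cost (int n_insns)
{
  int total = 0;
  for (int i = 0; i < n_insns; i++)
    total += COSTS_N_INSNS (1);
  return total;
}

/* Hypothetical helper: nonzero if a single multiply with cost
   MULT_COST is strictly cheaper than a sequence of N_INSNS insns.  */
int
multiply_is_cheaper (int mult_cost, int n_insns)
{
  return mult_cost < sequence_cost (n_insns);
}
```

With a multiply cost of COSTS_N_INSNS (5), a four-insn sequence wins
and a six-insn sequence loses; dropping the multiply cost to
COSTS_N_INSNS (3) flips the four-insn case.  That is the knob being
tuned when trying different cost values.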

> Attached is round three.  Other changes not mentioned above include adding 
> frdiv1/2 and frsqrt1/2 insns to the existing reservations.  No idea if R10K 
> supports these, but better safe than sorry.

The R10K doesn't support them, but I guess it's OK to add them anyway.

> I also added the 'move' insn, even though the manual makes no explicit
> mention of an unconditional integer register move operand (only
> integer condmove).

There is no unconditional register move instruction (except in MIPS16).
As the comment says, it's just a special form of addition:

;; move		integer register move ({,D}ADD{,U} with rt = 0)

So yes, adding this is the right thing to do.

> Also, the R10K manual doesn't seem to differentiate between fmadd and
> imadd.  In the latency table, it simply states "MADD" -- might I infer
> this to assume that R10K itself doesn't distinguish between imadd or
> fmadd, treats them the same, and so I need to follow suit? (I've got
> imadd set to run on ALU2, whereas fmadd runs on the fp multiplier).

The R10K doesn't have integer multiply-accumulate instructions.
I.e. no IMADD.  MADD == MADD.fmt, i.e. FMADD.

> And happen to know what kind of insns LWC1/LDC1/LWXC1/LDXC1 match?
> fpload/fpidxload by chance?  Referenced in the manual, they look like
> loads, but they have a different latency (which is how I coded them,
> but wanted to double check).

Yes, LWC1 and LDC1 are FPLOAD, and LWXC1 and LDXC1 are FPIDXLOAD.

Not sure whether the patch you attached was the latest one.
It didn't mention MOVE, FPDIV1, etc.  And like I say, "cpu"
must exactly mirror "enum processor_type", so you need to
remove "r12000", "r14000" and "r16000" from "cpu" too.

> +;; R10000 has int queue, fp queue, address queue.
> +;; We split the fp queue into standard fp, fp division, and
> +;; fp square root to further optimize the automata, though.

"reduce the size of the automata" might be clearer.

> +(define_automaton "r10k_int, r10k_fp, r10k_fpdivision,
> +                   r10k_fpsqroot, r10k_addr")

Are cpu_units and automata allowed to have the same name?
If so, I'd prefer to call them r10k_fpdiv and r10k_fpsqrt.

Richard


* contribute.html: compare pre/post patch testresults (was: [PATCH]:  GCC Scheduler support for R10000 on MIPS)
  2008-08-04 19:23         ` Richard Sandiford
@ 2008-08-04 19:30           ` Ralf Wildenhues
  2008-08-06 14:51             ` Ian Lance Taylor
  2008-08-05  2:48           ` [PATCH]: GCC Scheduler support for R10000 on MIPS Kumba
  1 sibling, 1 reply; 22+ messages in thread
From: Ralf Wildenhues @ 2008-08-04 19:30 UTC (permalink / raw)
  To: Kumba, gcc-patches, mips, rdsandiford

* Richard Sandiford wrote on Mon, Aug 04, 2008 at 09:05:36PM CEST:
> 
> In general, the thing to do is compare the post-patch results with the
> pre-patch results.

Well, if that is the thing to do _in general_, then it should be
openly stated as such.  OK for the web?

Thanks,
Ralf

2008-08-04  Ralf Wildenhues  <Ralf.Wildenhues@gmx.de>

	* htdocs/contribute.html (testing): Test results should be
	compared to pre-patch or gcc-testresults list data.

Index: htdocs/contribute.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/contribute.html,v
retrieving revision 1.69
diff -u -r1.69 contribute.html
--- htdocs/contribute.html	24 Feb 2008 14:03:12 -0000	1.69
+++ htdocs/contribute.html	4 Aug 2008 19:19:00 -0000
@@ -90,7 +90,10 @@
 feature.  If the test framework permits, you should automate these
 tests and add them to GCC's testsuite.  You must also perform
 regression tests to ensure that your patch does not break anything
-else.</p>
+else.  Typically, this means comparing post-patch test results to
+pre-patch results by testing twice or comparing with recent posts to
+the <a href="http://gcc.gnu.org/ml/gcc-testresults/">gcc-testresults
+list</a>.</p>
 
 <h3>Which tests to perform</h3>
 


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-04 19:23         ` Richard Sandiford
  2008-08-04 19:30           ` contribute.html: compare pre/post patch testresults (was: [PATCH]: GCC Scheduler support for R10000 on MIPS) Ralf Wildenhues
@ 2008-08-05  2:48           ` Kumba
  2008-08-05 18:29             ` Richard Sandiford
  1 sibling, 1 reply; 22+ messages in thread
From: Kumba @ 2008-08-05  2:48 UTC (permalink / raw)
  To: gcc-patches, mips, rdsandiford

Richard Sandiford wrote:
> 
> In general, the thing to do is compare the post-patch results with the
> pre-patch results.

Gotcha, I'll keep this in mind for when I roll the final patch.

> Sure.  Just create issue and completion cpu_units and add them to
> the insn reservations.  If you're interested in doing this, have a look
> at other scheduler descriptions for inspiration.  (Not just MIPS ones.)

Hmm, the catch is in how I think: I've largely modeled this R10k pipeline
descriptor spatially in my head, according to the block diagram the manual gives 
and factoring in some of the errata notes.  Hence why I initially created three 
automata for the three pipelines (int queue, fp queue, and address queue), and 
then defined the five cpu units as the five main blocks (alu1, alu2 (fed by int 
queue); fp-multiply, fp-add (fed by fp queue); and load/store (fed by address 
queue)).

The multiplier gets tricky because it's in its own right subdivided into 
issue/completion, and fp-divide, fp-sqrt, and fp-multiply components, the latter 
three of which run parallel to each other.  It's when you define all these
cpu units, that all tie to the fp queue, that the state count launches itself to 
the moon.

So might it be more reasonable to create the fp multiplier as a separate 
automata from the fp-queue (and fp adder), and define cpu_units to match the 
issue and completion stages, multiply, divide, and square root stages?  Then we 
can multiply like this:

(define_insn_reservation "r10k_fp_miscmul" 2
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (eq_attr "type" "fmul,fmove"))
   "r10k_mpy_issue + r10k_fpmpy + r10k_mpy_compl")

And fdiv and fsqrts like this:

(define_insn_reservation "r10k_fdiv_single" 12
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (and (eq_attr "type" "fdiv,frdiv")
             (eq_attr "mode" "SF")))
   "(r10k_mpy_issue + r10k_fpdiv + r10k_mpy_compl) * 14")

(define_insn_reservation "r10k_fsqrt_single" 18
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (and (eq_attr "type" "fsqrt")
             (eq_attr "mode" "SF")))
   "(r10k_mpy_issue + r10k_fpsqrt  + r10k_mpy_compl) * 20")

This should set up the "sharing" of the multiplier's issue and completion units,
right?

> You need to add the names of the target insn reservations too.
> Probably the ones for mfhilo.
> 
> In fact, if you split the mfhilo reservation into two, one for mfhi and
> one for mflo, you could avoid the predicate.  (And yes, you'd be using
> lo_operand to do the split; see sb1.md for how.  But it's the bypass
> that's the key.)

Okay, this might be easier than the predicates.  How does something like this look?:

(define_insn_reservation "r10k_mfhi" 1
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (and (eq_attr "type" "mfhilo")
             (not (match_operand 1 "lo_operand"))))
   "r10k_alu1 | r10k_alu2")

(define_insn_reservation "r10k_mflo" 1
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (and (eq_attr "type" "mfhilo")
             (match_operand 1 "lo_operand")))
   "r10k_alu1 | r10k_alu2")

And then for the bypasses:

(define_bypass 6 "r10k_imul_single" "r10k_mfhi")
(define_bypass 10 "r10k_imul_double" "r10k_mfhi")
(define_bypass 35 "r10k_idiv_single" "r10k_mfhi")
(define_bypass 67 "r10k_idiv_double" "r10k_mfhi")

We'll set the default latency for the insn_reservations to the LO latency, and 
use bypasses for the HI latency.  Sound kosher?

I also assume we need not worry about the mthilo insn, right?

> MULTU and DMULTU are classed as IMUL.  If you want to split it into
> signed and unsigned -- a reasonable thing to do -- you need to add
> a new value to the "type" attribute.  You'd then need to update all
> uses of IMUL in config/mips to check for both the signed and unsigned
> versions.
> 
> If you do this, please do the split as a separate patch.  It's easier
> to review that way.

Reasonable, but that would probably take a while for me to research on.  I'm 
still fairly new to gcc internals, and still don't quite know my way around 
things.  Defining entirely new types is not something I'll figure out very 
easily.  Probably worth looking at after getting this patch in.

So I'll ask this then.  The R10K Manual states that MULT latency is 5/6 (Lo/Hi) 
while MULTU is 6/7 (Lo/Hi).  If the "imul" type combines both MULT and MULTU, 
then should we compromise on the latency and use 6 instead of 5 or 7?  Later on, 
if I work out a way to add that type (or someone else does), then we can 
re-tweak 10000.md to reflect the change.

On a different thought, though, does a predicate exist like lo_operand that can 
determine signed versus unsigned?  Seems like it could be used to create 
insn_reservations of say, r10k_imul_single_signed and r10k_imul_single_unsigned, 
set their default latencies to the LO register, then use more define_bypasses on 
the mfhi bit to set the HI latencies.

> Doesn't have to be commercial, and most benchmarks aren't hugely
> system-specific.  As always with these things: pick something
> (a program or a benchmark) that matters to you.  Maybe an
> example of your typical workload, or whatever.

The typical workload on this box is compiling, and pretty much anything that 
speeds up gcc is a blessing.  I'm still hunting for one of those elusive dual 
R14K modules for this machine (at a reasonable price, which $2,000 is not).

> The R10K doesn't have integer multiply-accumulate instructions.
> I.e. no IMADD.  MADD == MADD.fmt, i.e. FMADD.

Okay, I moved imadd down to the reservation that handles unknown and multi as a 
precaution.  (i.e., Murphy's law)

> Not sure whether the patch you attached was the latest one.
> It didn't mention MOVE, FPDIV1, etc.  And like I say, "cpu"
> must exactly mirror "enum processor_type", so you need to
> remove "r12000", "r14000" and "r16000" from "cpu" too.

Hmm, it might've not been.  I'll double check before I send the next one.

And by removing r12000-r16000, you mean in mips.md, from the "cpu" field, right? 
  I thought that specifies the list of valid params to -march.  Or, do you mean 
from the "cpu" attr in 10000.md where we do all our checking in each insn?  Does 
gcc somehow translate r12000 passed to -march to equal r10000 via the mapping
to PROCESSOR_R10000?

I wanted to keep r12000 through r16000 around as valid values, even if they map 
back to r10000 in the end.  Might it make more sense still to use r1x000 to 
reference the entire family based on current knowledge?

> Are cpu_units and automata allowed to have the same name?
> If so, I'd prefer to call them r10k_fpdiv and r10k_fpsqrt.

That's a very good question.  I wasn't sure, so I spelled them out due to a lack 
of creativity in the naming.

Maybe I should call the automata r10k_a_int, r10k_a_fp, r10k_a_fpdiv,
r10k_a_fpsqrt, and r10k_a_addr, then map cpu units to those?

Cheers!,

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org

"The past tempts us, the present confuses us, the future frightens us.  And our 
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-05  2:48           ` [PATCH]: GCC Scheduler support for R10000 on MIPS Kumba
@ 2008-08-05 18:29             ` Richard Sandiford
  2008-08-06  7:58               ` Kumba
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Sandiford @ 2008-08-05 18:29 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips

Kumba <kumba@gentoo.org> writes:
>> Sure.  Just create issue and completion cpu_units and add them to
>> the insn reservations.  If you're interested in doing this, have a look
>> at other scheduler descriptions for inspiration.  (Not just MIPS ones.)
>
> Hmm, the catch is in how I think: I've largely modeled this R10k
> pipeline descriptor spatially in my head, according to the block
> diagram the manual gives and factoring in some of the errata notes.
> Hence why I initially created three automata for the three pipelines
> (int queue, fp queue, and address queue), and then defined the five
> cpu units as the five main blocks (alu1, alu2 (fed by int queue);
> fp-multiply, fp-add (fed by fp queue); and load/store (fed by address
> queue)).
>
> The multiplier gets tricky because it's in its own right subdivided
> into issue/completion, and fp-divide, fp-sqrt, and fp-multiply
> components, the latter three of which run parallel to each other.
> It's when you define all these cpu units, that all tie to the fp
> queue, that the state count launches itself to the moon.
>
> So might it be more reasonable to create the fp multiplier as a
> separate automata from the fp-queue (and fp adder), and define
> cpu_units to match the issue and completion stages, multiply, divide,
> and square root stages?  Then we can multiply like this:
>
> (define_insn_reservation "r10k_fp_miscmul" 2
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (eq_attr "type" "fmul,fmove"))
>    "r10k_mpy_issue + r10k_fpmpy + r10k_mpy_compl")
>
> And fdiv and fsqrts like this:
>
> (define_insn_reservation "r10k_fdiv_single" 12
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (and (eq_attr "type" "fdiv,frdiv")
>              (eq_attr "mode" "SF")))
>    "(r10k_mpy_issue + r10k_fpdiv + r10k_mpy_compl) * 14")
>
> (define_insn_reservation "r10k_fsqrt_single" 18
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (and (eq_attr "type" "fsqrt")
>              (eq_attr "mode" "SF")))
>    "(r10k_mpy_issue + r10k_fpsqrt  + r10k_mpy_compl) * 20")

I think you want "foo, bar" (foo one cycle, then bar the next)
rather than "foo + bar" (foo and bar simultaneously).  And you don't
want to tie up the issue and completion units for more than one cycle.
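
A rough way to see the difference between the two forms is to count
how many cycles each unit stays reserved.  The C sketch below models a
reservation as one bitmask of busy units per cycle.  That
representation, the unit names, and the helper functions are all
assumptions made for illustration; this is not how genautomata stores
reservations.  The repeat count of 14 is borrowed from the fdiv
example above.

```c
#include <stddef.h>

/* Placeholder units for illustration; not names from the patch.  */
enum { UNIT_ISSUE = 1 << 0, UNIT_DIV = 1 << 1, UNIT_COMPL = 1 << 2 };

/* Model "(issue + div + compl) * n": all three units are reserved
   together in each of N cycles.  CYCLES needs room for N entries.
   Returns the reservation length in cycles.  */
size_t
reserve_simultaneous (unsigned *cycles, size_t n)
{
  for (size_t i = 0; i < n; i++)
    cycles[i] = UNIT_ISSUE | UNIT_DIV | UNIT_COMPL;
  return n;
}

/* Model "issue, div * n, compl": one issue cycle, then N cycles in the
   divider, then one completion cycle.  CYCLES needs room for N + 2
   entries.  Returns the reservation length in cycles.  */
size_t
reserve_sequential (unsigned *cycles, size_t n)
{
  size_t len = 0;
  cycles[len++] = UNIT_ISSUE;
  for (size_t i = 0; i < n; i++)
    cycles[len++] = UNIT_DIV;
  cycles[len++] = UNIT_COMPL;
  return len;
}

/* Count the cycles in which UNIT is reserved.  */
size_t
busy_cycles (const unsigned *cycles, size_t len, unsigned unit)
{
  size_t count = 0;
  for (size_t i = 0; i < len; i++)
    if (cycles[i] & unit)
      count++;
  return count;
}
```

With n = 14, the simultaneous form keeps the issue unit busy for all
14 cycles, while the sequential form holds it for a single cycle and
leaves it free for other insns during the divide.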

>> You need to add the names of the target insn reservations too.
>> Probably the ones for mfhilo.
>> 
>> In fact, if you split the mfhilo reservation into two, one for mfhi and
>> one for mflo, you could avoid the predicate.  (And yes, you'd be using
>> lo_operand to do the split; see sb1.md for how.  But it's the bypass
>> that's the key.)
>
> Okay, this might be easier than the predicates.  How does something like this look?:
>
> (define_insn_reservation "r10k_mfhi" 1
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (and (eq_attr "type" "mfhilo")
>              (not (match_operand 1 "lo_operand"))))
>    "r10k_alu1 | r10k_alu2")
>
> (define_insn_reservation "r10k_mflo" 1
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (and (eq_attr "type" "mfhilo")
>              (match_operand 1 "lo_operand")))
>    "r10k_alu1 | r10k_alu2")
>
> And then for the bypasses:
>
> (define_bypass 6 "r10k_imul_single" "r10k_mfhi")
> (define_bypass 10 "r10k_imul_double" "r10k_mfhi")
> (define_bypass 35 "r10k_idiv_single" "r10k_mfhi")
> (define_bypass 67 "r10k_idiv_double" "r10k_mfhi")
>
> We'll set the default latency for the insn_reservations to the LO
> latency, and use bypasses for the HI latency.  Sound kosher?

Looks good.
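
As a sketch of how the default latency and the bypasses combine, the
following C fragment looks up the latency between a producer and a
consumer reservation.  The table holds the four bypasses proposed
above; the lookup function and its name are hypothetical, and the
default latency is passed in by the caller because the defaults of the
imul/idiv reservations aren't shown in this message.

```c
#include <stddef.h>
#include <string.h>

/* One define_bypass entry: latency from PRODUCER to CONSUMER.  */
struct bypass
{
  const char *producer;
  const char *consumer;
  int latency;
};

/* The four bypasses proposed above (HI latencies).  */
static const struct bypass r10k_bypasses[] = {
  { "r10k_imul_single", "r10k_mfhi", 6 },
  { "r10k_imul_double", "r10k_mfhi", 10 },
  { "r10k_idiv_single", "r10k_mfhi", 35 },
  { "r10k_idiv_double", "r10k_mfhi", 67 },
};

/* Hypothetical helper: a matching bypass overrides the producer's
   default latency; otherwise the default applies.  */
int
effective_latency (const char *producer, int default_latency,
                   const char *consumer)
{
  size_t n = sizeof r10k_bypasses / sizeof r10k_bypasses[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (r10k_bypasses[i].producer, producer) == 0
        && strcmp (r10k_bypasses[i].consumer, consumer) == 0)
      return r10k_bypasses[i].latency;
  return default_latency;
}
```

Assuming a default of 5 for r10k_imul_single (the LO figure quoted from
the manual for MULT), a following r10k_mflo sees 5 cycles and a
following r10k_mfhi sees 6.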

> I also assume we need not worry about the mthilo insn, right?

Not as targets of these bypasses, no.

>> MULTU and DMULTU are classed as IMUL.  If you want to split it into
>> signed and unsigned -- a reasonable thing to do -- you need to add
>> a new value to the "type" attribute.  You'd then need to update all
>> uses of IMUL in config/mips to check for both the signed and unsigned
>> versions.
>> 
>> If you do this, please do the split as a separate patch.  It's easier
>> to review that way.
>
> Reasonable, but that would probably take a while for me to research on.  I'm 
> still fairly new to gcc internals, and still don't quite know my way around 
> things.  Defining entirely new types is not something I'll figure out very 
> easily.  Probably worth looking at after getting this patch in.
>
> So I'll ask this then.  The R10K Manual states that MULT latency is
> 5/6 (Lo/Hi) while MULTU is 6/7 (Lo/Hi).  If the "imul" type combines
> both MULT and MULTU, then should we compromise on the latency and use
> 6 instead of 5 or 7?  Later on, if I work out a way to add that type
> (or someone else does), then we can re-tweak 10000.md to reflect the
> change.

Again, I'm afraid it's really a case of trying and seeing what gives
the best performance. ;)

> On a different thought, though, does a predicate exist like lo_operand
> that can determine signed versus unsigned?  Seems like it could be
> used to create insn_reservations of say, r10k_imul_single_signed and
> r10k_imul_single_unsigned, set their default latencies to the LO
> register, then use more define_bypasses on the mfhi bit to set the HI
> latencies.

This isn't the preferred approach.  Splitting mfhilo is on my TODO list,
so that we don't need to use match_operand in the schedulers.  But using
lo_operand for mfhilo is better than using a predicate for signedness vs.
unsignedness, for two reasons:

  - Predicates are really for defining insns.  lo_operand is useful
    for that, so we'd have it regardless of whether the schedulers
    needed it.

  - Operand 1 of an mfhilo is guaranteed to be the source, so a simple
    predicate check is enough.  On the other hand, there's no defined
    mapping between the multiplication operator (MULT, MULTU) and the
    operands.  So you couldn't really have:

        (match_operand N "signed_multiplication")

    because there's no known value for N.

>> Doesn't have to be commercial, and most benchmarks aren't hugely
>> system-specific.  As always with these things: pick something
>> (a program or a benchmark) that matters to you.  Maybe an
>> example of your typical workload, or whatever.
>
> The typical workload on this box is compiling, and pretty much anything that 
> speeds up gcc is a blessing.  I'm still hunting for one of those elusive dual 
> R14K modules for this machine (at a reasonable price, which $2,000 is not).

OK.  The time taken to compile something is certainly a valid benchmark.
(It forms part of SPECINT, of course.)

>> The R10K doesn't have integer multiply-accumulate instructions.
>> I.e. no IMADD.  MADD == MADD.fmt, i.e. FMADD.
>
> Okay, I moved imadd down to the reservation that handles unknown and
> multi as a precaution.  (i.e., Murphy's law)

Nothing goes wrong if you fail to handle an instruction.  The compiler
won't crash or anything.

At this stage we're dealing with things like "-march=r4130 -mtune=r10000".
I don't think any particular handling of imadd is better than any other
in that case.  So my personal preference would be to leave out the
unnecessary insns (IMADD, SIGNEXT, FRDIV1, etc.).

>> Not sure whether the patch you attached was the latest one.
>> It didn't mention MOVE, FPDIV1, etc.  And like I say, "cpu"
>> must exactly mirror "enum processor_type", so you need to
>> remove "r12000", "r14000" and "r16000" from "cpu" too.
>
> Hmm, it might've not been.  I'll double check before I send the next one.
>
> And by removing r12000-r16000, you mean in mips.md, from the "cpu"
> field, right?

Right.

> I thought that specifies the list of valid params to
> -march.

Nope, that's mips.c:mips_cpu_table (which you're already handling
correctly).  "cpu" is just an .md copy of enum processor_type.

>> Are cpu_units and automata allowed to have the same name?
>> If so, I'd prefer to call them r10k_fpdiv and r10k_fpsqrt.
>
> That's a very good question.  I wasn't sure, so I spelled them out due to a lack 
> of creativity in the naming.
>
> Maybe I should call the automata r10k_a_int, r10k_a_fp, r10k_a_fpdiv,
> r10k_a_fpsqrt, and r10k_a_addr, then map cpu units to those?

That'd be OK with me.

Richard

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-05 18:29             ` Richard Sandiford
@ 2008-08-06  7:58               ` Kumba
  2008-08-07 21:24                 ` Richard Sandiford
  0 siblings, 1 reply; 22+ messages in thread
From: Kumba @ 2008-08-06  7:58 UTC (permalink / raw)
  To: gcc-patches, mips, rdsandiford

Richard Sandiford wrote:
> 
> I think you want "foo, bar" (foo one cycle, then bar the next)
> rather than "foo + bar" (foo and bar simultaneously).  And you don't
> want to tie up the issue and completion units for more than one cycle.

That would make sense.  However, converting the 'foo + bar' to 'foo, bar' only 
seems to work as long as there aren't any repeat rates.  Down in the fdiv bits, 
once it starts to calculate the repeat rates, the state count goes out of 
control, to the point where my Octane runs out of memory trying to process it 
all.
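
For reference, the two operators differ as sketched below.  This is a
comment-only illustration using unit names from the patch; the cycle numbering
follows the reservation syntax described in the GCC internals manual and has
not been checked against the generated automaton.

;; "r10k_fpmpy_issue + r10k_fpmpy"
;;   cycle 0:    r10k_fpmpy_issue and r10k_fpmpy reserved together
;;
;; "r10k_fpmpy_issue, r10k_fpmpy"
;;   cycle 0:    r10k_fpmpy_issue
;;   cycle 1:    r10k_fpmpy
;;
;; "r10k_fpmpy_issue, r10k_fpdiv * 3"   (x * 3 is shorthand for "x, x, x")
;;   cycle 0:    r10k_fpmpy_issue
;;   cycles 1-3: r10k_fpdiv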


Here's what I converted one of the fdiv's into:

(define_insn_reservation "r10k_fdiv_single" 12
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (and (eq_attr "type" "fdiv,frdiv")
             (eq_attr "mode" "SF")))
   "r10k_fpmpy_issue, (r10k_fpdiv * 14), r10k_fpmpy_completion")

I figure that syntax reads as "issue, fpdiv is 14 cycles, completion".  But that 
repeat rate number at 14 makes the insn-automata.c build take a long time (an 
hour minimum).  At a repeat rate of 10, the NDFA state count for r10k_a_fpmpy 
was in the 12,000 range (and took 4-5 minutes).  Plus, the mips.dfa output is 
822MB.  So I think I'm missing something...


> Looks good.

Neat.  Not too hard :)

Here's a quick question, though.  Integer multiply and divides happen on ALU2. 
The manual makes a note that divides keep ALU2 busy for the duration of the 
divide.  I think this means that division isn't pipelined, and the GCC internals 
manual seems to describe something like this, though the example to me isn't 
easy to decipher.  If I'm interpreting it right, does this look correct?:

(define_insn_reservation "r10k_idiv_single" 34
   (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
        (and (eq_attr "type" "idiv")
             (eq_attr "mode" "SI")))
   "r10k_alu2 * 35, r10k_idiv_single")


It gives this as an example (formatted to look like mine):

(define_insn_reservation "div" 8
   (eq_attr "type" "div")
"i1_pipeline, div * 7, div + (port0 | port1)")

As the manual states, it's issued into the pipeline, divided for 7 cycles (I 
think), then issued to the finishing bit.

Come to think of it, this might apply to the problem above too...  Maybe I need 
to rethink the layout of my fp-multiplier cpu units?


> Again, I'm afraid it's really a case of trying and seeing what gives
> the best performance. ;)

Well, I dug around in mips.md, and I think I found the define_insn statement 
that sets up the "imul" type.  It looks like it only emits a "mult" asm 
instruction.  As far as I could tell, no "multu" or "dmultu" commands look like 
they're emitted at all in mips.md.  I'm guessing these aren't widely used 
instructions?

If correct, I think it's safe then to just keep imul as-is, since it would 
appear that the majority of multiplications would be done via "mult".


> OK.  The time taken to compile something is certainly a valid benchmark.
> (It forms part of SPECINT, of course.)

Okay, I'll probably benchmark a lengthy program like glibc.  Even though the 
final output doesn't work yet (different problem there), it'll still compile and 
I can time it (usually, 3.5-5hrs).


> Nothing goes wrong if you fail to handle an instruction.  The compiler
> won't crash or anything.
> 
> At this stage we're dealing with things like "-march=r4130 -mtune=r10000".
> I don't think any particular handling of imadd is better than any other
> in that case.  So my personal preference would be to leave out the
> unnecessary insns (IMADD, SIGNEXT, FRDIV1, etc.).

Makes sense, so I went ahead and removed them.


> Right.

Done.


> Nope, that's mips.c:mips_cpu_table (which you're already handling
> correctly).  "cpu" is just an .md copy of enum processor_type.

Gotcha.  How about the "cpu" attr in the scheduler definition?  Do those need to 
be removed (to match the values in mips.md's "cpu" type), or is it checking on 
what's passed to -march?


> That'd be OK with me.

Done.


Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org


gcc/
     * config/mips/10000.md: Add R10000 scheduler
     * config/mips/mips.c: Add r10000 params & costs
     * config/mips/mips.h: Add R10k constant
     * config/mips/mips.md: Add r10000 params & incl 10000.md



diff -Naurp gcc.orig/gcc/config/mips/10000.md gcc/gcc/config/mips/10000.md
--- gcc.orig/gcc/config/mips/10000.md	1969-12-31 19:00:00.000000000 -0500
+++ gcc/gcc/config/mips/10000.md	2008-08-05 21:26:34.000000000 -0400
@@ -0,0 +1,252 @@
+;; DFA-based pipeline description for the VR1x000.
+;;   Copyright (C) 2005, 2006, 2008 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; R12K/R14K/R16K are derivatives of R10K, thus copy its description
+;; until specific tuning for each is added.
+
+;; R10000 has an int queue, fp queue, address queue.
+;; The int queue feeds ALU1 and ALU2.
+;; The fp queue feeds the fp-adder and fp-multiplier.
+;; The addr queue feeds the Load/Store unit.
+;;
+;; However, we define the fp-adder and fp-multiplier as
+;; separate automatons, because the fp-multiplier is
+;; divided into fp-multiplier, fp-division, and
+;; fp-squareroot units, all of which share the same
+;; issue and completion logic, yet can operate in
+;; parallel.
+;;
+;; This is based on the model described in the R10K Manual
+;; and it helps to reduce the size of the automata.
+(define_automaton "r10k_a_int, r10k_a_fpadder, r10k_a_addr,
+                   r10k_a_fpmpy, r10k_a_fpdiv, r10k_a_fpsqrt")
+
+(define_cpu_unit "r10k_alu1" "r10k_a_int")
+(define_cpu_unit "r10k_alu2" "r10k_a_int")
+(define_cpu_unit "r10k_fpadd" "r10k_a_fpadder")
+(define_cpu_unit "r10k_fpmpy" "r10k_a_fpmpy")
+(define_cpu_unit "r10k_fpmpy_issue" "r10k_a_fpmpy")
+(define_cpu_unit "r10k_fpmpy_completion" "r10k_a_fpmpy")
+(define_cpu_unit "r10k_fpdiv" "r10k_a_fpdiv")
+(define_cpu_unit "r10k_fpsqrt" "r10k_a_fpsqrt")
+(define_cpu_unit "r10k_loadstore" "r10k_a_addr")
+
+
+;; R10k Loads and Stores.
+(define_insn_reservation "r10k_load" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "load,prefetch,prefetchx"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_store" 0
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "store,fpstore,fpidxstore"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_fpload" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fpload,fpidxload"))
+  "r10k_loadstore")
+
+
+;; Integer add/sub + logic ops, and mt hi/lo can be done by alu1 or alu2.
+;; Miscellaneous arith goes here too (this is a guess).
+(define_insn_reservation "r10k_arith" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "arith,mthilo,slt,clz,const,nop,trap,logical"))
+  "r10k_alu1 | r10k_alu2")
+
+;; We treat mfhilo differently, because we need to know when
+;; it's HI and when it's LO.
+(define_insn_reservation "r10k_mfhi" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "mfhilo")
+            (not (match_operand 1 "lo_operand"))))
+  "r10k_alu1 | r10k_alu2")
+
+(define_insn_reservation "r10k_mflo" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "mfhilo")
+            (match_operand 1 "lo_operand")))
+  "r10k_alu1 | r10k_alu2")
+
+
+;; ALU1 handles shifts, branch eval, and condmove.
+;;
+;; Brancher is separate, but part of ALU1, but can only
+;; do one branch per cycle (is this even implementable?).
+;;
+;; Unsure if the brancher handles jumps and calls as well, but since
+;; they're related, we'll add them here for now.
+(define_insn_reservation "r10k_brancher" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "shift,branch,jump,call"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_int_cmove" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SI,DI")))
+  "r10k_alu1")
+
+
+;; Coprocessor Moves.
+;; mtc1/dmtc1 are handled by ALU1.
+;; mfc1/dmfc1 are handled by the fp-multiplier.
+(define_insn_reservation "r10k_mt_xfer" 3
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mtc"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_mf_xfer" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "mfc"))
+  "r10k_fpmpy_issue, r10k_fpmpy, r10k_fpmpy_completion")
+
+
+;; Only ALU2 does int multiplications and divisions.
+;;
+;; According to the Vr10000 series user manual,
+;; integer mult and div insns can be issued one
+;; cycle earlier if using register Lo, but this is
+;; not modeled here.  We use the latency for the
+;; Lo register, however, as this is the common case.
+;;
+;; Divides also keep ALU2 busy, but this isn't modeled
+;; here.
+(define_insn_reservation "r10k_imul_single" 5
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 6")
+
+(define_insn_reservation "r10k_imul_double" 9
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "imul,imul3")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 10")
+
+(define_insn_reservation "r10k_idiv_single" 34
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 35")
+
+(define_insn_reservation "r10k_idiv_double" 66
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 67")
+
+;; If on the HI register, latency goes up one cycle
+(define_bypass 6 "r10k_imul_single" "r10k_mfhi")
+(define_bypass 10 "r10k_imul_double" "r10k_mfhi")
+(define_bypass 35 "r10k_idiv_single" "r10k_mfhi")
+(define_bypass 67 "r10k_idiv_double" "r10k_mfhi")
+
+
+;; Floating point add/sub, mul, abs value, neg, comp, & moves.
+(define_insn_reservation "r10k_fp_miscadd" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fadd,fabs,fneg,fcmp"))
+  "r10k_fpadd")
+
+(define_insn_reservation "r10k_fp_miscmul" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmul,fmove"))
+  "r10k_fpmpy_issue, r10k_fpmpy, r10k_fpmpy_completion")
+
+(define_insn_reservation "r10k_fp_cmove" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SF,DF")))
+  "r10k_fpmpy_issue, r10k_fpmpy, r10k_fpmpy_completion")
+
+
+;; The fcvt.s.[wl] insn has latency 4, repeat 2.
+;; All other fcvt insns have latency 2, repeat 1.
+(define_insn_reservation "r10k_fcvt_single" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "I2S")))
+  "r10k_fpadd * 2")
+
+(define_insn_reservation "r10k_fcvt_other" 2
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "!I2S")))
+  "r10k_fpadd")
+
+
+;; Run the fmadd insn through fp-adder first, then fp-multiplier.
+;;
+;; The latency for fmadd is 2 cycles if the result is used
+;; by another fmadd instruction.
+(define_insn_reservation "r10k_fmadd" 4
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "fmadd"))
+  "r10k_fpadd, r10k_fpmpy_issue, r10k_fpmpy, r10k_fpmpy_completion")
+
+(define_bypass 2 "r10k_fmadd" "r10k_fmadd")
+
+
+;; Floating point Divisions & square roots.
+(define_insn_reservation "r10k_fdiv_single" 12
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "SF")))
+  "r10k_fpmpy_issue, (r10k_fpdiv * 14), r10k_fpmpy_completion")
+
+(define_insn_reservation "r10k_fdiv_double" 19
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "DF")))
+  "r10k_fpmpy_issue, (r10k_fpdiv * 21), r10k_fpmpy_completion")
+
+(define_insn_reservation "r10k_fsqrt_single" 18
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpmpy_issue, (r10k_fpsqrt * 20), r10k_fpmpy_completion")
+
+(define_insn_reservation "r10k_fsqrt_double" 33
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpmpy_issue, (r10k_fpsqrt * 35), r10k_fpmpy_completion")
+
+(define_insn_reservation "r10k_frsqrt_single" 30
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpmpy_issue, (r10k_fpsqrt * 20), r10k_fpmpy_completion")
+
+(define_insn_reservation "r10k_frsqrt_double" 52
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpmpy_issue, (r10k_fpsqrt * 35), r10k_fpmpy_completion")
+
+
+;; Handle unknown/multi insns here (this is a guess).
+(define_insn_reservation "r10k_unknown" 1
+  (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
+       (eq_attr "type" "unknown,multi"))
+  "r10k_alu1 + r10k_alu2")
diff -Naurp gcc.orig/gcc/config/mips/mips.c gcc/gcc/config/mips/mips.c
--- gcc.orig/gcc/config/mips/mips.c	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.c	2008-08-04 01:26:49.000000000 -0400
@@ -593,6 +593,10 @@ static const struct mips_cpu_info mips_c

    /* MIPS IV processors. */
    { "r8000", PROCESSOR_R8000, 4, 0 },
+  { "r10000", PROCESSOR_R10000, 4, 0 },
+  { "r12000", PROCESSOR_R10000, 4, 0 },
+  { "r14000", PROCESSOR_R10000, 4, 0 },
+  { "r16000", PROCESSOR_R10000, 4, 0 },
    { "vr5000", PROCESSOR_R5000, 4, 0 },
    { "vr5400", PROCESSOR_R5400, 4, 0 },
    { "vr5500", PROCESSOR_R5500, 4, PTF_AVOID_BRANCHLIKELY },
@@ -988,6 +992,19 @@ static const struct mips_rtx_cost_data m
  		     1,           /* branch_cost */
  		     4            /* memory_latency */
    },
+  { /* R1x000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (5),            /* int_mult_si */
+    COSTS_N_INSNS (9),            /* int_mult_di */
+    COSTS_N_INSNS (34),           /* int_div_si */
+    COSTS_N_INSNS (66),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
    { /* SB1 */
      /* These costs are the same as the SB-1A below.  */
      COSTS_N_INSNS (4),            /* fp_add */
@@ -9872,7 +9889,10 @@ mips_issue_rate (void)
  	 but in reality only a maximum of 3 insns can be issued as
  	 floating-point loads and stores also require a slot in the
  	 AGEN pipe.  */
-     return 4;
+    case PROCESSOR_R10000:
+      /* All R10K Processors are quad-issue (being the first MIPS
+         processors to support this feature). */
+      return 4;

      case PROCESSOR_20KC:
      case PROCESSOR_R4130:
diff -Naurp gcc.orig/gcc/config/mips/mips.h gcc/gcc/config/mips/mips.h
--- gcc.orig/gcc/config/mips/mips.h	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.h	2008-08-04 00:05:27.000000000 -0400
@@ -66,6 +66,7 @@ enum processor_type {
    PROCESSOR_R7000,
    PROCESSOR_R8000,
    PROCESSOR_R9000,
+  PROCESSOR_R10000,
    PROCESSOR_SB1,
    PROCESSOR_SB1A,
    PROCESSOR_SR71000,
@@ -241,6 +242,7 @@ enum mips_code_readable_setting {
  #define TARGET_MIPS5500             (mips_arch == PROCESSOR_R5500)
  #define TARGET_MIPS7000             (mips_arch == PROCESSOR_R7000)
  #define TARGET_MIPS9000             (mips_arch == PROCESSOR_R9000)
+#define TARGET_MIPS10000            (mips_arch == PROCESSOR_R10000)
  #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
  				     || mips_arch == PROCESSOR_SB1A)
  #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
@@ -267,6 +269,7 @@ enum mips_code_readable_setting {
  #define TUNE_MIPS6000               (mips_tune == PROCESSOR_R6000)
  #define TUNE_MIPS7000               (mips_tune == PROCESSOR_R7000)
  #define TUNE_MIPS9000               (mips_tune == PROCESSOR_R9000)
+#define TUNE_MIPS10000              (mips_tune == PROCESSOR_R10000)
  #define TUNE_SB1                    (mips_tune == PROCESSOR_SB1		\
  				     || mips_tune == PROCESSOR_SB1A)

diff -Naurp gcc.orig/gcc/config/mips/mips.md gcc/gcc/config/mips/mips.md
--- gcc.orig/gcc/config/mips/mips.md	2008-08-01 21:55:41.000000000 -0400
+++ gcc/gcc/config/mips/mips.md	2008-08-05 18:31:52.000000000 -0400
@@ -553,7 +553,7 @@
  ;; Attribute describing the processor.  This attribute must match exactly
  ;; with the processor_type enumeration in mips.h.
  (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,r10000,sb1,sb1a,sr71000,xlr"
    (const (symbol_ref "mips_tune")))

  ;; The type of hardware hazard associated with this instruction.
@@ -903,6 +903,7 @@
  (include "6000.md")
  (include "7000.md")
  (include "9000.md")
+(include "10000.md")
  (include "sb1.md")
  (include "sr71k.md")
  (include "xlr.md")
diff -Naurp gcc.orig/gcc/doc/invoke.texi gcc/gcc/doc/invoke.texi
--- gcc.orig/gcc/doc/invoke.texi	2008-08-01 21:51:46.000000000 -0400
+++ gcc/gcc/doc/invoke.texi	2008-08-04 00:09:12.000000000 -0400
@@ -11980,6 +11980,7 @@ The processor names are:
  @samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
  @samp{r4600}, @samp{r4650}, @samp{r6000}, @samp{r8000},
  @samp{rm7000}, @samp{rm9000},
+@samp{r10000}, @samp{r12000}, @samp{r14000}, @samp{r16000},
  @samp{sb1},
  @samp{sr71000},
  @samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300},

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: contribute.html: compare pre/post patch testresults (was: [PATCH]:  GCC Scheduler support for R10000 on MIPS)
  2008-08-04 19:30           ` contribute.html: compare pre/post patch testresults (was: [PATCH]: GCC Scheduler support for R10000 on MIPS) Ralf Wildenhues
@ 2008-08-06 14:51             ` Ian Lance Taylor
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Lance Taylor @ 2008-08-06 14:51 UTC (permalink / raw)
  To: Ralf Wildenhues; +Cc: gcc-patches

Ralf Wildenhues <Ralf.Wildenhues@gmx.de> writes:

> 2008-08-04  Ralf Wildenhues  <Ralf.Wildenhues@gmx.de>
>
> 	* htdocs/contribute.html (testing): Test results should be
> 	compared to pre-patch or gcc-testresults list data.

This is OK.

Thanks.

Ian

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-06  7:58               ` Kumba
@ 2008-08-07 21:24                 ` Richard Sandiford
  2008-08-08  8:46                   ` Kumba
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Sandiford @ 2008-08-07 21:24 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips

Kumba <kumba@gentoo.org> writes:
> Richard Sandiford wrote:
>> I think you want "foo, bar" (foo one cycle, then bar the next)
>> rather than "foo + bar" (foo and bar simultaneously).  And you don't
>> want to tie up the issue and completion units for more than one cycle.
>
> That would make sense.  However, converting the 'foo + bar' to 'foo,
> bar' only seems to work as long as there aren't any repeat rates.
> Down in the fdiv bits, as it starts to calculate the repeat rates, it
> starts to send the state count out of control, to the point where my
> octane runs out of memory trying to process it all.
>
>
> Here's what I converted one of the fdiv's into:
>
> (define_insn_reservation "r10k_fdiv_single" 12
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (and (eq_attr "type" "fdiv,frdiv")
>              (eq_attr "mode" "SF")))
>    "r10k_fpmpy_issue, (r10k_fpdiv * 14), r10k_fpmpy_completion")
>
> I figure that syntax reads as "issue, fpdiv is 14 cycles, completion".
> But that repeat rate number at 14 makes the insn-automata.c build take
> a long time (an hour minimum).  At a repeat rate of 10, the NDFA state
> count for r10k_a_fpmpy was in the 12,000 range (and took 4-5 minutes).
> Plus, the mips.dfa output is 822MB.

Yeah, that's not too surprising.  This model says that the pipeline
looks 15 cycles in advance to see whether a division issued now will
complete in 16 cycles' time, which needs a hefty number of DFA states
to track properly.  That's probably not how the pipeline works.

In other words, it's probably the completion stuff that's causing
problems.  Things might be better if you just model the issue and
execution stages.
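
A minimal sketch of what that might look like for the single-precision
divide, assuming the 14-cycle figure from the reservation quoted above is
kept and the completion unit is simply dropped.  It is untested, and the
"cpu" test names only r10000 to match the "cpu" attribute in the mips.md
hunk.  Whether it schedules any better is exactly the thing to measure.

(define_insn_reservation "r10k_fdiv_single" 12
  (and (eq_attr "cpu" "r10000")
       (and (eq_attr "type" "fdiv,frdiv")
            (eq_attr "mode" "SF")))
  ;; One cycle in the shared issue unit, then the divider on its own.
  "r10k_fpmpy_issue, r10k_fpdiv * 14")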

Also, if you model the issue stage, you should model it for all insns,
not just the ones that use r10k_fpmpy_issue.

As always, the only way to know if you're making things better here
is to test it.  It may well be that things are better without this
issue stuff.

> Here's a quick question, though.  Integer multiply and divides happen
> on ALU2.  The manual makes a note that divides keep ALU2 busy for the
> duration of the divide.  I think this means that division isn't
> pipelined, and the GCC internals manual seems to describe something
> like this, though the example to me isn't easy to decipher.  If I'm
> interpreting it right, does this look correct?:
>
> (define_insn_reservation "r10k_idiv_single" 34
>    (and (eq_attr "cpu" "r10000,r12000,r14000,r16000")
>         (and (eq_attr "type" "idiv")
>              (eq_attr "mode" "SI")))
>    "r10k_alu2 * 35, r10k_idiv_single")

Well, this reserves ALU2 for 35 cycles and (immediately after that)
reserves r10k_idiv_single for one cycle.  Is that what you wanted?

>> Again, I'm afraid it's really a case of trying and seeing what gives
>> the best performance. ;)
>
> Well, I dug around in mips.md, and I think I found the define_insn
> statement that sets up the "imul" type.  It looks like it only emits a
> "mult" asm instruction.  As far as I could tell, no "multu" or
> "dmultu" commands look like they're emitted at all in mips.md.  I'm
> guessing these aren't widely used instructions?

That's not correct.  "imul" is used for MULT, MULTU, DMULT and DMULTU.
(The "<u>" in those patterns means "" for signed and "u" for unsigned.)
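
Roughly, the mechanism is a code iterator plus a code attribute.  The
following is a simplified sketch modeled on the definitions in mips.md, not
a verbatim copy, so check the file itself for the exact text:

;; Lets one pattern stand for both the signed and the unsigned form.
(define_code_iterator any_extend [sign_extend zero_extend])

;; <u> is "" for sign_extend and "u" for zero_extend, so an output
;; template written as "mult<u>" comes out as "mult" or "multu".
(define_code_attr u [(sign_extend "") (zero_extend "u")])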

>> OK.  The time taken to compile something is certainly a valid benchmark.
>> (It forms part of SPECINT, of course.)
>
> Okay, I'll probably benchmark a lengthy program like glibc.  Even
> though the final output doesn't work yet (different problem there),
> it'll still compile and I can time it (usually, 3.5-5hrs).

FWIW, an alternative is to pick a single big file (e.g. gcc's fold-const)
and preprocess it.  You can then run cc1 on it directly, which means that
the benchmark is a single process.

>> Nothing goes wrong if you fail to handle an instruction.  The compiler
>> won't crash or anything.
>> 
>> At this stage we're dealing with things like "-march=r4130 -mtune=r10000".
>> I don't think any particular handling of imadd is better than any other
>> in that case.  So my personal preference would be to leave out the
>> unnecessary insns (IMADD, SIGNEXT, FRDIV1, etc.).
>
> Makes sense, so I went ahead and removed them.

Thanks.

>> Nope, that's mips.c:mips_cpu_table (which you're already handling
>> correctly).  "cpu" is just an .md copy of enum processor_type.
>
> Gotcha.  How about the "cpu" attr in the scheduler definition?  Do
> those need to be removed (to match the values in mips.md's "cpu"
> type), or is it checking on what's passed to -march?

Yes, '(eq_attr "cpu" ...)' tests the attribute defined by
'(define_attr "cpu" ...)', so you need to remove the processor
names from both.
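
In other words, with only "r10000" left in the "cpu" attribute, each
reservation's test would shrink to something like this (a sketch based on
the r10k_load reservation from the patch):

(define_insn_reservation "r10k_load" 2
  (and (eq_attr "cpu" "r10000")
       (eq_attr "type" "load,prefetch,prefetchx"))
  "r10k_loadstore")

-mtune=r12000, -mtune=r14000 and -mtune=r16000 would still select it, because
the mips_cpu_table entries in the patch map all four names to
PROCESSOR_R10000.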

Richard

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-07 21:24                 ` Richard Sandiford
@ 2008-08-08  8:46                   ` Kumba
  2008-08-09  9:01                     ` Richard Sandiford
  0 siblings, 1 reply; 22+ messages in thread
From: Kumba @ 2008-08-08  8:46 UTC (permalink / raw)
  To: gcc-patches, mips, rdsandiford

Richard Sandiford wrote:
> 
> Yeah, that's not too surprising.  This model says that the pipeline
> looks 15 cycles in advance to see whether a division issued now will
> complete in 16 cycles' time, which needs a hefty number of DFA states
> to track properly.  That's probably not how the pipeline works.

Yeah, I figured the pipeline is using one cycle to issue, then while the 
fp-divider is doing its own thing, the fp-multiplier issue unit is moving on to 
issue instructions to the next component, be that the sqrt or multiplier 
directly.  Assuming one cycle to issue, then in the amount of time it takes for 
the fp-divider to do its work, the issue unit can pop out a good number of other 
instructions.  I guess DFA may not be able to handle scenarios like this without 
generating a massive number of states?

> In other words, it's probably the completion stuff that's causing
> problems.  Things might be better if you just model the issue and
> execution stages.

Yeah, it looks like it is.  I dropped those, and it built in about 5 seconds flat.

> Also, if you model the issue stage, you should model it for all insns,
> not just the ones that use r10k_fpmpy_issue.

Hmm, model issue for all of them?  I don't recall the manual stating that the 
integer systems need that explicitly.  In the case of insns that only have a 
latency of one cycle, wouldn't factoring in an issue sequence add another cycle 
on and essentially slow things down?

Only the fp-divider and fp-square root unit have the specific mention of sharing 
their issue/completion logic, so, being a details person, I was giving it a shot 
at modeling.  But if it proves to be too problematic, then I'll probably revert 
back to just putting the divider and square root units back into their own 
automata and leaving them at that.

And, just for clarification, if the manual says something has a repeat rate of 
say, 16, and we do model the issue and/or completion phases, we need to subtract 
those cycles off the repeat rate, right?

> Well, this reserves ALU2 for 35 cycles and (immediately after that)
> reserves r10k_idiv_single for one cycle.  Is that what you wanted?

Not sure.  That specific example in the Processor Pipeline description seemed to 
detail an integer divider that remains busy for the duration of the divide, but 
I wasn't sure if I was converting it to my application properly.  Does DFA 
handle when a unit is already busy?  I.e., if r10k_alu2 is already working on a 
previous divide insn, if something else comes along (say another divide), will 
gcc take this into account and know not to issue that insn until the divide is 
complete?  It seems on this processor, only integer divides aren't pipelined, 
and it looked like re-using the running insn reservation achieved that effect.

The internals guide doesn't offer a lot of clear cut examples on things like 
this, so I'm sort of guessing at it.  It also doesn't help that the example 
provided uses "div" as both the name for the insn reservation and for the cpu 
unit, which makes the example more obtuse to the newcomer.

> That's not correct.  "imul" is used for MULT, MULTU, DMULT and DMULTU.
> (The "<u>" in those patterns means "" for signed and "u" for unsigned.)

I eventually figured out what <u> was doing (still not sure on <mode> 100%, or 
<su>), but I was looking more (or should say, searching more) at the actual asm 
code generated.  I was only seeing mult and mul asm commands being created - 
but, I know very little about mips asm, so I suppose even though multu or dmultu 
may not be explicitly spelled out, the operands to the asm insns can probably 
take args in such a way as to become unsigned variants.

Poking around mips.md, I'm clueless on where one would start.  It looks like 
there's a ton of different insns that fall into the 'imul' attribute type, so at 
first glance any such split looks like it would be pretty significant.

I guess for now, it's either to just use the Lo latency for MULT/DMULT, and hope 
the 1-2 cycle deviance from MULTU and DMULTU doesn't degrade performance too 
much, or use the Hi latency as the middle ground (MULT Lo lat is 5, Hi is 6; 
MULTU Lo is 6, Hi is 7, so 6 is the compromise pick), and maybe down the road, 
look at this as a future project.

> FWIW, an alternative is to pick a single big file (e.g. gcc's fold-const)
> and preprocess it.  You can then run cc1 on it directly, which means that
> the benchmark is a single process.

I still have to fully build gcc, right?  Or is there a way to fold in the 
changes to 10000.md, rebuild the pre-processor and cc1 directly (I assume it 
takes -march parameters) w/o rebuilding the whole compiler?  Think I can get 
away with just the bootstrap compiler?

> Yes, '(eq_attr "cpu" ...)' tests the attribute defined by
> '(define_attr "cpu" ...)', so you need to remove the processor
> names from both.

Done.

I think what I'll do is at minimum, see if the ALU2 blocking is able to be 
modeled for integer divides, then tweak my comments appropriately to state what 
we can't or won't model, then start to work on testing this in the testsuite and 
issue you a final patch.  I think pulling the issue bits may be preferable in 
the end, but I'll try testing them on that file you mentioned and see what 
happens.  I figure that's a pretty math-heavy compile, right?  I know I had this 
one C++ app around someplace that really whacked the system with math-intensive 
tests.  I may have to go digging around on my other systems and find it.

Cheers!,

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org

"The past tempts us, the present confuses us, the future frightens us.  And our 
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-08  8:46                   ` Kumba
@ 2008-08-09  9:01                     ` Richard Sandiford
  2008-08-13  8:53                       ` Kumba
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Sandiford @ 2008-08-09  9:01 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips

Kumba <kumba@gentoo.org> writes:
> Richard Sandiford wrote:
>> Also, if you model the issue stage, you should model it for all insns,
>> not just the ones that use r10k_fpmpy_issue.
>
> Hmm, model issue for all of them?  I don't recall the manual stating
> that the integer systems need that explicitly.  In the case of insns
> that only have a latency of one cycle, wouldn't factoring in an issue
> sequence add another cycle on and essentially slow things down?

The point is that the DFA description should (in general) be consistent.
It is inaccurate to say that, if an fp division and fp addition are issued
at the same time, the addition starts execution earlier than the division.

>> Well, this reserves ALU2 for 35 cycles and (immediately after that)
>> reserves r10k_idiv_single for one cycle.  Is that what you wanted?
>
> Not sure.  That specific example in the Processor Pipeline description
> seemed to detail an integer divider that remains busy for the duration
> of the divide, but I wasn't sure if I was converting it to my
> application properly.  Does DFA handle when a unit is already busy?
> I.e., if r10k_alu2 is already working on a previous divide insn, if
> something else comes along (say another divide), will gcc take this
> into account and know not to issue that insn until the divide is
> complete?  It seems on this processor, only integer divides aren't
> pipelined, and it looked like re-using the running insn reservation
> achieved that effect.

The scheduler "issues" an instruction only if the instruction wouldn't
stall.  In other words, it issues an instruction if all data is ready
and if there would be no unit conflicts with already-issued insns.
(Every insn is treated "as-if" data was read at the beginning.)
So something that used unit X on cycle Y would only be issued if
unit X will be free on cycle Y.
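
As a rough illustration of that rule (a simplified sketch, not GCC's
actual DFA code), each unit can be thought of as remembering the first
cycle on which it is free again.  The names sched_state, can_issue and
issue, and the unit numbering, are hypothetical and exist only for this
example.  It also assumes a reservation always starts on the issue cycle
and runs for a contiguous number of cycles, like "r10k_alu2 * 35".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical unit numbering, for this sketch only.  */
enum unit { ALU1, ALU2, NUM_UNITS };

/* busy_until[u] is the first cycle on which unit u is free again.  */
struct sched_state
{
  int busy_until[NUM_UNITS];
};

/* An insn that needs unit U on cycle CYCLE may issue only if U is
   free on that cycle.  */
bool
can_issue (const struct sched_state *s, enum unit u, int cycle)
{
  assert ((int) u >= 0 && (int) u < NUM_UNITS);
  return cycle >= s->busy_until[u];
}

/* Issue an insn on CYCLE that holds unit U for CYCLES consecutive
   cycles.  Returns false and changes nothing if U is still busy.  */
bool
issue (struct sched_state *s, enum unit u, int cycle, int cycles)
{
  if (!can_issue (s, u, cycle))
    return false;
  s->busy_until[u] = cycle + cycles;
  return true;
}
```

With a divide issued on cycle 0 that holds ALU2 for 35 cycles, a second
ALU2 insn is refused on cycle 34 and accepted on cycle 35, while ALU1
stays available throughout.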

I'm still not sure what purpose r10k_idiv_single serves in your example.
Why isn't reserving r10k_alu2 for X cycles enough?

> I guess for now, it's either to just use the Lo latency for
> MULT/DMULT, and hope the 1-2 cycle deviance from MULTU and DMULTU
> doesn't degrade performance too much, or use the Hi latency as the
> middle ground (MULT Lo lat is 5, Hi is 6; MULTU Lo is 6, Hi is 7, so 6
> is the compromise pick), and maybe down the road, look at this as a
> future project.

Sounds good.

>> FWIW, an alternative is to pick a single big file (e.g. gcc's fold-const)
>> and preprocess it.  You can then run cc1 on it directly, which means that
>> the benchmark is a single process.
>
> I still have to fully build gcc, right?  Or is there a way to fold in the 
> changes to 10000.md, rebuild the pre-processor and cc1 directly (I assume it 
> takes -march parameters) w/o rebuilding the whole compiler?  Think I can get 
> away with just the bootstrap compiler?

Yeah, you'd have to rebuild gcc (or at least cc1).  An --enable-languages=c
build would be enough, but I guess that takes a long time on your machine.
So yeah, I can see why a workload that doesn't involve timing gcc might
be more appealing...

> I think what I'll do is at minimum, see if the ALU2 blocking is able
> to be modeled for integer divides, then tweak my comments
> appropriately to state what we can't or won't model, then start to
> work on testing this in the testsuite and issue you a final patch.  I
> think pulling the issue bits may be preferable in the end, but I'll
> try testing them on that file you mentioned and see what happens.

Sounds good.  (Of course, from my point of view, the issueless
version is fine.  I don't want it to sound like I'm asking you
to do this.  I just got the impression you wanted to experiment
with the description, and I was trying to answer your questions
about that.)

Richard

* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-09  9:01                     ` Richard Sandiford
@ 2008-08-13  8:53                       ` Kumba
  2008-08-16 10:10                         ` Richard Sandiford
  0 siblings, 1 reply; 22+ messages in thread
From: Kumba @ 2008-08-13  8:53 UTC
  To: gcc-patches, mips, rdsandiford

[-- Attachment #1: Type: text/plain, Size: 16600 bytes --]

Richard Sandiford wrote:
> 
> The scheduler "issues" an instruction only if the instruction wouldn't
> stall.  In other words, it issues an instruction if all data is ready
> and if there would be no unit conflicts with already-issued insns.
> (Every insn is treated "as-if" data was read at the beginning.)
> So something that used unit X on cycle Y would only be issued if
> unit X will be free on cycle Y.
> 
> I'm still not sure what purpose r10k_idiv_single serves in your example.
> Why isn't reserving r10k_alu2 for X cycles enough?

I wasn't sure if that achieved the effect of keeping ALU2 busy during the entire
divide.  So, I aimed to mimic what looked like an example of a non-pipelined
division operation in the gcc internals docs, but I probably wasn't doing it
right (I got confused because the insn_reservation & the cpu unit had the same
name).

Something to toy with in the future I guess...for now, I reverted that change.

> Sounds good.

Yeah, I'll keep the define_bypasses in place for the division, and balance the
mult/dmult stuff out by using the Hi latencies, since I have no idea what
proportion of insns are MULT/DMULT versus MULTU/DMULTU.  We'll have to work on
splitting the imul type eventually, and then I'll fix that up down the road.
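
To put a number on that compromise, here is a small sketch using only
the 32-bit latencies quoted earlier in the thread (MULT Lo 5/Hi 6,
MULTU Lo 6/Hi 7).  The table and the worst_case_error helper are
hypothetical names made up for this illustration; nothing like them is
in the patch.

```c
#include <stdlib.h>

/* Latencies quoted earlier in the thread for the 32-bit multiplies.  */
const int imul_si_latency[] = {
  5, /* MULT, result read from Lo */
  6, /* MULT, result read from Hi */
  6, /* MULTU, result read from Lo */
  7  /* MULTU, result read from Hi */
};

/* Largest distance, in cycles, between CANDIDATE and any of the
   quoted latencies.  */
int
worst_case_error (int candidate)
{
  int worst = 0;
  size_t n = sizeof imul_si_latency / sizeof imul_si_latency[0];

  for (size_t i = 0; i < n; i++)
    {
      int err = abs (candidate - imul_si_latency[i]);
      if (err > worst)
        worst = err;
    }
  return worst;
}
```

A single latency of 6 is never off by more than one cycle for these
four cases, whereas 5 or 7 can be off by two.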

> Yeah, you'd have to rebuild gcc (or at least cc1).  An --enable-languages=c
> build would be enough, but I guess that takes a long time on your machine.
> So yeah, I can see why a workload that doesn't involve timing gcc might
> be more appealing...

Took me a while, but I pinned down an xgcc command line that would build
fold-const.c, and the difference between using R10K and not using R10K was about
1.9 secs, so there's a small improvement.

> Sounds good.  (Of course, from my point of view, the issueless
> version is fine.  I don't want it to sound like I'm asking you
> to do this.  I just got the impression you wanted to experiment
> with the description, and I was trying to answer your questions
> about that.)

Yeah, I decided to go with the issue-less version, since that's largely what
I've been using for the last 4 years anyways.  You did give me a lot of insight
on how this all works though!

Anyways, with the end of August rapidly approaching, I decided to go ahead and
diff a final patch and complete the tests on four languages: c, c++, java, and
fortran.  The test output looks roughly the same as the last one I sent, but
enabling the extra langs might've changed it some.

Let me know if any of the patch wording needs tweaking, otherwise, hopefully 
this passes muster.

Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org

gcc/
     * config/mips/10000.md: Add R10000 scheduler
     * config/mips/mips.c: Add r10000 params & costs
     * config/mips/mips.h: Add R10k constant
     * config/mips/mips.md: Add r10000 params & incl 10000.md
     * doc/invoke.texi: List r1x000 family

diff -Naurp gcc.orig/gcc/config/mips/10000.md gcc/gcc/config/mips/10000.md
--- gcc.orig/gcc/config/mips/10000.md	1969-12-31 19:00:00.000000000 -0500
+++ gcc/gcc/config/mips/10000.md	2008-08-11 02:54:42.000000000 -0400
@@ -0,0 +1,256 @@
+;; DFA-based pipeline description for the VR1x000.
+;;   Copyright (C) 2005, 2006, 2008 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; R12K/R14K/R16K are derivatives of R10K, thus copy its description
+;; until specific tuning for each is added.
+
+;; R10000 has an int queue, fp queue, address queue.
+;; The int queue feeds ALU1 and ALU2.
+;; The fp queue feeds the fp-adder and fp-multiplier.
+;; The addr queue feeds the Load/Store unit.
+;;
+;; However, we define the fp-adder and fp-multiplier as
+;; separate automatons, because the fp-multiplier is
+;; divided into fp-multiplier, fp-division, and
+;; fp-squareroot units, all of which share the same
+;; issue and completion logic, yet can operate in
+;; parallel.
+;;
+;; This is based on the model described in the R10K Manual
+;; and it helps to reduce the size of the automata.
+(define_automaton "r10k_a_int, r10k_a_fpadder, r10k_a_addr,
+                   r10k_a_fpmpy, r10k_a_fpdiv, r10k_a_fpsqrt")
+
+(define_cpu_unit "r10k_alu1" "r10k_a_int")
+(define_cpu_unit "r10k_alu2" "r10k_a_int")
+(define_cpu_unit "r10k_fpadd" "r10k_a_fpadder")
+(define_cpu_unit "r10k_fpmpy" "r10k_a_fpmpy")
+(define_cpu_unit "r10k_fpdiv" "r10k_a_fpdiv")
+(define_cpu_unit "r10k_fpsqrt" "r10k_a_fpsqrt")
+(define_cpu_unit "r10k_loadstore" "r10k_a_addr")
+
+
+;; R10k Loads and Stores.
+(define_insn_reservation "r10k_load" 2
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "load,prefetch,prefetchx"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_store" 0
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "store,fpstore,fpidxstore"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_fpload" 3
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "fpload,fpidxload"))
+  "r10k_loadstore")
+
+
+;; Integer add/sub + logic ops, and mt hi/lo can be done by alu1 or alu2.
+;; Miscellaneous arith goes here too (this is a guess).
+(define_insn_reservation "r10k_arith" 1
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "arith,mthilo,slt,clz,const,nop,trap,logical"))
+  "r10k_alu1 | r10k_alu2")
+
+;; We treat mfhilo differently, because we need to know when
+;; it's HI and when it's LO.
+(define_insn_reservation "r10k_mfhi" 1
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "mfhilo")
+            (not (match_operand 1 "lo_operand"))))
+  "r10k_alu1 | r10k_alu2")
+
+(define_insn_reservation "r10k_mflo" 1
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "mfhilo")
+            (match_operand 1 "lo_operand")))
+  "r10k_alu1 | r10k_alu2")
+
+
+;; ALU1 handles shifts, branch eval, and condmove.
+;;
+;; Brancher is separate, but part of ALU1, but can only
+;; do one branch per cycle (is this even implementable?).
+;;
+;; Unsure if the brancher handles jumps and calls as well, but since
+;; they're related, we'll add them here for now.
+(define_insn_reservation "r10k_brancher" 1
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "shift,branch,jump,call"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_int_cmove" 1
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SI,DI")))
+  "r10k_alu1")
+
+
+;; Coprocessor Moves.
+;; mtc1/dmtc1 are handled by ALU1.
+;; mfc1/dmfc1 are handled by the fp-multiplier.
+(define_insn_reservation "r10k_mt_xfer" 3
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "mtc"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_mf_xfer" 2
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "mfc"))
+  "r10k_fpmpy")
+
+
+;; Only ALU2 does int multiplications and divisions.
+;;
+;; According to the Vr10000 series user manual,
+;; integer mult and div insns can be issued one
+;; cycle earlier if using register Lo.  We model
+;; this by using the Lo value by default, as it
+;; is the more common value, and use a bypass
+;; for the Hi value when needed.
+;;
+;; Also of note, there are different latencies
+;; for MULT/DMULT (Lo 5/Hi 6) and MULTU/DMULTU (Lo 6/Hi 7).
+;; However, gcc does not have separate types
+;; for these insns.  Thus to strike a balance,
+;; we use the Hi latency value for imul
+;; operations until the imul type can be
+;; split.
+;;
+;; Divides also keep ALU2 busy, but this isn't modeled
+;; here.
+(define_insn_reservation "r10k_imul_single" 6
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "imul,imul3")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 6")
+
+(define_insn_reservation "r10k_imul_double" 10
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "imul,imul3")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 10")
+
+(define_insn_reservation "r10k_idiv_single" 34
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 35")
+
+(define_insn_reservation "r10k_idiv_double" 66
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 67")
+
+(define_bypass 35 "r10k_idiv_single" "r10k_mfhi")
+(define_bypass 67 "r10k_idiv_double" "r10k_mfhi")
+
+
+;; Floating point add/sub, mul, abs value, neg, comp, & moves.
+(define_insn_reservation "r10k_fp_miscadd" 2
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "fadd,fabs,fneg,fcmp"))
+  "r10k_fpadd")
+
+(define_insn_reservation "r10k_fp_miscmul" 2
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "fmul,fmove"))
+  "r10k_fpmpy")
+
+(define_insn_reservation "r10k_fp_cmove" 2
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SF,DF")))
+  "r10k_fpmpy")
+
+
+;; The fcvt.s.[wl] insn has latency 4, repeat 2.
+;; All other fcvt insns have latency 2, repeat 1.
+(define_insn_reservation "r10k_fcvt_single" 4
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "I2S")))
+  "r10k_fpadd * 2")
+
+(define_insn_reservation "r10k_fcvt_other" 2
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "!I2S")))
+  "r10k_fpadd")
+
+
+;; Run the fmadd insn through fp-adder first, then fp-multiplier.
+;;
+;; The latency for fmadd is 2 cycles if the result is used
+;; by another fmadd instruction.
+(define_insn_reservation "r10k_fmadd" 4
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "fmadd"))
+  "r10k_fpadd, r10k_fpmpy")
+
+(define_bypass 2 "r10k_fmadd" "r10k_fmadd")
+
+
+;; Floating point Divisions & square roots.
+(define_insn_reservation "r10k_fdiv_single" 12
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "SF")))
+  "r10k_fpdiv * 14")
+
+(define_insn_reservation "r10k_fdiv_double" 19
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "DF")))
+  "r10k_fpdiv * 21")
+
+(define_insn_reservation "r10k_fsqrt_single" 18
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_fsqrt_double" 33
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+(define_insn_reservation "r10k_frsqrt_single" 30
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_frsqrt_double" 52
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+
+;; Handle unknown/multi insns here (this is a guess).
+(define_insn_reservation "r10k_unknown" 1
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "unknown,multi"))
+  "r10k_alu1 + r10k_alu2")
diff -Naurp gcc.orig/gcc/config/mips/mips.c gcc/gcc/config/mips/mips.c
--- gcc.orig/gcc/config/mips/mips.c	2008-08-09 15:43:46.000000000 -0400
+++ gcc/gcc/config/mips/mips.c	2008-08-09 18:19:29.000000000 -0400
@@ -597,6 +597,10 @@ static const struct mips_cpu_info mips_c

    /* MIPS IV processors. */
    { "r8000", PROCESSOR_R8000, 4, 0 },
+  { "r10000", PROCESSOR_R10000, 4, 0 },
+  { "r12000", PROCESSOR_R10000, 4, 0 },
+  { "r14000", PROCESSOR_R10000, 4, 0 },
+  { "r16000", PROCESSOR_R10000, 4, 0 },
    { "vr5000", PROCESSOR_R5000, 4, 0 },
    { "vr5400", PROCESSOR_R5400, 4, 0 },
    { "vr5500", PROCESSOR_R5500, 4, PTF_AVOID_BRANCHLIKELY },
@@ -992,6 +996,19 @@ static const struct mips_rtx_cost_data m
  		     1,           /* branch_cost */
  		     4            /* memory_latency */
    },
+  { /* R1x000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (5),            /* int_mult_si */
+    COSTS_N_INSNS (9),            /* int_mult_di */
+    COSTS_N_INSNS (34),           /* int_div_si */
+    COSTS_N_INSNS (66),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
    { /* SB1 */
      /* These costs are the same as the SB-1A below.  */
      COSTS_N_INSNS (4),            /* fp_add */
@@ -10268,7 +10285,10 @@ mips_issue_rate (void)
  	 but in reality only a maximum of 3 insns can be issued as
  	 floating-point loads and stores also require a slot in the
  	 AGEN pipe.  */
-     return 4;
+    case PROCESSOR_R10000:
+      /* All R10K Processors are quad-issue (being the first MIPS
+         processors to support this feature). */
+      return 4;

      case PROCESSOR_20KC:
      case PROCESSOR_R4130:
diff -Naurp gcc.orig/gcc/config/mips/mips.h gcc/gcc/config/mips/mips.h
--- gcc.orig/gcc/config/mips/mips.h	2008-08-09 15:43:46.000000000 -0400
+++ gcc/gcc/config/mips/mips.h	2008-08-09 18:19:29.000000000 -0400
@@ -66,6 +66,7 @@ enum processor_type {
    PROCESSOR_R7000,
    PROCESSOR_R8000,
    PROCESSOR_R9000,
+  PROCESSOR_R10000,
    PROCESSOR_SB1,
    PROCESSOR_SB1A,
    PROCESSOR_SR71000,
@@ -241,6 +242,7 @@ enum mips_code_readable_setting {
  #define TARGET_MIPS5500             (mips_arch == PROCESSOR_R5500)
  #define TARGET_MIPS7000             (mips_arch == PROCESSOR_R7000)
  #define TARGET_MIPS9000             (mips_arch == PROCESSOR_R9000)
+#define TARGET_MIPS10000            (mips_arch == PROCESSOR_R10000)
  #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
  				     || mips_arch == PROCESSOR_SB1A)
  #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
@@ -267,6 +269,7 @@ enum mips_code_readable_setting {
  #define TUNE_MIPS6000               (mips_tune == PROCESSOR_R6000)
  #define TUNE_MIPS7000               (mips_tune == PROCESSOR_R7000)
  #define TUNE_MIPS9000               (mips_tune == PROCESSOR_R9000)
+#define TUNE_MIPS10000              (mips_tune == PROCESSOR_R10000)
  #define TUNE_SB1                    (mips_tune == PROCESSOR_SB1		\
  				     || mips_tune == PROCESSOR_SB1A)

diff -Naurp gcc.orig/gcc/config/mips/mips.md gcc/gcc/config/mips/mips.md
--- gcc.orig/gcc/config/mips/mips.md	2008-08-09 15:43:46.000000000 -0400
+++ gcc/gcc/config/mips/mips.md	2008-08-09 18:19:29.000000000 -0400
@@ -556,7 +556,7 @@
  ;; Attribute describing the processor.  This attribute must match exactly
  ;; with the processor_type enumeration in mips.h.
  (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,r10000,sb1,sb1a,sr71000,xlr"
    (const (symbol_ref "mips_tune")))

  ;; The type of hardware hazard associated with this instruction.
@@ -906,6 +906,7 @@
  (include "6000.md")
  (include "7000.md")
  (include "9000.md")
+(include "10000.md")
  (include "sb1.md")
  (include "sr71k.md")
  (include "xlr.md")
diff -Naurp gcc.orig/gcc/doc/invoke.texi gcc/gcc/doc/invoke.texi
--- gcc.orig/gcc/doc/invoke.texi	2008-08-09 15:43:14.000000000 -0400
+++ gcc/gcc/doc/invoke.texi	2008-08-09 18:19:29.000000000 -0400
@@ -11980,6 +11980,7 @@ The processor names are:
  @samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
  @samp{r4600}, @samp{r4650}, @samp{r6000}, @samp{r8000},
  @samp{rm7000}, @samp{rm9000},
+@samp{r10000}, @samp{r12000}, @samp{r14000}, @samp{r16000},
  @samp{sb1},
  @samp{sr71000},
  @samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300},

[-- Attachment #2: gcc-tests-20080812.txt --]
[-- Type: text/plain, Size: 39819 bytes --]

# make -k check
make[1]: Entering directory `/usr/cvsroot/gcc'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/fixincludes'
autogen -T ../.././fixincludes/check.tpl ../.././fixincludes/inclhack.def
make[2]: autogen: Command not found
make[2]: *** [check] Error 127
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/fixincludes'
make[1]: *** [check-fixincludes] Error 2
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc'
Making a new config file...
echo "set tmpdir /usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/testsuite" >> ./tmp0
test -d testsuite || mkdir testsuite
test -d testsuite/gcc || mkdir testsuite/gcc
(rootme=`${PWDCMD-pwd}`; export rootme; \
        srcdir=`cd ../.././gcc; ${PWDCMD-pwd}` ; export srcdir ; \
        cd testsuite/gcc; \
        rm -f tmp-site.exp; \
        sed '/set tmpdir/ s|testsuite|testsuite/gcc|' \
                < ../../site.exp > tmp-site.exp; \
        /bin/sh ${srcdir}/../move-if-change tmp-site.exp site.exp; \
        EXPECT=expect ; export EXPECT ; \
        if [ -f ${rootme}/../expect/expect ] ; then  \
           TCL_LIBRARY=`cd .. ; cd ${srcdir}/../tcl/library ; ${PWDCMD-pwd}` ; \
            export TCL_LIBRARY ; fi ; \
        GCC_EXEC_PREFIX="/usr/lib/gcc/" ; export GCC_EXEC_PREFIX ; \
        runtest --tool gcc )
WARNING: Couldn't find the global config file.
Test Run By root on Mon Aug 11 22:46:07 2008
Native configuration is mips-unknown-linux-gnu

                === gcc tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/compile/compile.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/execute/builtins/builtins.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/execute/execute.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/execute/ieee/ieee.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.c-torture/unsorted/unsorted.exp ...

Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/autopar/autopar.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/charset/charset.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/compat/compat.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/compat/struct-layout-1.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/cpp/cpp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/cpp/trad/trad.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/debug/debug.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/debug/dwarf2/dwarf2.exp ...
FAIL: gcc.dg/debug/dwarf2/dwarf-die3.c scan-assembler-not DW_AT_inline
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/dfp/dfp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/dg.exp ...
WARNING: program timed out.
FAIL: gcc.dg/20020425-1.c (test for excess errors)
FAIL: gcc.dg/pr35729.c scan-rtl-dump-times loop2_invariant "Decided to move invariant" 0
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/fixed-point/fixed-point.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/format/format.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/gomp/gomp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/ipa/ipa.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/matrix/matrix.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/noncompile/noncompile.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/pch/pch.exp ...
FAIL: gcc.dg/pch/valid-1b.c -O0 -g -I. (test for excess errors)
FAIL: gcc.dg/pch/valid-1b.c -O0 -g assembly comparison
FAIL: gcc.dg/pch/valid-1b.c  -O0  -I. (test for excess errors)
FAIL: gcc.dg/pch/valid-1b.c  -O0  assembly comparison
FAIL: gcc.dg/pch/valid-1b.c  -O1  -I. (test for excess errors)
FAIL: gcc.dg/pch/valid-1b.c  -O1  assembly comparison
FAIL: gcc.dg/pch/valid-1b.c  -O2  -I. (test for excess errors)
FAIL: gcc.dg/pch/valid-1b.c  -O2  assembly comparison
FAIL: gcc.dg/pch/valid-1b.c  -O3 -fomit-frame-pointer  -I. (test for excess errors)
FAIL: gcc.dg/pch/valid-1b.c  -O3 -fomit-frame-pointer  assembly comparison
FAIL: gcc.dg/pch/valid-1b.c  -O3 -g  -I. (test for excess errors)
FAIL: gcc.dg/pch/valid-1b.c  -O3 -g  assembly comparison
FAIL: gcc.dg/pch/valid-1b.c  -Os  -I. (test for excess errors)
FAIL: gcc.dg/pch/valid-1b.c  -Os  assembly comparison
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/special/mips-abi.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/special/special.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/struct/struct-reorg.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/tls/tls.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/torture/dg-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/torture/stackalign/stackalign.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/tree-prof/tree-prof.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/tree-ssa/tree-ssa.exp ...
XPASS: gcc.dg/tree-ssa/data-dep-1.c scan-tree-dump-times ltrans "4, \+, 1" 0
XPASS: gcc.dg/tree-ssa/ltrans-3.c scan-tree-dump-times ltrans "transformed loop" 1
XPASS: gcc.dg/tree-ssa/ssa-fre-13.c scan-tree-dump fre "Inserted .* &a"
XPASS: gcc.dg/tree-ssa/ssa-fre-13.c scan-tree-dump fre "Replaced tmp1_.\(D\)->data"
XPASS: gcc.dg/tree-ssa/ssa-fre-14.c scan-tree-dump fre "Inserted .* &a"
XPASS: gcc.dg/tree-ssa/ssa-fre-14.c scan-tree-dump fre "Replaced tmp1.data"
XPASS: gcc.dg/tree-ssa/ssa-fre-17.c scan-tree-dump fre "Replaced f.doms\[0\].dom with i_"
FAIL: gcc.dg/tree-ssa/ssa-store-ccp-3.c scan-tree-dump-times optimized "conststaticvariable" 1
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/costmodel/i386/i386-costmodel-vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/costmodel/ppc/ppc-costmodel-vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/costmodel/spu/spu-costmodel-vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/x86_64-costmodel-vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vect/vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vmx/vmx.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/vxworks/vxworks.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.dg/weak/weak.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/acker1.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/arm-isr.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/bprob.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/dectest.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/dhry.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/gcov.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/i386-prefetch.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/linkage.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/matrix1.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/mg-2.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/mg.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/options.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/sieve.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.misc-tests/sort2.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/alpha/alpha.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/arm/arm.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/arm/neon/neon.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/avr/avr.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/avr/torture/avr-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/bfin/bfin.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/cris/cris.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/cris/torture/cris-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/frv/frv.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/i386/i386.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/i386/math-torture/math-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/i386/stackalign/stackalign.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/ia64/ia64.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/m68k/m68k.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/mips/inter/mips16-inter.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/mips/mips.exp ...
FAIL: gcc.target/mips/ext-1.c scan-assembler \tdext\t
FAIL: gcc.target/mips/ext-1.c scan-assembler-not and
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/powerpc/powerpc.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/s390/s390.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/sh/sh.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/sparc/sparc.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/spu/spu.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/x86_64/abi/abi-x86_64.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.target/xstormy16/xstormy16.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gcc.test-framework/test-framework.exp ...
skipping test framework tests, CHECK_TEST_FRAMEWORK is not defined

                === gcc Summary ===

# of expected passes            48826
# of unexpected failures        20
# of unexpected successes       7
# of expected failures          127
# of unsupported tests          488
/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/xgcc  version 4.4.0 20080809 (experimental) (GCC)

make[2]: [check-gcc] Error 1 (ignored)
test -d testsuite || mkdir testsuite
test -d testsuite/g++ || mkdir testsuite/g++
(rootme=`${PWDCMD-pwd}`; export rootme; \
        srcdir=`cd ../.././gcc; ${PWDCMD-pwd}` ; export srcdir ; \
        cd testsuite/g++; \
        rm -f tmp-site.exp; \
        sed '/set tmpdir/ s|testsuite|testsuite/g++|' \
                < ../../site.exp > tmp-site.exp; \
        /bin/sh ${srcdir}/../move-if-change tmp-site.exp site.exp; \
        EXPECT=expect ; export EXPECT ; \
        if [ -f ${rootme}/../expect/expect ] ; then  \
           TCL_LIBRARY=`cd .. ; cd ${srcdir}/../tcl/library ; ${PWDCMD-pwd}` ; \
            export TCL_LIBRARY ; fi ; \
        GCC_EXEC_PREFIX="/usr/lib/gcc/" ; export GCC_EXEC_PREFIX ; \
        runtest --tool g++ )
WARNING: Couldn't find the global config file.
Test Run By root on Tue Aug 12 02:25:15 2008
Native configuration is mips-unknown-linux-gnu

                === g++ tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/bprob/bprob.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/charset/charset.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/compat/compat.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/compat/struct-layout-1.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/debug/debug.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/debug/dwarf2/dwarf2.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/dg.exp ...
FAIL: g++.dg/ipa/iinline-1.C scan-ipa-dump inline "String::funcOne[^\n]*inline copy in int main"
FAIL: g++.dg/lookup/crash7.C  (test for errors, line 8)
FAIL: g++.dg/lookup/crash7.C (test for excess errors)
FAIL: g++.dg/other/PR23205.C scan-assembler .stabs.*foobar:c=i
FAIL: g++.dg/other/error25.C  (test for errors, line 4)
FAIL: g++.dg/other/error25.C (test for excess errors)
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/gcov/gcov.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/gomp/gomp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/pch/pch.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/special/ecos.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/tls/tls.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/torture/dg-torture.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/torture/stackalign/stackalign.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/tree-prof/tree-prof.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.dg/vect/vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/g++.old-deja/old-deja.exp ...

                === g++ Summary ===

# of expected passes            17942
# of unexpected failures        6
# of expected failures          81
# of unsupported tests          143
/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/testsuite/g++/../../g++  version 4.4.0 20080809 (experimental) (GCC)

make[2]: [check-g++] Error 1 (ignored)
test -d testsuite || mkdir testsuite
test -d testsuite/gfortran || mkdir testsuite/gfortran
(rootme=`${PWDCMD-pwd}`; export rootme; \
        srcdir=`cd ../.././gcc; ${PWDCMD-pwd}` ; export srcdir ; \
        cd testsuite/gfortran; \
        rm -f tmp-site.exp; \
        sed '/set tmpdir/ s|testsuite|testsuite/gfortran|' \
                < ../../site.exp > tmp-site.exp; \
        /bin/sh ${srcdir}/../move-if-change tmp-site.exp site.exp; \
        EXPECT=expect ; export EXPECT ; \
        if [ -f ${rootme}/../expect/expect ] ; then  \
           TCL_LIBRARY=`cd .. ; cd ${srcdir}/../tcl/library ; ${PWDCMD-pwd}` ; \
            export TCL_LIBRARY ; fi ; \
        GCC_EXEC_PREFIX="/usr/lib/gcc/" ; export GCC_EXEC_PREFIX ; \
        runtest --tool gfortran )
WARNING: Couldn't find the global config file.
Test Run By root on Tue Aug 12 03:53:09 2008
Native configuration is mips-unknown-linux-gnu

                === gfortran tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/gcc/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/gcc/testsuite/gfortran.dg/debug/debug.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gfortran.dg/dg.exp ...
FAIL: gfortran.dg/widechar_intrinsics_1.f90  -O   (test for errors, line 114)
FAIL: gfortran.dg/widechar_intrinsics_1.f90  -O  (test for excess errors)
Running /usr/cvsroot/gcc/gcc/testsuite/gfortran.dg/gomp/gomp.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gfortran.dg/vect/vect.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gfortran.fortran-torture/compile/compile.exp ...
Running /usr/cvsroot/gcc/gcc/testsuite/gfortran.fortran-torture/execute/execute.exp ...

                === gfortran Summary ===

# of expected passes            26360
# of unexpected failures        2
# of expected failures          8
# of unsupported tests          246
/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/testsuite/gfortran/../../gfortran  version 4.4.0 20080809 (experimental) (GCC)

make[2]: [check-gfortran] Error 1 (ignored)
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/intl'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/intl'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libcpp'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libcpp'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libdecnumber'
make[2]: Nothing to be done for `check'.
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libdecnumber'
make[2]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libiberty'
make[3]: Entering directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libiberty/testsuite'
mips-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -I.. -I../../.././libiberty/testsuite/../../include  -o test-demangle \
                ../../.././libiberty/testsuite/test-demangle.c ../libiberty.a
./test-demangle < ../../.././libiberty/testsuite/demangle-expected
./test-demangle: 770 tests, 0 failures
mips-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -I.. -I../../.././libiberty/testsuite/../../include  -DHAVE_CONFIG_H -I.. -o test-pexecute \
                ../../.././libiberty/testsuite/test-pexecute.c ../libiberty.a
./test-pexecute
mips-unknown-linux-gnu-gcc -DHAVE_CONFIG_H -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -I.. -I../../.././libiberty/testsuite/../../include  -DHAVE_CONFIG_H -I.. -o test-expandargv \
                ../../.././libiberty/testsuite/test-expandargv.c ../libiberty.a
./test-expandargv
PASS: test-expandargv-0.
PASS: test-expandargv-1.
PASS: test-expandargv-2.
PASS: test-expandargv-3.
make[3]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libiberty/testsuite'
make[2]: Leaving directory `/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/libiberty'
make[1]: Target `check-host' not remade because of errors.
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3'
Making check in include
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/include'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/include'
Making check in libsupc++
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/libsupc++'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/libsupc++'
Making check in libmath
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/libmath'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/libmath'
Making check in doc
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/doc'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/doc'
Making check in src
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/src'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/src'
Making check in po
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/po'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/po'
Making check in testsuite
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/testsuite'
make  check-DEJAGNU
make[4]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/testsuite'
Making a new site.exp file...
srcdir=`CDPATH="${ZSH_VERSION+.}:" && cd ../../.././libstdc++-v3/testsuite && pwd`; export srcdir; \
        EXPECT=expect; export EXPECT; \
        runtest=runtest; \
        if /bin/sh -c "$runtest --version" > /dev/null 2>&1; then \
          l='libstdc++'; for tool in $l; do \
            $runtest  --tool $tool --srcdir $srcdir ; \
          done; \
        else echo "WARNING: could not find \`runtest'" 1>&2; :;\
        fi
WARNING: Couldn't find the global config file.
Test Run By root on Tue Aug 12 06:47:23 2008
Native configuration is mips-unknown-linux-gnu

                === libstdc++ tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/libstdc++-v3/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp.
ERROR: could not compile testsuite_allocator.cc
    while executing
"error "could not compile $f""
    (procedure "v3-build_support" line 61)
    invoked from within
"v3-build_support"
    (file "/usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp" line 22)
    invoked from within
"source /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-abi/abi.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp.
ERROR: could not compile testsuite_allocator.cc
    while executing
"error "could not compile $f""
    (procedure "v3-build_support" line 61)
    invoked from within
"v3-build_support"
    (file "/usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp" line 25)
    invoked from within
"source /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libstdc++-v3/testsuite/libstdc++-dg/conformance.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""

                === libstdc++ Summary ===

make[4]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/testsuite'
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/testsuite'
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3'
true "AR_FLAGS=rc" "CC_FOR_BUILD=mips-unknown-linux-gnu-gcc" "CC_FOR_TARGET=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/xgcc -B/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/ -B/usr/mips-unknown-linux-gnu/bin/ -B/usr/mips-unknown-linux-gnu/lib/ -isystem /usr/mips-unknown-linux-gnu/include -isystem /usr/mips-unknown-linux-gnu/sys-include" "CFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16  " "CXXFLAGS=-g -O2   -D_GNU_SOURCE -minterlink-mips16  " "CFLAGS_FOR_BUILD=-O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb" "CFLAGS_FOR_TARGET=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16" "INSTALL=/usr/bin/install -c" "INSTALL_DATA=/usr/bin/install -c -m 644" "INSTALL_PROGRAM=/usr/bin/install -c" "INSTALL_SCRIPT=/usr/bin/install -c" "LDFLAGS=" "LIBCFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16  " "LIBCFLAGS_FOR_TARGET=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16" "MAKE=make" "MAKEINFO=makeinfo --split-size=5000000 --split-size=5000000  " "PICFLAG=" "PICFLAG_FOR_TARGET=" "SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/usr" "infodir=/usr/share/gcc-data/mips-unknown-linux-gnu/gcc-trunk/info" "libdir=/usr/lib" "includedir=/usr/lib/gcc/mips-unknown-linux-gnu/gcc-trunk/include" "prefix=/usr" "tooldir=/usr/mips-unknown-linux-gnu" "gxx_include_dir=/usr/lib/gcc/mips-unknown-linux-gnu/gcc-trunk/include/g++-v4" "AR=/usr/mips-unknown-linux-gnu/bin/ar" "AS=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/as" "LD=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/collect-ld" "RANLIB=/usr/mips-unknown-linux-gnu/bin/ranlib" "NM=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/nm" "NM_FOR_BUILD=" "NM_FOR_TARGET=/usr/mips-unknown-linux-gnu/bin/nm" "DESTDIR=" "WERROR=" DO=all multi-do # make
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3'
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3'
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap'
Making check in testsuite
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap/testsuite'
make  check-DEJAGNU
make[4]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap/testsuite'
Making a new site.exp file...
srcdir=`CDPATH="${ZSH_VERSION+.}:" && cd ../../.././libmudflap/testsuite && pwd`; export srcdir; \
        EXPECT=`if [ -f ../../expect/expect ] ; then echo ../../expect/expect ; else echo expect ; fi`; export EXPECT; \
        runtest=`if [ -f ../../.././libmudflap/testsuite/../../dejagnu/runtest ] ; then echo ../../.././libmudflap/testsuite/../../dejagnu/runtest ; else echo runtest ;  fi`; \
        if /bin/sh -c "$runtest --version" > /dev/null 2>&1; then \
          l='libmudflap'; for tool in $l; do \
            $runtest  --tool $tool --srcdir $srcdir ; \
          done; \
        else echo "WARNING: could not find \`runtest'" 1>&2; :;\
        fi
WARNING: Couldn't find the global config file.
Test Run By root on Tue Aug 12 06:48:13 2008
Native configuration is mips-unknown-linux-gnu

                === libmudflap tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/libmudflap/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/cfrags.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c/externs.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c++"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/c++frags.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c++"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.c++/ctors.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""
Running /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp ...
ERROR: tcl error sourcing /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp.
ERROR: couldn't execute "/xgcc": no such file or directory
    while executing
"exec ${gccdir}/xgcc --print-multi-lib"
    (procedure "libmudflap-init" line 38)
    invoked from within
"libmudflap-init c"
    (file "/usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp" line 4)
    invoked from within
"source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 source /usr/cvsroot/gcc/libmudflap/testsuite/libmudflap.cth/cthfrags.exp"
    invoked from within
"catch "uplevel #0 source $test_file_name""

                === libmudflap Summary ===

make[4]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap/testsuite'
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap/testsuite'
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap'
true "AR_FLAGS=rc" "CC_FOR_BUILD=mips-unknown-linux-gnu-gcc" "CFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16  " "CXXFLAGS=-g -O2   -D_GNU_SOURCE -minterlink-mips16  " "CFLAGS_FOR_BUILD=-O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb" "CFLAGS_FOR_TARGET=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16" "INSTALL=/usr/bin/install -c" "INSTALL_DATA=/usr/bin/install -c -m 644" "INSTALL_PROGRAM=/usr/bin/install -c" "INSTALL_SCRIPT=/usr/bin/install -c" "JC1FLAGS=" "LDFLAGS=" "LIBCFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16  " "LIBCFLAGS_FOR_TARGET=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16" "MAKE=make" "MAKEINFO=makeinfo --split-size=5000000 --split-size=5000000  " "PICFLAG=" "PICFLAG_FOR_TARGET=" "SHELL=/bin/sh" "RUNTESTFLAGS=" "exec_prefix=/usr" "infodir=/usr/share/gcc-data/mips-unknown-linux-gnu/gcc-trunk/info" "libdir=/usr/lib" "prefix=/usr" "includedir=/usr/lib/gcc/mips-unknown-linux-gnu/gcc-trunk/include" "AR=/usr/mips-unknown-linux-gnu/bin/ar" "AS=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/as" "CC=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/xgcc -B/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/ -B/usr/mips-unknown-linux-gnu/bin/ -B/usr/mips-unknown-linux-gnu/lib/ -isystem /usr/mips-unknown-linux-gnu/include -isystem /usr/mips-unknown-linux-gnu/sys-include" "CXX=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/g++ -B/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/ -nostdinc++ -nostdinc++ -I/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/include/mips-unknown-linux-gnu -I/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/include -I/usr/cvsroot/gcc/libstdc++-v3/libsupc++ -I/usr/cvsroot/gcc/libstdc++-v3/include/backward 
-I/usr/cvsroot/gcc/libstdc++-v3/testsuite/util -L/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/src -L/usr/cvsroot/gcc/mips-unknown-linux-gnu/libstdc++-v3/src/.libs -B/usr/mips-unknown-linux-gnu/bin/ -B/usr/mips-unknown-linux-gnu/lib/ -isystem /usr/mips-unknown-linux-gnu/include -isystem /usr/mips-unknown-linux-gnu/sys-include" "LD=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/collect-ld" "LIBCFLAGS=-g -O2 -march=r10000 -mtune=r10000 -pipe -fomit-frame-pointer -ftracer -fforce-addr -fweb -minterlink-mips16  " "NM=/usr/cvsroot/gcc/host-mips-unknown-linux-gnu/gcc/nm" "PICFLAG=" "RANLIB=/usr/mips-unknown-linux-gnu/bin/ranlib" "DESTDIR=" DO=all multi-do # make
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap'
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libmudflap'
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgfortran'
make  check-am
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgfortran'
true  DO=all multi-do # make
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgfortran'
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgfortran'
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libiberty'
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libiberty/testsuite'
make[3]: Nothing to be done for `check'.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libiberty/testsuite'
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libiberty'
make[2]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp'
Making check in testsuite
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp/testsuite'
make  check-DEJAGNU
make[4]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp/testsuite'
Making a new site.exp file...
srcdir=`CDPATH="${ZSH_VERSION+.}:" && cd ../../.././libgomp/testsuite && pwd`; export srcdir; \
        EXPECT=expect; export EXPECT; \
        runtest=runtest; \
        if /bin/sh -c "$runtest --version" > /dev/null 2>&1; then \
          l='libgomp'; for tool in $l; do \
            $runtest  --tool $tool --srcdir $srcdir ; \
          done; \
        else echo "WARNING: could not find \`runtest'" 1>&2; :;\
        fi
WARNING: Couldn't find the global config file.
Test Run By root on Tue Aug 12 06:48:25 2008
Native configuration is mips-unknown-linux-gnu

                === libgomp tests ===

Schedule of variations:
    unix

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
Using /usr/cvsroot/gcc/libgomp/testsuite/config/default.exp as tool-and-target-specific interface file.
Running /usr/cvsroot/gcc/libgomp/testsuite/libgomp.c/c.exp ...
WARNING: program timed out.
FAIL: libgomp.c/appendix-a/a.18.1.c execution test
FAIL: libgomp.c/barrier-1.c (test for excess errors)
WARNING: libgomp.c/barrier-1.c compilation failed to produce executable
FAIL: libgomp.c/collapse-1.c (test for excess errors)
WARNING: libgomp.c/collapse-1.c compilation failed to produce executable
FAIL: libgomp.c/collapse-2.c (test for excess errors)
WARNING: libgomp.c/collapse-2.c compilation failed to produce executable
FAIL: libgomp.c/collapse-3.c (test for excess errors)
WARNING: libgomp.c/collapse-3.c compilation failed to produce executable
FAIL: libgomp.c/critical-1.c (test for excess errors)
WARNING: libgomp.c/critical-1.c compilation failed to produce executable
FAIL: libgomp.c/debug-1.c (internal compiler error)
FAIL: libgomp.c/debug-1.c (test for excess errors)
WARNING: libgomp.c/debug-1.c compilation failed to produce executable
FAIL: libgomp.c/icv-1.c execution test
FAIL: libgomp.c/lib-2.c (test for excess errors)
WARNING: libgomp.c/lib-2.c compilation failed to produce executable
FAIL: libgomp.c/lock-1.c execution test
FAIL: libgomp.c/lock-2.c execution test
FAIL: libgomp.c/loop-1.c (test for excess errors)
WARNING: libgomp.c/loop-1.c compilation failed to produce executable
FAIL: libgomp.c/loop-10.c execution test
FAIL: libgomp.c/loop-2.c (test for excess errors)
WARNING: libgomp.c/loop-2.c compilation failed to produce executable
FAIL: libgomp.c/loop-3.c execution test
FAIL: libgomp.c/loop-5.c (test for excess errors)
WARNING: libgomp.c/loop-5.c compilation failed to produce executable
FAIL: libgomp.c/loop-6.c (test for excess errors)
WARNING: libgomp.c/loop-6.c compilation failed to produce executable
FAIL: libgomp.c/loop-7.c (test for excess errors)
WARNING: libgomp.c/loop-7.c compilation failed to produce executable
FAIL: libgomp.c/loop-8.c (test for excess errors)
WARNING: libgomp.c/loop-8.c compilation failed to produce executable
FAIL: libgomp.c/loop-9.c (test for excess errors)
WARNING: libgomp.c/loop-9.c compilation failed to produce executable
FAIL: libgomp.c/nested-3.c (test for excess errors)
WARNING: libgomp.c/nested-3.c compilation failed to produce executable
FAIL: libgomp.c/nestedfn-6.c (internal compiler error)
FAIL: libgomp.c/nestedfn-6.c (test for excess errors)
WARNING: libgomp.c/nestedfn-6.c compilation failed to produce executable
FAIL: libgomp.c/omp_workshare3.c  (test for errors, line 33)
FAIL: libgomp.c/omp_workshare3.c (test for excess errors)
FAIL: libgomp.c/ordered-1.c (test for excess errors)
WARNING: libgomp.c/ordered-1.c compilation failed to produce executable
FAIL: libgomp.c/ordered-2.c (test for excess errors)
WARNING: libgomp.c/ordered-2.c compilation failed to produce executable
FAIL: libgomp.c/parallel-1.c (test for excess errors)
WARNING: libgomp.c/parallel-1.c compilation failed to produce executable
FAIL: libgomp.c/pr26943-2.c  (test for warnings, line 23)
FAIL: libgomp.c/pr26943-2.c  (test for warnings, line 34)
FAIL: libgomp.c/pr26943-3.c  (test for warnings, line 29)
FAIL: libgomp.c/pr26943-3.c  (test for warnings, line 40)
FAIL: libgomp.c/pr26943-4.c  (test for warnings, line 30)
FAIL: libgomp.c/pr26943-4.c  (test for warnings, line 41)
FAIL: libgomp.c/reduction-5.c (internal compiler error)
FAIL: libgomp.c/reduction-5.c (test for excess errors)
WARNING: libgomp.c/reduction-5.c compilation failed to produce executable
FAIL: libgomp.c/sections-1.c (test for excess errors)
WARNING: libgomp.c/sections-1.c compilation failed to produce executable
FAIL: libgomp.c/single-1.c (test for excess errors)
WARNING: libgomp.c/single-1.c compilation failed to produce executable
FAIL: libgomp.c/task-1.c execution test
Running /usr/cvsroot/gcc/libgomp/testsuite/libgomp.c++/c++.exp ...
No libstdc++ library found, will not execute c++ tests
Running /usr/cvsroot/gcc/libgomp/testsuite/libgomp.fortran/fortran.exp ...

                === libgomp Summary ===

# of expected passes            182
# of unexpected failures        40
make[4]: *** [check-DEJAGNU] Error 1
make[4]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp/testsuite'
make[3]: *** [check-am] Error 2
make[3]: Target `check' not remade because of errors.
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp/testsuite'
make[3]: Entering directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp'
true  DO=all multi-do # make
make[3]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp'
make[2]: *** [check-recursive] Error 1
make[2]: Target `check' not remade because of errors.
make[2]: Leaving directory `/usr/cvsroot/gcc/mips-unknown-linux-gnu/libgomp'
make[1]: *** [check-target-libgomp] Error 2
make[1]: Target `check-target' not remade because of errors.
make[1]: Leaving directory `/usr/cvsroot/gcc'
make: *** [do-check] Error 2
make: Target `check' not remade because of errors.


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-13  8:53                       ` Kumba
@ 2008-08-16 10:10                         ` Richard Sandiford
  2008-08-18  8:51                           ` Kumba
  0 siblings, 1 reply; 22+ messages in thread
From: Richard Sandiford @ 2008-08-16 10:10 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips

Kumba <kumba@gentoo.org> writes:
> gcc/
>      * config/mips/10000.md: Add R10000 scheduler
>      * config/mips/mips.c: Add r10000 params & costs
>      * config/mips/mips.h: Add R10k constant
>      * config/mips/mips.md: Add r10000 params & incl 10000.md
>      * doc/invoke.texi: List r1x000 family

Looks good, thanks.  Minor comment nits:

> +;; Integer add/sub + logic ops, and mt hi/lo can be done by alu1 or alu2.
> +;; Miscellaneous arith goes here too (this is a guess).
> +(define_insn_reservation "r10k_arith" 1
> +  (and (eq_attr "cpu" "r10000")
> +       (eq_attr "type" "arith,mthilo,slt,clz,const,nop,trap,logical"))
> +  "r10k_alu1 | r10k_alu2")

Not sure if this is really a guess.  "arith" is "general ALU stuff
that we haven't needed to split out as separate types".  So if
shifts and conditional moves are really the only non-branching
instructions that require a particular ALU (ALU1), then I think we
can be pretty confident this is right.

> +;; We treat mfhilo differently, because we need to know when
> +;; it's HI and when it's LO.
> +(define_insn_reservation "r10k_mfhi" 1
> +  (and (eq_attr "cpu" "r10000")
> +       (and (eq_attr "type" "mfhilo")
> +            (not (match_operand 1 "lo_operand"))))
> +  "r10k_alu1 | r10k_alu2")
> +
> +(define_insn_reservation "r10k_mflo" 1
> +  (and (eq_attr "cpu" "r10000")
> +       (and (eq_attr "type" "mfhilo")
> +            (match_operand 1 "lo_operand")))
> +  "r10k_alu1 | r10k_alu2")

s/We treat mfhilo differently/We use separate reservations for mfhilo/

> +;; Only ALU2 does int multiplications and divisions.
> +;;
> +;; According to the Vr10000 series user manual,
> +;; integer mult and div insns can be issued one
> +;; cycle earlier if using register Lo.  We model
> +;; this by using the Lo value by default, as it
> +;; is the more common value, and use a bypass
> +;; for the Hi value when needed.
> +;;
> +;; Also of note, There are different latencies
> +;; for MULT/DMULT (Lo 5/Hi 6) and MULTU/DMULTU (Lo 6/Hi 7).
> +;; However, gcc does not have separate types
> +;; for these insns.  Thus to strike a balance,
> +;; we use the Hi latency value for imul
> +;; operations to strike a balance until the
> +;; imul type can be split.

s/to strike a balance until/until/

> +;; Divides also keep ALU2 busy, but this isn't modeled
> +;; here.

Needs clarifying.  Integer divides _are_ modelled as using r10k_alu2.

OK otherwise.  Do you have a copyright assignment on file?

Let me know if you have svn write access.  I'll apply the patch
for you if not.

Richard


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-16 10:10                         ` Richard Sandiford
@ 2008-08-18  8:51                           ` Kumba
  2008-08-18 17:00                             ` David Daney
  2008-10-06 21:33                             ` Richard Sandiford
  0 siblings, 2 replies; 22+ messages in thread
From: Kumba @ 2008-08-18  8:51 UTC (permalink / raw)
  To: gcc-patches, mips, rdsandiford

Richard Sandiford wrote:
> 
> Not sure if this is really a guess.  "arith" is "general ALU stuff
> that we haven't needed to split out as separate types".  So if
> shifts and conditional moves are really the only non-branching
> instructions that require a particular ALU (ALU1), then I think we
> can be pretty confident this is right.

Pretty much.  The R10K manual lists Add/Sub/Logical/Set, MT/MF HI/LO (for Ints) 
as using either ALU1 or ALU2.  I figured "Arith" covered Add/Sub and such.

Shift/LUI, Cond Branch, and Int Condmove all use ALU1 (however, only one branch 
can be done in a single cycle).  Not sure whether I can lump LUI under there. 
It looks like it's already part of the "shift" type, so it's probably already 
covered.  I wasn't sure if the "branch" type was cond or uncond, so I put it 
under ALU1 anyways.
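
A rough sketch of the unit assignment described above.  The enum and function names are made up for illustration and are not GCC identifiers; the ALU2-only entries for integer multiply and divide come from the 10000.md patch further down rather than from this paragraph.

```c
/* Hypothetical bitmask names; the patch calls the units r10k_alu1
   and r10k_alu2.  */
enum { UNIT_ALU1 = 1 << 0, UNIT_ALU2 = 1 << 1 };

/* Illustrative instruction classes, loosely following the "type"
   attribute values used in 10000.md.  */
enum insn_class
{
  CLASS_ARITH,         /* add/sub/logical/set, mt/mf hi/lo */
  CLASS_SHIFT,         /* shifts (lui is already typed as a shift) */
  CLASS_BRANCH,        /* branch evaluation */
  CLASS_CONDMOVE_INT,  /* integer conditional move */
  CLASS_IMUL,          /* integer multiply */
  CLASS_IDIV           /* integer divide */
};

/* Return the set of integer units that may execute CLS.  */
static unsigned
allowed_units (enum insn_class cls)
{
  switch (cls)
    {
    case CLASS_ARITH:
      return UNIT_ALU1 | UNIT_ALU2;
    case CLASS_SHIFT:
    case CLASS_BRANCH:
    case CLASS_CONDMOVE_INT:
      return UNIT_ALU1;
    case CLASS_IMUL:
    case CLASS_IDIV:
      return UNIT_ALU2;
    }
  return 0;
}
```

This is the same split that the reservations express with "r10k_alu1 | r10k_alu2", "r10k_alu1" and "r10k_alu2".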

Btw, what are the QI, HI, and TI/TF modes for?  I only used SI/DI and SF/DF for 
testing cond move, but there doesn't seem to be immediate documentation on what 
these other modes are (admittedly, I didn't go looking because they don't look 
necessary for this patch, but curiosity beckons).
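
For reference, a small sketch of what those mode names stand for in GCC, assuming the usual 8-bit byte: QI, HI, SI, DI and TI are the quarter, half, single, double and tetra integer modes, and SF, DF and TF are the single, double and tetra floating-point modes.  The table and the lookup helper are hypothetical and only illustrate the sizes; they are not GCC code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table for illustration; not a GCC data structure.  */
struct mode_info
{
  const char *name;   /* mode name without the "mode" suffix */
  unsigned bytes;     /* size in 8-bit bytes */
  int is_float;       /* nonzero for floating-point modes */
};

static const struct mode_info mode_table[] = {
  { "QI", 1, 0 },   /* quarter-integer */
  { "HI", 2, 0 },   /* half-integer */
  { "SI", 4, 0 },   /* single integer */
  { "DI", 8, 0 },   /* double integer */
  { "TI", 16, 0 },  /* tetra integer */
  { "SF", 4, 1 },   /* single float */
  { "DF", 8, 1 },   /* double float */
  { "TF", 16, 1 },  /* tetra float */
};

/* Return the size in bytes of mode NAME, or 0 if it is not in the
   table above.  */
static unsigned
mode_size_bytes (const char *name)
{
  size_t i;

  for (i = 0; i < sizeof mode_table / sizeof mode_table[0]; i++)
    if (strcmp (mode_table[i].name, name) == 0)
      return mode_table[i].bytes;
  return 0;
}
```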


> s/to strike a balance until/until/

Ah, thanks for catching this.  Probably where I got sidetracked in my thinking...
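
To make the "balance" concrete, here is a small sketch that compares the single latency the patch uses for SI-mode imul (6) against the four latencies quoted in the comment (MULT Lo 5/Hi 6, MULTU Lo 6/Hi 7).  The macro and function names are made up for this illustration, and the numbers are taken from the patch comment rather than checked against the manual here.

```c
/* Latency used by the "r10k_imul_single" reservation in the patch.  */
#define MODELLED_IMUL_SI_LATENCY 6

/* Latency quoted in the 10000.md comment for a 32-bit multiply:
   MULT is Lo 5/Hi 6, MULTU is Lo 6/Hi 7.  */
static int
quoted_mul_latency (int is_unsigned, int reads_hi)
{
  int lo_latency = is_unsigned ? 6 : 5;
  return reads_hi ? lo_latency + 1 : lo_latency;
}

/* Modelled latency minus quoted latency: positive means the scheduler
   is pessimistic, negative means it is optimistic.  */
static int
model_error (int is_unsigned, int reads_hi)
{
  return MODELLED_IMUL_SI_LATENCY - quoted_mul_latency (is_unsigned, reads_hi);
}
```

With a single value of 6 the model is exact for MULT/Hi and MULTU/Lo, one cycle pessimistic for MULT/Lo, and one cycle optimistic for MULTU/Hi, so the error never exceeds one cycle in either direction.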


> OK otherwise.  Do you have a copyright assignment on file?

Nope.  Is there something I need to fill out and e-mail to someone?

Do I need to put my name and the name of the author of the very original gcc-3.0 
patch in this file as well?


> Let me know if you have svn write access.  I'll apply the patch
> for you if not.

Nope, don't have SVN write access.  This is my first gcc patch pretty much.


Thanks!,

Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org


gcc/
     * config/mips/10000.md: Add R10000 scheduler
     * config/mips/mips.c: Add r10000 params & costs
     * config/mips/mips.h: Add R10k constant
     * config/mips/mips.md: Add r10000 params & incl 10000.md
     * doc/invoke.texi: List r1x000 family


diff -Naurp gcc.orig/gcc/config/mips/10000.md gcc/gcc/config/mips/10000.md
--- gcc.orig/gcc/config/mips/10000.md	1969-12-31 19:00:00.000000000 -0500
+++ gcc/gcc/config/mips/10000.md	2008-08-17 23:48:13.000000000 -0400
@@ -0,0 +1,253 @@
+;; DFA-based pipeline description for the VR1x000.
+;;   Copyright (C) 2005, 2006, 2008 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+
+;; R12K/R14K/R16K are derivatives of R10K, thus copy its description
+;; until specific tuning for each is added.
+
+;; R10000 has an int queue, fp queue, address queue.
+;; The int queue feeds ALU1 and ALU2.
+;; The fp queue feeds the fp-adder and fp-multiplier.
+;; The addr queue feeds the Load/Store unit.
+;;
+;; However, we define the fp-adder and fp-multiplier as
+;; separate automatons, because the fp-multiplier is
+;; divided into fp-multiplier, fp-division, and
+;; fp-squareroot units, all of which share the same
+;; issue and completion logic, yet can operate in
+;; parallel.
+;;
+;; This is based on the model described in the R10K Manual
+;; and it helps to reduce the size of the automata.
+(define_automaton "r10k_a_int, r10k_a_fpadder, r10k_a_addr,
+                   r10k_a_fpmpy, r10k_a_fpdiv, r10k_a_fpsqrt")
+
+(define_cpu_unit "r10k_alu1" "r10k_a_int")
+(define_cpu_unit "r10k_alu2" "r10k_a_int")
+(define_cpu_unit "r10k_fpadd" "r10k_a_fpadder")
+(define_cpu_unit "r10k_fpmpy" "r10k_a_fpmpy")
+(define_cpu_unit "r10k_fpdiv" "r10k_a_fpdiv")
+(define_cpu_unit "r10k_fpsqrt" "r10k_a_fpsqrt")
+(define_cpu_unit "r10k_loadstore" "r10k_a_addr")
+
+
+;; R10k Loads and Stores.
+(define_insn_reservation "r10k_load" 2
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "load,prefetch,prefetchx"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_store" 0
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "store,fpstore,fpidxstore"))
+  "r10k_loadstore")
+
+(define_insn_reservation "r10k_fpload" 3
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "fpload,fpidxload"))
+  "r10k_loadstore")
+
+
+;; Integer add/sub + logic ops, and mt hi/lo can be done by alu1 or alu2.
+;; Miscellaneous arith goes here too (this is a guess).
+(define_insn_reservation "r10k_arith" 1
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "arith,mthilo,slt,clz,const,nop,trap,logical"))
+  "r10k_alu1 | r10k_alu2")
+
+;; We treat mfhilo differently, because we need to know when
+;; it's HI and when it's LO.
+(define_insn_reservation "r10k_mfhi" 1
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "mfhilo")
+            (not (match_operand 1 "lo_operand"))))
+  "r10k_alu1 | r10k_alu2")
+
+(define_insn_reservation "r10k_mflo" 1
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "mfhilo")
+            (match_operand 1 "lo_operand")))
+  "r10k_alu1 | r10k_alu2")
+
+
+;; ALU1 handles shifts, branch eval, and condmove.
+;;
+;; The brancher is a separate part of ALU1, but it can only
+;; do one branch per cycle (is this even implementable?).
+;;
+;; Unsure if the brancher handles jumps and calls as well, but since
+;; they're related, we'll add them here for now.
+(define_insn_reservation "r10k_brancher" 1
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "shift,branch,jump,call"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_int_cmove" 1
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SI,DI")))
+  "r10k_alu1")
+
+
+;; Coprocessor Moves.
+;; mtc1/dmtc1 are handled by ALU1.
+;; mfc1/dmfc1 are handled by the fp-multiplier.
+(define_insn_reservation "r10k_mt_xfer" 3
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "mtc"))
+  "r10k_alu1")
+
+(define_insn_reservation "r10k_mf_xfer" 2
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "mfc"))
+  "r10k_fpmpy")
+
+
+;; Only ALU2 does int multiplications and divisions.
+;;
+;; According to the Vr10000 series user manual,
+;; integer mult and div insns can be issued one
+;; cycle earlier if using register Lo.  We model
+;; this by using the Lo value by default, as it
+;; is the more common value, and use a bypass
+;; for the Hi value when needed.
+;;
+;; Also of note, there are different latencies
+;; for MULT/DMULT (Lo 5/Hi 6) and MULTU/DMULTU (Lo 6/Hi 7).
+;; However, gcc does not have separate types
+;; for these insns.  Thus to strike a balance,
+;; we use the Hi latency value for imul
+;; operations until the imul type can be split.
+(define_insn_reservation "r10k_imul_single" 6
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "imul,imul3")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 6")
+
+(define_insn_reservation "r10k_imul_double" 10
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "imul,imul3")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 10")
+
+;; Divides keep ALU2 busy.
+(define_insn_reservation "r10k_idiv_single" 34
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "SI")))
+  "r10k_alu2 * 35")
+
+(define_insn_reservation "r10k_idiv_double" 66
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "idiv")
+            (eq_attr "mode" "DI")))
+  "r10k_alu2 * 67")
+
+(define_bypass 35 "r10k_idiv_single" "r10k_mfhi")
+(define_bypass 67 "r10k_idiv_double" "r10k_mfhi")
+
+
+;; Floating point add/sub, mul, abs value, neg, comp, & moves.
+(define_insn_reservation "r10k_fp_miscadd" 2
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "fadd,fabs,fneg,fcmp"))
+  "r10k_fpadd")
+
+(define_insn_reservation "r10k_fp_miscmul" 2
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "fmul,fmove"))
+  "r10k_fpmpy")
+
+(define_insn_reservation "r10k_fp_cmove" 2
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "condmove")
+            (eq_attr "mode" "SF,DF")))
+  "r10k_fpmpy")
+
+
+;; The fcvt.s.[wl] insn has latency 4, repeat 2.
+;; All other fcvt insns have latency 2, repeat 1.
+(define_insn_reservation "r10k_fcvt_single" 4
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "I2S")))
+  "r10k_fpadd * 2")
+
+(define_insn_reservation "r10k_fcvt_other" 2
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fcvt")
+            (eq_attr "cnv_mode" "!I2S")))
+  "r10k_fpadd")
+
+
+;; Run the fmadd insn through fp-adder first, then fp-multiplier.
+;;
+;; The latency for fmadd is 2 cycles if the result is used
+;; by another fmadd instruction.
+(define_insn_reservation "r10k_fmadd" 4
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "fmadd"))
+  "r10k_fpadd, r10k_fpmpy")
+
+(define_bypass 2 "r10k_fmadd" "r10k_fmadd")
+
+
+;; Floating point Divisions & square roots.
+(define_insn_reservation "r10k_fdiv_single" 12
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "SF")))
+  "r10k_fpdiv * 14")
+
+(define_insn_reservation "r10k_fdiv_double" 19
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fdiv,frdiv")
+            (eq_attr "mode" "DF")))
+  "r10k_fpdiv * 21")
+
+(define_insn_reservation "r10k_fsqrt_single" 18
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_fsqrt_double" 33
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "fsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+(define_insn_reservation "r10k_frsqrt_single" 30
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "SF")))
+  "r10k_fpsqrt * 20")
+
+(define_insn_reservation "r10k_frsqrt_double" 52
+  (and (eq_attr "cpu" "r10000")
+       (and (eq_attr "type" "frsqrt")
+            (eq_attr "mode" "DF")))
+  "r10k_fpsqrt * 35")
+
+
+;; Handle unknown/multi insns here (this is a guess).
+(define_insn_reservation "r10k_unknown" 1
+  (and (eq_attr "cpu" "r10000")
+       (eq_attr "type" "unknown,multi"))
+  "r10k_alu1 + r10k_alu2")
diff -Naurp gcc.orig/gcc/config/mips/mips.c gcc/gcc/config/mips/mips.c
--- gcc.orig/gcc/config/mips/mips.c	2008-08-17 23:28:33.000000000 -0400
+++ gcc/gcc/config/mips/mips.c	2008-08-17 23:48:13.000000000 -0400
@@ -597,6 +597,10 @@ static const struct mips_cpu_info mips_c

    /* MIPS IV processors. */
    { "r8000", PROCESSOR_R8000, 4, 0 },
+  { "r10000", PROCESSOR_R10000, 4, 0 },
+  { "r12000", PROCESSOR_R10000, 4, 0 },
+  { "r14000", PROCESSOR_R10000, 4, 0 },
+  { "r16000", PROCESSOR_R10000, 4, 0 },
    { "vr5000", PROCESSOR_R5000, 4, 0 },
    { "vr5400", PROCESSOR_R5400, 4, 0 },
    { "vr5500", PROCESSOR_R5500, 4, PTF_AVOID_BRANCHLIKELY },
@@ -992,6 +996,19 @@ static const struct mips_rtx_cost_data m
  		     1,           /* branch_cost */
  		     4            /* memory_latency */
    },
+  { /* R1x000 */
+    COSTS_N_INSNS (2),            /* fp_add */
+    COSTS_N_INSNS (2),            /* fp_mult_sf */
+    COSTS_N_INSNS (2),            /* fp_mult_df */
+    COSTS_N_INSNS (12),           /* fp_div_sf */
+    COSTS_N_INSNS (19),           /* fp_div_df */
+    COSTS_N_INSNS (5),            /* int_mult_si */
+    COSTS_N_INSNS (9),            /* int_mult_di */
+    COSTS_N_INSNS (34),           /* int_div_si */
+    COSTS_N_INSNS (66),           /* int_div_di */
+		     1,           /* branch_cost */
+		     4            /* memory_latency */
+  },
    { /* SB1 */
      /* These costs are the same as the SB-1A below.  */
      COSTS_N_INSNS (4),            /* fp_add */
@@ -10304,7 +10321,10 @@ mips_issue_rate (void)
  	 but in reality only a maximum of 3 insns can be issued as
  	 floating-point loads and stores also require a slot in the
  	 AGEN pipe.  */
-     return 4;
+    case PROCESSOR_R10000:
+      /* All R10K processors are quad-issue (being the first MIPS
+         processors to support this feature).  */
+      return 4;

      case PROCESSOR_20KC:
      case PROCESSOR_R4130:
diff -Naurp gcc.orig/gcc/config/mips/mips.h gcc/gcc/config/mips/mips.h
--- gcc.orig/gcc/config/mips/mips.h	2008-08-17 23:28:33.000000000 -0400
+++ gcc/gcc/config/mips/mips.h	2008-08-17 23:48:13.000000000 -0400
@@ -66,6 +66,7 @@ enum processor_type {
    PROCESSOR_R7000,
    PROCESSOR_R8000,
    PROCESSOR_R9000,
+  PROCESSOR_R10000,
    PROCESSOR_SB1,
    PROCESSOR_SB1A,
    PROCESSOR_SR71000,
@@ -253,6 +254,7 @@ enum mips_code_readable_setting {
  #define TARGET_MIPS5500             (mips_arch == PROCESSOR_R5500)
  #define TARGET_MIPS7000             (mips_arch == PROCESSOR_R7000)
  #define TARGET_MIPS9000             (mips_arch == PROCESSOR_R9000)
+#define TARGET_MIPS10000            (mips_arch == PROCESSOR_R10000)
  #define TARGET_SB1                  (mips_arch == PROCESSOR_SB1		\
  				     || mips_arch == PROCESSOR_SB1A)
  #define TARGET_SR71K                (mips_arch == PROCESSOR_SR71000)
@@ -279,6 +281,7 @@ enum mips_code_readable_setting {
  #define TUNE_MIPS6000               (mips_tune == PROCESSOR_R6000)
  #define TUNE_MIPS7000               (mips_tune == PROCESSOR_R7000)
  #define TUNE_MIPS9000               (mips_tune == PROCESSOR_R9000)
+#define TUNE_MIPS10000              (mips_tune == PROCESSOR_R10000)
  #define TUNE_SB1                    (mips_tune == PROCESSOR_SB1		\
  				     || mips_tune == PROCESSOR_SB1A)

diff -Naurp gcc.orig/gcc/config/mips/mips.md gcc/gcc/config/mips/mips.md
--- gcc.orig/gcc/config/mips/mips.md	2008-08-09 15:43:46.000000000 -0400
+++ gcc/gcc/config/mips/mips.md	2008-08-17 23:48:13.000000000 -0400
@@ -556,7 +556,7 @@
  ;; Attribute describing the processor.  This attribute must match exactly
  ;; with the processor_type enumeration in mips.h.
  (define_attr "cpu"
-  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,sb1,sb1a,sr71000,xlr"
+  "r3000,4kc,4kp,5kc,5kf,20kc,24kc,24kf2_1,24kf1_1,74kc,74kf2_1,74kf1_1,74kf3_2,loongson_2e,loongson_2f,m4k,r3900,r6000,r4000,r4100,r4111,r4120,r4130,r4300,r4600,r4650,r5000,r5400,r5500,r7000,r8000,r9000,r10000,sb1,sb1a,sr71000,xlr"
    (const (symbol_ref "mips_tune")))

  ;; The type of hardware hazard associated with this instruction.
@@ -906,6 +906,7 @@
  (include "6000.md")
  (include "7000.md")
  (include "9000.md")
+(include "10000.md")
  (include "sb1.md")
  (include "sr71k.md")
  (include "xlr.md")
diff -Naurp gcc.orig/gcc/doc/invoke.texi gcc/gcc/doc/invoke.texi
--- gcc.orig/gcc/doc/invoke.texi	2008-08-17 23:28:11.000000000 -0400
+++ gcc/gcc/doc/invoke.texi	2008-08-17 23:48:13.000000000 -0400
@@ -11982,6 +11982,7 @@ The processor names are:
  @samp{r2000}, @samp{r3000}, @samp{r3900}, @samp{r4000}, @samp{r4400},
  @samp{r4600}, @samp{r4650}, @samp{r6000}, @samp{r8000},
  @samp{rm7000}, @samp{rm9000},
+@samp{r10000}, @samp{r12000}, @samp{r14000}, @samp{r16000},
  @samp{sb1},
  @samp{sr71000},
  @samp{vr4100}, @samp{vr4111}, @samp{vr4120}, @samp{vr4130}, @samp{vr4300},


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-18  8:51                           ` Kumba
@ 2008-08-18 17:00                             ` David Daney
  2008-08-19  2:59                               ` Kumba
  2008-10-06 21:33                             ` Richard Sandiford
  1 sibling, 1 reply; 22+ messages in thread
From: David Daney @ 2008-08-18 17:00 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips, rdsandiford, GCC Mailing List

Kumba wrote:
> Richard Sandiford wrote:
> 
>> OK otherwise.  Do you have a copyright assignment on file?
> 
> Nope.  Is there something I need to fill out and e-mail to someone?
> 

Yes there is.  I'm not sure if Richard can cause them to be sent to you, but certainly requesting copyright assignment documents on gcc@gcc.gnu.org would work.  It can often take many weeks to get them processed, so starting as soon as possible would be a good idea. 

> Do I need to put my name and the name of the author of the very original 
> gcc-3.0 patch in this file as well?

It would depend on if any of the original patch code remains.  If so, probably a copyright assignment for the original author would be required as well (at least that is my understanding).

David Daney


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-18 17:00                             ` David Daney
@ 2008-08-19  2:59                               ` Kumba
  0 siblings, 0 replies; 22+ messages in thread
From: Kumba @ 2008-08-19  2:59 UTC (permalink / raw)
  To: David Daney; +Cc: gcc-patches, mips, rdsandiford, GCC Mailing List

David Daney wrote:

> Yes there is.  I'm not sure if Richard can cause them to be sent to you, 
> but certainly requesting copyright assignment documents on 
> gcc@gcc.gnu.org would work.  It can often take many weeks to get them 
> processed, so starting as soon as possible would be a good idea.

I'll submit a request outside of this thread later on then.  Thanks for the info!

As far as processing time goes, will that impact getting this patch committed? 
My understanding is end of August for new features for gcc-4.4.

> It would depend on if any of the original patch code remains.  If so, 
> probably a copyright assignment for the original author would be 
> required as well (at least that is my understanding).

That's an interesting thought then.  I largely left his code intact while I used 
this patch throughout gcc-3.2, 3.3, and 3.4 (I simply updated it to apply to the 
subsequent releases in the 3.x series).  I only converted it to DFA in gcc-4.0, 
which was largely a complete rewrite, using the original patch as a guide to 
learn how DFA changed things around.  Plus, the core information comes from the 
Vr10000 manual anyways, available off www.necel.com.  Considering the very 
original patch was posted once to gcc-patches years ago for gcc-3.0 
consideration, I would be surprised if the e-mail address of that submission 
still goes to him.

Thoughts?

-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org

"The past tempts us, the present confuses us, the future frightens us.  And our 
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic


* Re: [PATCH]: GCC Scheduler support for R10000 on MIPS
  2008-08-18  8:51                           ` Kumba
  2008-08-18 17:00                             ` David Daney
@ 2008-10-06 21:33                             ` Richard Sandiford
  1 sibling, 0 replies; 22+ messages in thread
From: Richard Sandiford @ 2008-10-06 21:33 UTC (permalink / raw)
  To: Kumba; +Cc: gcc-patches, mips

Kumba <kumba@gentoo.org> writes:
>> Let me know if you have svn write access.  I'll apply the patch
>> for you if not.
>
> Nope, don't have SVN write access.  This is my first gcc patch pretty much.

The copyright assignment has now gone through, so I went ahead
and applied the patch[*].  Thanks for the contribution, and for
your patience.  And thanks to Gerald for once again helping me
with the copyright stuff.

  [*] I removed the unused TARGET_MIPS10000 and TUNE_MIPS10000 macros
      though.  We generally only add those macros when they're needed
      for something.

I applied the following patch to the GCC 4.4 release notes.

Richard


Index: htdocs/gcc-4.4/changes.html
===================================================================
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.4/changes.html,v
retrieving revision 1.34
diff -u -p -r1.34 changes.html
--- htdocs/gcc-4.4/changes.html	6 Oct 2008 18:58:44 -0000	1.34
+++ htdocs/gcc-4.4/changes.html	6 Oct 2008 19:43:50 -0000
@@ -345,6 +345,10 @@
         instead of relying on a <code>libgcc</code> function.</li>
     <li>Native GNU/Linux toolchains now support <code>-march=native</code>
         and <code>-mtune=native</code>, which select the host processor.</li>
+    <li>GCC now supports the R10K, R12K, R14K and R16K processors.  The
+        canonical <code>-march=</code> and <code>-mtune=</code> names for
+        these processors are <code>r10000</code>, <code>r12000</code>,
+        <code>r14000</code> and <code>r16000</code> respectively.</li>
     <li>GCC can now work around the side effects of speculative execution
         on R10K processors.  Please see the documentation of the
         <code>-mr10k-cache-barrier</code> option for details.</li>


end of thread, other threads:[~2008-10-06 19:46 UTC | newest]

Thread overview: 22+ messages
2008-08-01  1:53 [PATCH]: GCC Scheduler support for R10000 on MIPS Kumba
2008-08-02  4:29 ` Kumba
2008-08-02  9:48 ` Richard Sandiford
2008-08-03  3:37   ` Kumba
2008-08-03  7:20     ` Ralf Wildenhues
2008-08-03 10:40     ` Richard Sandiford
2008-08-04  7:20       ` Kumba
2008-08-04 19:23         ` Richard Sandiford
2008-08-04 19:30           ` contribute.html: compare pre/post patch testresults (was: [PATCH]: GCC Scheduler support for R10000 on MIPS) Ralf Wildenhues
2008-08-06 14:51             ` Ian Lance Taylor
2008-08-05  2:48           ` [PATCH]: GCC Scheduler support for R10000 on MIPS Kumba
2008-08-05 18:29             ` Richard Sandiford
2008-08-06  7:58               ` Kumba
2008-08-07 21:24                 ` Richard Sandiford
2008-08-08  8:46                   ` Kumba
2008-08-09  9:01                     ` Richard Sandiford
2008-08-13  8:53                       ` Kumba
2008-08-16 10:10                         ` Richard Sandiford
2008-08-18  8:51                           ` Kumba
2008-08-18 17:00                             ` David Daney
2008-08-19  2:59                               ` Kumba
2008-10-06 21:33                             ` Richard Sandiford
