Dear Ramana, Thank you for your kindly help. I have modified according to your suggestion. The following is my Changlog. 2010-12-02 Sanjin Liu Mingfeng Wu * config/arm/arm-cores.def: Add Faraday CPU support - fa526,fa626,fa606te,fa626te,fmp626,fa726te. * config/arm/arm-tune.md: Regenerate. * config/arm/arm.c (arm_fa726te_tune): New. (fa726te_sched_adjust_cost): New. (arm_issue_rate): Handle fa726te. * config/arm/arm.md (generic_sched): Don't use Generic scheduler for Faraday cores. * config/arm/bpabi.h (TARGET_FIX_V4BX_SPEC): Handle fa526 and fa626. * config/arm/t-arm (MD_INCLUDES): Include machine description files for Faraday cores. * config/arm/t-arm-elf: Add multilib option for Faraday cores. * config/arm/t-linux-eabi: Add multilib option for Faraday cores except fa526 and fa626. * doc/invoke.texi: Document -mcpu for Faraday cores. * config/arm/fa526.md: New file. * config/arm/fa626.md: New file. * config/arm/fa606te.md: New file. * config/arm/fa626te.md: New file. * config/arm/fmp626.md: New file. * config/arm/fa726te.md: New file. 2010/12/9 Ramana Radhakrishnan : > Hi Mingfeng > > Sorry about the late response and thanks for working through the issues. > I've been off sick and only got back to looking at this today. > > Some minor nits in your changelog. > > >> 2010-12-02  Sanjin Liu   >>         Mingfeng Wu   >> >>       * config/arm/arm-cores.def: Add Faraday CPU support - >>       fa526/fa626/fa606te/fa626te/fmp626/fa726te. > > ',' instead of '/' in the changelog entry. fa526, fa626 etc. > >>       * config/arm/arm-tune.md: Regenerate. >>       * config/arm/arm.c (arm_fa726te_tune): New tune_params for fa726te > > It's enough to say New. > OK. >>       (fa726te_sched_adjust_cost): New cost function for fa726te. > > Enough to say New. > OK. >>       (arm_issue_rate): Add fa726te. > > s/Add/Handle > OK. >>       * config/arm/arm.md (generic_sched): Add Faraday cores to generic_sched >>       and include machine description files. > > Replace sentence with : > Don't use Generic scheduler for Faraday cores. > Replaced. > >>       * config/arm/bpabi.h (TARGET_FIX_V4BX_SPEC): Add fa526 and fa626. > > s/Add/Handle > OK. > >> > >> >> +(define_query_cpu_unit "fa726te_lsu1_pipe_e,fa726te_lsu1_pipe_w" >> "fa726te") >> > >> > You have a query_cpu_unit which you don't seem to be querying for in >> the backend in any form? Is there any thing else >> > missing in your pipeline description or has this been put in for >> future use ? >> > >> >> I only use the units defined by query_cpu_unit in the fa726te.md. The >> two units, fa726te_lsu1_pipe_e and >> fa726te_lsu1_pipe_w, are only used for arrange the load instructions. >> Because fa726te only supports one >> ldr/str pipe, I use the query_cpu_unit to define another pseudo pipe >> for better load instruction scheduling. > > The reason I asked why you used the define_cpu_unit vs > define_query_cpu_unit was because there was no backend hook that queried > for the cpu unit in question. IIRC define_cpu_unit and > define_query_cpu_unit are more or less identical except for the fact > that you can result in better minimization in one case than the other. > > However in this case the difference in the size of the automaton should > be minimal considering it isn't a very big automaton in question. > >> >> >> +;; reservation which blocks IS >> >> +(define_reservation "fa726te_blockage" "(fa726te_is0 >> +fa726te_is1)") >> > >> > Can you clarify the comment above ? Again the comments about >> sentence case and full stops hold. > >> >> >> It is used to restrict the instruction issue to one. > > > Ok so make that explicit in the comment. Something like: > > " ;Reservation to restrict issue to 1. > Modified as suggested. > > > Now on to your latest patch submission. > > > > Still a few formatting issues from your latest patch. > > >> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md >> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md 1970-01-01 08:00:00.000000000 +0800 >> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md     2010-12-02 09:18:17.671515000 +0800 >> > <...> > >> >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> +;; Branch and Call Instructions >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> + >> +;; Branch instructions are difficult to model accurately.  The FA606TE >> +;; core can predict most branches.  If the branch is predicted >> +;; correctly, and predicted early enough, the branch can be completely >> +;; eliminated from the instruction stream.  Some branches can >> +;; therefore appear to require zero cycles to execute.  We assume that >> +;; all branches are predicted correctly, and that the latency is >> +;; therefore the minimum value. >> + >> +(define_insn_reservation "606te_branch_op" 0 >> + (and (eq_attr "tune" "fa606te") >> +      (eq_attr "type" "branch")) >> + "fa606te_core") >> + >> +;; The latency for a call is actually the latency when the result being available. >> +;; i.e. R0 ready for int return value. For most cases, the return value is set by a >                                        ^^^ >                                        2 spaces between '.' and start of next sentence. > OK. >> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md >> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md 1970-01-01 08:00:00.000000000 +0800 >> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa606te.md     2010-12-02 09:18:17.671515000 +0800 >> @@ -0,0 +1,171 @@ >> >> +;; The latency for a call is actually the latency when the result is available. >> +;; i.e. R0 ready for int return value. >                                         ^ > > Very small nit. Remove trailing white-space. > OK. >> >> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md >> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md 1970-01-01 08:00:00.000000000 +0800 >> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fa726te.md     2010-12-02 14:45:23.731365000 +0800 > > > >> >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> +;; Pipelines >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> + >> +;;   The ALU pipeline has fetch, decode, execute, memory, and >> +;;   write stages.  We only need to model the execute, memory and write >> +;;   stages. >> + >> +;;   E1      E2      E3      E4      E5      WB >> +;;______________________________________________________ >> +;; >> +;;      <-------------- LD/ST -----------> >> +;;    shifter + LU      <-- AU --> >                                     ^ Trailing whitespace. > OK. >> +;;      <-- AU -->     shifter + LU    CPSR     (Pipe 0) >> +;;______________________________________________________ >> +;; >> +;;      <---------- MUL ---------> >> +;;    shifter + LU      <-- AU --> >                                     ^ Trailing whitespace. > OK. > > >From fa726te.md > >> +(define_bypass 1 "726te_alu_shift_op,726te_alu_shift_reg_op,726te_mult_op" >                                                                             ^ > Trailing whitespace. > >> +                 "726te_alu_shift_op" "arm_no_early_alu_shift_dep") >> +(define_bypass 1 "726te_alu_shift_op,726te_alu_shift_reg_op,726te_mult_op" >                                                                             ^ > Trailing whitespace. > OK. > >> +;; The latency for a call is actually the latency when the result is available. >> +;; i.e. R0 is ready for int return value. > > Likewise. > OK. > >> >> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md >> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md  1970-01-01 08:00:00.000000000 +0800 >> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/fmp626.md      2010-12-02 09:18:17.687514000 +0800 > > >> +;; Pipeline architecture >> +;;   S       E       M       W(Q1)   Q2 >> +;;   ___________________________________________ >> +;;    shifter alu >                    ^^^ Multiple trailing whitespaces. > OK. > > >> diff -uNr tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h >> --- tmp/gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h    2010-12-01 18:48:48.000000000 +0800 >> +++ gcc-4.6-svn-167325-20101201/gcc/config/arm/bpabi.h        2010-12-02 09:18:17.660518000 +0800 >> @@ -52,7 +52,7 @@ >>  /* The BPABI integer comparison routines return { -1, 0, 1 }.  */ >>  #define TARGET_LIB_INT_CMP_BIASED !TARGET_BPABI >> >> -#define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4:--fix-v4bx}" >> +#define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*|march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}" >> > > Exceeds the 80 character per line limit.  Can you split this across multiple lines ? > > Something like > #define TARGET_FIX_V4BX_SPEC " %{mcpu=arm8|mcpu=arm810|mcpu=strongarm*\ > |march=armv4|mcpu=fa526|mcpu=fa626:--fix-v4bx}" > > should do the trick. > OK. Split it into two lines. > However this isn't an approval since I can't approve or reject your patch. > > > cheers > Ramana > > > > On Thu, 2010-12-02 at 16:27 +0800, M.F. Wu wrote: >> Dear Ramana, >> >> Thank you for your comments about the patch. >> The patch has been modified as the attached >> file shows. >> >> The Changlog: >> >> 2010-12-02  Sanjin Liu   >>         Mingfeng Wu   >> >>       * config/arm/arm-cores.def: Add Faraday CPU support - >>       fa526/fa626/fa606te/fa626te/fmp626/fa726te. >>       * config/arm/arm-tune.md: Regenerate. >>       * config/arm/arm.c (arm_fa726te_tune): New tune_params for fa726te >>       (fa726te_sched_adjust_cost): New cost function for fa726te. >>       (arm_issue_rate): Add fa726te. >>       * config/arm/arm.md (generic_sched): Add Faraday cores to generic_sched >>       and include machine description files. >>       * config/arm/bpabi.h (TARGET_FIX_V4BX_SPEC): Add fa526 and fa626. >>       * config/arm/t-arm (MD_INCLUDES): Include machine description files for >>       Faraday cores. >>       * config/arm/t-arm-elf: Add multilib option for Faraday cores. >>       * config/arm/t-linux-eabi: Add multilib option for Faraday cores except >>       fa526 and fa626. >>       * doc/invoke.texi: Document -mcpu for Faraday cores. >>       * config/arm/fa526.md: New file. >>       * config/arm/fa626.md: New file. >>       * config/arm/fa606te.md: New file. >>       * config/arm/fa626te.md: New file. >>       * config/arm/fmp626.md: New file. >>       * config/arm/fa726te.md: New file. >> >> >> >> 2010/11/30 Ramana Radhakrishnan : >> > Hi Mingfeng, >> > >> > Thanks for making these changes. >> > >> > Please do not make the Changelog a part of the final patch. Please make >> > this a part of your final mail submission and not a part of your patch. >> > >> > >> >> @@ -7913,6 +7921,36 @@ >> > >> > <...> >> > >> >> +      /* Use of carry (e.g. 64-bit arithmetic) in ALU: 3-cycle latency */ >> > >> > Full stop at the end of comment followed by 2 spaces before end of comment. >> > Can you please audit your patch to check for these issues ? >> > >> > This is true for comments in the machine description parts of your patch as well. >> > >> >> Fixed. >> >> > >> >> +      if (get_attr_conds(insn)  == CONDS_USE && >> >> +          get_attr_type(insn) != TYPE_BRANCH) >> >> +        { >> >> +          *cost = 3; >> >> +          return false; >> >> +        } >> > >> > Space between function name and paranthesis. Thus it should be >> > get_attr_conds (insn) and not get_attr_conds(insn) as above. >> > >> >> + >> >> +      if (GET_CODE (PATTERN (insn)) == COND_EXEC >> >> +          || get_attr_conds(insn)  == CONDS_USE) >> >> +        { >> >> +          *cost = 0; >> >> +          return false; >> >> +        } >> >> +    } >> > >> > Likewise. >> > >> >> Fixed. >> >> > >> >> diff -uNr tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa526.md >> >> >> gcc-4.6-svn-20101116/gcc/config/arm/fa526.md >> >> >> > --- tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa526.md  1970-01-01 >> >> >> 08:00:00.000000000 +0800 >> >> >> > +++ gcc-4.6-svn-20101116/gcc/config/arm/fa526.md      2010-11-23 >> >> >> 14:36:17.916371000 +0800 >> >> >> >> >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> >> > +;; Branch and Call Instructions >> >> >> > >> >> >> +;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;; >> >> >> > + >> > >> >> > And equivalently in all the other pipeline descriptions. >> >> > >> >> >> >> Sorry, I don't understand exactly what you mean... >> > >> > >> >> +;; Branch instructions are difficult to model accurately.  The ARM >> >> +;; core can predict most branches.  If the branch is predicted >> >> >> > In the sentence above - >> > Replace ARM with FA526 and similarly in the other pipeline descriptions. >> > >> > The FA526 core isn't one made by ARM and am not sure if you can use the name >> > in that regard here :) >> > >> >> OK, you are right. I have modified the "ARM" to our core. >> >> >> > ; >> >> > >> >> > +const struct tune_params arm_fa726te_tune = >> >> > +{ >> >> > +  arm_9e_rtx_costs, >> >> > +  fa726te_sched_adjust_cost, >> >> > +  1 >> >> > +}; >> >> > + >> > >> > This part of your patch is now out-of-date thanks to Ian Bolton's latest commits in that area with respect >> > to preloads. You might want to consider that in your final submission. I suppose using the defaults >> > and turning off preloads at O3 would be the correct thing to do to get your patch sheperded through. >> > >> >> OK. I modified the arm_fa726te_tune. >> >> const struct tune_params arm_fa726te_tune = >> { >>   arm_9e_rtx_costs, >>   fa726te_sched_adjust_cost, >>   1, >>   ARM_PREFETCH_NOT_BENEFICIAL >> }; >> >> >> >> >> >> --- tmp/gcc-4.6-svn-20101116/gcc/config/arm/fa726te.md        1970-01-01 08:00:00.000000000 +0800 >> >> +++ gcc-4.6-svn-20101116/gcc/config/arm/fa726te.md    2010-11-25 17:06:01.877554000 +0800 >> >> @@ -0,0 +1,221 @@ >> >> >> >> +(define_automaton "fa726te") >> >> +(automata_option "ndfa") >> >> + >> > >> > Why do you have an ndfa option here? Does this give you benefit with benchmarking on the FA726te core since this usually increases compile time >> > as the automaton ends up searching for all possible options ? >> > >> >> Yes. the ndfa option does benefit our benchmarking, but a little. So I >> remove the ndfa option here. >> >> > >> >> +;; pretend we have 2 LSUs (the second is ONLY for LDR), which can possibly >> >> +;; improve code quality >> > >> > Full stop at the end of the comment. pretend should start with a capital P and not lower case. (Pretend) >> > 2 spaces between the a full stop or a punctuation character that terminates a sentence and the start of >> > the next sentence. There are a number of places in your patch where one can see such cases. >> > >> >> Fixed. >> >> > >> >> +(define_query_cpu_unit "fa726te_lsu1_pipe_e,fa726te_lsu1_pipe_w" "fa726te") >> > >> > You have a query_cpu_unit which you don't seem to be querying for in the backend in any form? Is there any thing else >> > missing in your pipeline description or has this been put in for future use ? >> > >> >> I only use the units defined by query_cpu_unit in the fa726te.md. The >> two units, fa726te_lsu1_pipe_e and >> fa726te_lsu1_pipe_w, are only used for arrange the load instructions. >> Because fa726te only supports one >> ldr/str pipe, I use the query_cpu_unit to define another pseudo pipe >> for better load instruction scheduling. >> >> >> +;; reservation which blocks IS >> >> +(define_reservation "fa726te_blockage" "(fa726te_is0+fa726te_is1)") >> > >> > Can you clarify the comment above ? Again the comments about sentence case and full stops hold. >> > >> >> It is used to restrict the instruction issue to one. >> >> > >> > >> > cheers >> > Ramana >> > >> > >> > >> > >> > >> > > > >