[PATCH, ARM] Improve GCC pipeline description for Cortex-M4 FPU

public inbox for gcc-patches@gcc.gnu.org

* [PATCH, ARM] Improve GCC pipeline description for Cortex-M4 FPU
@ 2013-04-16 10:31 Terry Guo
  2013-04-16 11:55 ` Richard Earnshaw
  0 siblings, 1 reply; 2+ messages in thread
From: Terry Guo @ 2013-04-16 10:31 UTC
  To: gcc-patches; +Cc: Richard Earnshaw, Ramana Radhakrishnan

[-- Attachment #1: Type: text/plain, Size: 1031 bytes --]

Hi,

This patch improves the Cortex-M4 FPU pipeline description based on the
following findings:

1) Integer instructions can be pipelined with fused/chained MAC
instructions.
2) Two-cycle 32-bit floating-point load instructions should be scheduled
back to back, which saves one cycle. The three-cycle 64-bit
floating-point load instructions have no such feature.
3) 32-bit floating-point store instructions need 1 cycle, not 2 cycles.
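
The effect of the three findings on issue timing can be sketched with a toy
Python model. This is only an illustration, not GCC's scheduler automaton: it
issues instructions greedily in program order and only tracks unit occupancy.
The function name `issue_cycles` is hypothetical, and two things are assumed
rather than taken from the patch: that `cortex_m4_ex` stands for
`cortex_m4_a+cortex_m4_b` (defined outside this diff), and that a plain integer
ALU instruction reserves `cortex_m4_ex` for one cycle.

```python
# Toy model of the reservations in this patch; not GCC's scheduler automaton.
EX = frozenset({"a", "b"})     # cortex_m4_ex (assumed: cortex_m4_a + cortex_m4_b)
V = frozenset({"v_a", "v_b"})  # cortex_m4_v = cortex_m4_v_a + cortex_m4_v_b
EX_V = EX | V                  # cortex_m4_ex_v

# One set of busy units per cycle, before and after the patch.
OLD = {
    "fmacs": [EX_V] * 3,       # "cortex_m4_ex_v*3"
    "f_load": [EX_V] * 2,      # "cortex_m4_ex_v*2"
    "f_store": [EX_V] * 2,     # "cortex_m4_ex_v*2"
}
NEW = {
    "fmacs": [EX_V, V, V],                               # "cortex_m4_ex_v,cortex_m4_v*2"
    "f_load": [frozenset({"a", "v_a"}), frozenset({"b", "v_b"})],
    "f_store": [frozenset({"a", "v_a"})],                # "cortex_m4_exa_va"
}
INT_ALU = [EX]  # assumption: an integer ALU insn holds cortex_m4_ex for 1 cycle


def issue_cycles(insns, table):
    """Return the cycle in which each instruction issues, in program order."""
    busy = {}    # cycle number -> set of units already reserved
    cycles = []
    start = 0    # in-order issue: never start before the previous insn
    for name in insns:
        pattern = INT_ALU if name == "int" else table[name]
        # Slide forward until no cycle of the pattern hits a busy unit.
        while any(busy.get(start + i, set()) & units
                  for i, units in enumerate(pattern)):
            start += 1
        for i, units in enumerate(pattern):
            busy.setdefault(start + i, set()).update(units)
        cycles.append(start)
    return cycles


print(issue_cycles(["f_load", "f_load"], OLD))  # [0, 2]
print(issue_cycles(["f_load", "f_load"], NEW))  # [0, 1]
print(issue_cycles(["fmacs", "int"], OLD))      # [0, 3]
print(issue_cycles(["fmacs", "int"], NEW))      # [0, 1]
```

In this model a pair of 32-bit loads needs three cycles in total instead of
four, and an integer instruction can issue one cycle after a multiply-accumulate
instead of three cycles after it.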

I benchmarked this patch with some f32 functions from the CMSIS DSPLib.
All of them show a performance improvement, i.e. fewer cycles are needed
to run those functions.

Is it OK for trunk?

BR,
Terry

2013-04-16  Terry Guo  <terry.guo@arm.com>

        * config/arm/cortex-m4-fpu.md (cortex_m4_v): Delete cpu unit.
        Replace with ...
        (cortex_m4_v_a, cortex_m4_v_b): ... new cpu units.
        (cortex_m4_v, cortex_m4_exa_va, cortex_m4_exb_vb): New reservations.
        (cortex_m4_fmacs): Use new reservations.
        (cortex_m4_f_load, cortex_m4_f_store): Likewise.
        

[-- Attachment #2: gcc-m4-fpu-pipeline-v1.txt --]
[-- Type: text/plain, Size: 1905 bytes --]

diff --git a/gcc/config/arm/cortex-m4-fpu.md b/gcc/config/arm/cortex-m4-fpu.md
index a1945be..4ce3f10 100644
--- a/gcc/config/arm/cortex-m4-fpu.md
+++ b/gcc/config/arm/cortex-m4-fpu.md
@@ -18,10 +18,14 @@
 ;; along with GCC; see the file COPYING3.  If not see
 ;; <http://www.gnu.org/licenses/>.
 
-;; Use an artifial unit to model FPU.
-(define_cpu_unit "cortex_m4_v" "cortex_m4")
+;; Use two artificial units to model FPU.
+(define_cpu_unit "cortex_m4_v_a" "cortex_m4")
+(define_cpu_unit "cortex_m4_v_b" "cortex_m4")
 
+(define_reservation "cortex_m4_v" "cortex_m4_v_a+cortex_m4_v_b")
 (define_reservation "cortex_m4_ex_v" "cortex_m4_ex+cortex_m4_v")
+(define_reservation "cortex_m4_exa_va" "cortex_m4_a+cortex_m4_v_a")
+(define_reservation "cortex_m4_exb_vb" "cortex_m4_b+cortex_m4_v_b")
 
 ;; Integer instructions following VDIV or VSQRT complete out-of-order.
 (define_insn_reservation "cortex_m4_fdivs" 15
@@ -44,10 +48,12 @@
        (eq_attr "type" "fmuls"))
   "cortex_m4_ex_v")
 
+;; Integer instructions following multiply-accumulate instructions
+;; complete out-of-order.
 (define_insn_reservation "cortex_m4_fmacs" 4
   (and (eq_attr "tune" "cortexm4")
        (eq_attr "type" "fmacs,ffmas"))
-  "cortex_m4_ex_v*3")
+  "cortex_m4_ex_v,cortex_m4_v*2")
 
 (define_insn_reservation "cortex_m4_ffariths" 1
   (and (eq_attr "tune" "cortexm4")
@@ -77,12 +83,12 @@
 (define_insn_reservation "cortex_m4_f_load" 2
   (and (eq_attr "tune" "cortexm4")
        (eq_attr "type" "f_loads"))
-  "cortex_m4_ex_v*2")
+  "cortex_m4_exa_va,cortex_m4_exb_vb")
 
-(define_insn_reservation "cortex_m4_f_store" 2
+(define_insn_reservation "cortex_m4_f_store" 1
   (and (eq_attr "tune" "cortexm4")
        (eq_attr "type" "f_stores"))
-  "cortex_m4_ex_v*2")
+  "cortex_m4_exa_va")
 
 (define_insn_reservation "cortex_m4_f_loadd" 3
   (and (eq_attr "tune" "cortexm4")
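
For readers unfamiliar with the reservation strings in the diff above: within a
reservation, `+` reserves units in the same cycle, `,` moves to the next cycle,
and `*N` repeats an element for N consecutive cycles. The Python sketch below
expands the strings used in this patch into per-cycle unit sets. It handles only
the small subset of the syntax that appears here (each comma-separated step is a
single name, optionally followed by `*N`). The helper names `units` and `expand`
are hypothetical, and the definition of `cortex_m4_ex` is assumed, since it
lives outside this diff.

```python
# Named reservations. The cortex_m4_ex entry is an assumption (it is defined
# outside this diff); the others are copied from the patch.
RESERVATIONS = {
    "cortex_m4_ex": "cortex_m4_a+cortex_m4_b",
    "cortex_m4_v": "cortex_m4_v_a+cortex_m4_v_b",
    "cortex_m4_ex_v": "cortex_m4_ex+cortex_m4_v",
    "cortex_m4_exa_va": "cortex_m4_a+cortex_m4_v_a",
    "cortex_m4_exb_vb": "cortex_m4_b+cortex_m4_v_b",
}


def units(name):
    """Expand a name into the set of cpu units it reserves in one cycle."""
    if name not in RESERVATIONS:
        return {name}  # a plain cpu unit
    result = set()
    for part in RESERVATIONS[name].split("+"):
        result |= units(part.strip())
    return result


def expand(regexp):
    """Expand 'name[*N],name[*N],...' into a list of unit sets, one per cycle."""
    cycles = []
    for step in regexp.split(","):
        name, _, count = step.strip().partition("*")
        repeat = int(count) if count else 1
        cycles.extend(units(name.strip()) for _ in range(repeat))
    return cycles


for cycle, busy in enumerate(expand("cortex_m4_ex_v,cortex_m4_v*2")):
    print(cycle, sorted(busy))
```

For the new `cortex_m4_fmacs` reservation this lists all four units in cycle 0
and only the two FPU units in cycles 1 and 2, which is what frees the integer
units for a following instruction.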


* Re: [PATCH, ARM] Improve GCC pipeline description for Cortex-M4 FPU
  2013-04-16 10:31 [PATCH, ARM] Improve GCC pipeline description for Cortex-M4 FPU Terry Guo
@ 2013-04-16 11:55 ` Richard Earnshaw
  0 siblings, 0 replies; 2+ messages in thread
From: Richard Earnshaw @ 2013-04-16 11:55 UTC
  To: Terry Guo; +Cc: gcc-patches, Ramana Radhakrishnan

On 16/04/13 10:47, Terry Guo wrote:
> Hi,
>
> This patch improves the Cortex-M4 FPU pipeline description based on the
> following findings:
>
> 1) Integer instructions can be pipelined with fused/chained MAC
> instructions.
> 2) Two-cycle 32-bit floating-point load instructions should be scheduled
> back to back, which saves one cycle. The three-cycle 64-bit
> floating-point load instructions have no such feature.
> 3) 32-bit floating-point store instructions need 1 cycle, not 2 cycles.
>
> I benchmarked this patch with some f32 functions from the CMSIS DSPLib.
> All of them show a performance improvement, i.e. fewer cycles are needed
> to run those functions.
>
> Is it OK for trunk?
>
> BR,
> Terry
>
> 2013-04-16  Terry Guo  <terry.guo@arm.com>
>
>          * config/arm/cortex-m4-fpu.md (cortex_m4_v): Delete cpu unit.
>          Replace with ...
>          (cortex_m4_v_a, cortex_m4_v_b): ... new cpu units.
>          (cortex_m4_v, cortex_m4_exa_va, cortex_m4_exb_vb): New reservations.
>          (cortex_m4_fmacs): Use new reservations.
>          (cortex_m4_f_load, cortex_m4_f_store): Likewise.
>
>

OK.

R.

