[ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem

* [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
@ 2012-11-29 17:16 Christophe Lyon
  2012-11-29 21:12 ` Joseph S. Myers
  2012-12-17 15:12 ` Richard Earnshaw
  0 siblings, 2 replies; 12+ messages in thread
From: Christophe Lyon @ 2012-11-29 17:16 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2416 bytes --]

Hi,

I have been working on a patch to avoid using Neon for 64-bit bitops
when it's too expensive to move data between core and Neon registers.

Benchmarking and validation look OK on the 4.7 branch (compiler
configured for Thumb and hard FP):
- validation on a cortex-a9 board is OK
- benchmarking shows a 10.5% improvement on spec2k's crafty bench. On
other benches we are between -0.5% and +0.5%.

On trunk I have noticed a regression in gfortran when using modulo
scheduling: sms-1.f90 now fails, but I suspect it's not because of
this patch since forcing compilation for armv5t makes the same test
fail with and without my patch.

Specifically, I have observed that the loop:
    862e:    3b01          subs    r3, #1
    8630:    ef70 08a1     vadd.i64    d16, d16, d17
    8634:    ec51 0b30     vmov    r0, r1, d16
    8638:    e9e2 0102     strd    r0, r1, [r2, #8]!
    863c:    d1f7          bne.n    862e <main+0x3e>

is transformed into:
    862e:    3901          subs    r1, #1
    8630:    1912          adds    r2, r2, r4
    8632:    eb43 0305     adc.w    r3, r3, r5
    8636:    e9e0 2302     strd    r2, r3, [r0, #8]!
    863a:    d1f8          bne.n    862e <main+0x3e>
with my patch.

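This is wrong because adds overwrites the condition flags set by subs
(and adc then consumes its carry), so the bne that controls the loop
tests the result of the addition instead of the loop counter.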
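
To make the flag problem concrete, here is a toy C model of the two
loops above. It is only a sketch: it tracks just the Z and C flags, and
every name in it (struct flags, run_loop, ITERATION_CAP) is made up for
illustration and does not come from GCC or from the test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model: only the Z and C flags are tracked.  */
struct flags { bool z; bool c; };

#define ITERATION_CAP 100u  /* Safety cap so a broken loop terminates.  */

/* subs: subtract and update the flags.  */
static uint32_t subs (struct flags *f, uint32_t a, uint32_t b)
{
  uint32_t r = a - b;
  f->z = (r == 0);
  f->c = (a >= b);              /* ARM sets C when there is no borrow.  */
  return r;
}

/* adds: add and update the flags (this is what clobbers Z).  */
static uint32_t adds (struct flags *f, uint32_t a, uint32_t b)
{
  uint32_t r = a + b;
  f->z = (r == 0);
  f->c = (r < a);               /* Carry out of bit 31.  */
  return r;
}

/* adc without the S suffix: reads C, leaves the flags untouched.  */
static uint32_t adc (const struct flags *f, uint32_t a, uint32_t b)
{
  return a + b + (f->c ? 1u : 0u);
}

/* Run "subs; 64-bit add; bne" and return the number of iterations.
   With use_core_regs false the add leaves the flags alone, like
   vadd.i64; with it true the add is the adds/adc pair.  */
static unsigned run_loop (uint32_t counter, uint64_t acc, uint64_t inc,
                          bool use_core_regs)
{
  struct flags f = { false, false };
  uint32_t lo = (uint32_t) acc, hi = (uint32_t) (acc >> 32);
  unsigned iterations = 0;

  do
    {
      counter = subs (&f, counter, 1);
      if (use_core_regs)
        {
          lo = adds (&f, lo, (uint32_t) inc);   /* Overwrites Z from subs.  */
          hi = adc (&f, hi, (uint32_t) (inc >> 32));
        }
      else
        {
          uint64_t sum = (((uint64_t) hi << 32) | lo) + inc;
          lo = (uint32_t) sum;
          hi = (uint32_t) (sum >> 32);
        }
      iterations++;
    }
  while (!f.z && iterations < ITERATION_CAP);   /* bne back to the top.  */

  return iterations;
}
```

In this model, run_loop (4, 0xFFFFFFFFull, 1, false) runs the expected
4 iterations, while the same call with use_core_regs set to true exits
after a single iteration because the low-word addition happens to
produce zero. Starting from an accumulator of 0 instead, the core
register variant only stops at the safety cap.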

The patch is:
2012-11-28  Christophe Lyon  <christophe.lyon@linaro.org>

    gcc/
    * config/arm/arm-protos.h (tune_params): Add
    prefer_neon_for_64bits field.
    * config/arm/arm.c (prefer_neon_for_64bits): New variable.
    (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
    (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
    (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
    (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
    (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
    (arm_option_override): Handle -mneon-for-64bits new option.
    * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
    (prefer_neon_for_64bits): Declare new variable.
    * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
    avoid_neon_for_64bits and neon_for_64bits.
    (arch_enabled): Handle new arch types.
    (one_cmpldi2): Use new arch names.
    * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
    (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
    neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
    of onlya8.
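
As a rough summary of how these pieces are meant to fit together, the
following C sketch restates the decision logic outside of GCC. It is an
illustration under assumptions, not the actual implementation: the enum
and both function names are hypothetical, and only the conditions
themselves are taken from the arm_option_override and arch_enabled
changes in the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two new "arch" attribute values.  */
enum alt_arch
{
  ALT_NEON_FOR_64BITS,
  ALT_AVOID_NEON_FOR_64BITS
};

/* Mirrors the arm_option_override change: start from the tuning
   default and let -mneon-for-64bits force the preference on.  */
static bool resolve_prefer_neon (bool tune_default, bool option_given)
{
  return tune_default || option_given;
}

/* Mirrors the arch_enabled change: both values require Neon, and
   exactly one of them is enabled depending on the preference.  */
static bool alternative_enabled (enum alt_arch arch, bool target_neon,
                                 bool prefer_neon)
{
  switch (arch)
    {
    case ALT_NEON_FOR_64BITS:
      return target_neon && prefer_neon;
    case ALT_AVOID_NEON_FOR_64BITS:
      return target_neon && !prefer_neon;
    }
  return false;
}
```

Since every tuning structure in the patch defaults the field to false,
the neon_for_64bits alternatives are only enabled in this model when
the option is given and Neon is available.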

Is it OK for trunk?

[-- Attachment #2: turn-off-64bits-neon.txt --]
[-- Type: text/plain, Size: 11098 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d942c5b..c92f055 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -247,6 +247,8 @@ struct tune_params
      performance. The first element covers Thumb state and the second one
      is for ARM state.  */
   bool logical_op_non_short_circuit[2];
+  /* Prefer Neon for 64-bit bitops.  */
+  bool prefer_neon_for_64bits;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 286a6c5..9efd215 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -816,6 +816,10 @@ int arm_arch_thumb2;
 int arm_arch_arm_hwdiv;
 int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+int prefer_neon_for_64bits = 0;
+
 /* In case of a PRE_INC, POST_INC, PRE_DEC, POST_DEC memory reference,
    we must report the mode of the memory reference from
    TARGET_PRINT_OPERAND to TARGET_PRINT_OPERAND_ADDRESS.  */
@@ -895,6 +899,7 @@ const struct tune_params arm_slowmul_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -908,6 +913,7 @@ const struct tune_params arm_fastmul_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -924,6 +930,7 @@ const struct tune_params arm_strongarm_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -937,6 +944,7 @@ const struct tune_params arm_xscale_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_9e_tune =
@@ -950,6 +958,7 @@ const struct tune_params arm_9e_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_v6t2_tune =
@@ -963,6 +972,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Generic Cortex tuning.  Use more specific tunings if appropriate.  */
@@ -977,6 +987,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a15_tune =
@@ -990,6 +1001,7 @@ const struct tune_params arm_cortex_a15_tune =
   arm_default_branch_cost,
   true,						/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
@@ -1006,6 +1018,7 @@ const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a9_tune =
@@ -1019,6 +1032,7 @@ const struct tune_params arm_cortex_a9_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
@@ -1034,6 +1048,7 @@ const struct tune_params arm_v6m_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -1047,6 +1062,7 @@ const struct tune_params arm_fa726te_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 
@@ -2077,6 +2093,12 @@ arm_option_override (void)
                            global_options.x_param_values,
                            global_options_set.x_param_values);
 
+  /* Use Neon to perform 64-bits operations rather than core
+     registers.  */
+  prefer_neon_for_64bits = current_tune->prefer_neon_for_64bits;
+  if (use_neon_for_64bits == 1)
+     prefer_neon_for_64bits = true;
+
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
                          global_options.x_param_values,
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f520cc7..c71d85f 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -356,6 +356,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
 				 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
 
+/* Should NEON be used for 64-bits bitops.  */
+#define TARGET_PREFER_NEON_64BITS (prefer_neon_for_64bits)
+
 /* True iff the full BPABI is being used.  If TARGET_BPABI is true,
    then TARGET_AAPCS_BASED must be true -- but the converse does not
    hold.  TARGET_BPABI implies the use of the BPABI runtime library,
@@ -541,6 +544,10 @@ extern int arm_arch_arm_hwdiv;
 /* Nonzero if chip supports integer division instruction in Thumb mode.  */
 extern int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+extern int prefer_neon_for_64bits;
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ac507ef..afde613 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -202,7 +202,7 @@
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
 ; arm_arch6.  This attribute is used to compute attribute "enabled",
 ; use type "any" to enable an alternative in all cases.
-(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8,iwmmxt,iwmmxt2"
+(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,nota8,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2"
   (const_string "any"))
 
 (define_attr "arch_enabled" "no,yes"
@@ -241,18 +241,18 @@
 	      (eq_attr "tune" "cortexa8"))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "neon_onlya8")
-	      (eq_attr "tune" "cortexa8")
-	      (match_test "TARGET_NEON"))
+	 (and (eq_attr "arch" "avoid_neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (not (match_test "TARGET_PREFER_NEON_64BITS")))
 	 (const_string "yes")
 
 	 (and (eq_attr "arch" "nota8")
 	      (not (eq_attr "tune" "cortexa8")))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "neon_nota8")
-	      (not (eq_attr "tune" "cortexa8"))
-	      (match_test "TARGET_NEON"))
+	 (and (eq_attr "arch" "neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (match_test "TARGET_PREFER_NEON_64BITS"))
 	 (const_string "yes")
 
 	 (and (eq_attr "arch" "iwmmxt2")
@@ -4370,7 +4370,7 @@
   [(set_attr "length" "*,8,8,*")
    (set_attr "predicable" "no,yes,yes,no")
    (set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "arch" "neon_nota8,*,*,neon_onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_expand "one_cmplsi2"
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index fb12c55..83b6002 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -251,3 +251,7 @@ that may trigger Cortex-M3 errata.
 munaligned-access
 Target Report Var(unaligned_access) Init(2)
 Enable unaligned word and halfword accesses to packed data.
+
+mneon-for-64bits
+Target Report RejectNegative Var(use_neon_for_64bits) Init(0)
+Use Neon to perform 64-bits operations rather than core registers.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2103580..8b0e877 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -617,7 +617,7 @@
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*")
    (set_attr "conds" "*,clob,clob,*,clob,clob,clob")
    (set_attr "length" "*,8,8,*,8,8,8")
-   (set_attr "arch" "nota8,*,*,onlya8,*,*,*")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits,*,*,*")]
 )
 
 (define_insn "*sub<mode>3_neon"
@@ -654,7 +654,7 @@
   [(set_attr "neon_type" "neon_int_2,*,*,*,neon_int_2")
    (set_attr "conds" "*,clob,clob,clob,*")
    (set_attr "length" "*,8,8,8,*")
-   (set_attr "arch" "nota8,*,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "*mul<mode>3_neon"
@@ -816,7 +816,7 @@
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 ;; The concrete forms of the Neon immediate-logic instructions are vbic and
@@ -861,7 +861,7 @@
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 (define_insn "orn<mode>3_neon"
@@ -957,7 +957,7 @@
    veor\t%P0, %P1, %P2"
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
    (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "one_cmpl<mode>2"
@@ -1279,7 +1279,7 @@
       }
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
@@ -1380,7 +1380,7 @@
 
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 

* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2012-11-29 17:16 [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem Christophe Lyon
@ 2012-11-29 21:12 ` Joseph S. Myers
  2012-11-30 16:38   ` Christophe Lyon
  2012-12-17 15:12 ` Richard Earnshaw
  1 sibling, 1 reply; 12+ messages in thread
From: Joseph S. Myers @ 2012-11-29 21:12 UTC
  To: Christophe Lyon; +Cc: gcc-patches

On Thu, 29 Nov 2012, Christophe Lyon wrote:

> 2012-11-28  Christophe Lyon  <christophe.lyon@linaro.org>
> 
>     gcc/
>     * config/arm/arm-protos.h (tune_params): Add
>     prefer_neon_for_64bits field.
>     * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>     (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>     (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>     (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>     (arm_option_override): Handle -mneon-for-64bits new option.
>     * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>     (prefer_neon_for_64bits): Declare new variable.
>     * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>     avoid_neon_for_64bits and neon_for_64bits.
>     (arch_enabled): Handle new arch types.
>     (one_cmpldi2): Use new arch names.
>     * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>     (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>     neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>     of onlya8.

This ChangeLog entry doesn't appear to mention the arm.opt change.  
Furthermore, the patch seems to be missing any .texi change to document 
the option; any new option needs documentation.  You are also missing 
testcases for the testsuite to verify that both enabled and disabled 
states of the option work properly.

-- 
Joseph S. Myers
joseph@codesourcery.com

* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2012-11-29 21:12 ` Joseph S. Myers
@ 2012-11-30 16:38   ` Christophe Lyon
  2012-12-07  8:35     ` Christophe Lyon
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2012-11-30 16:38 UTC
  To: Joseph S. Myers; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3299 bytes --]

On 29 November 2012 21:59, Joseph S. Myers <joseph@codesourcery.com> wrote:
> On Thu, 29 Nov 2012, Christophe Lyon wrote:
>
>> 2012-11-28  Christophe Lyon  <christophe.lyon@linaro.org>
>>
>>     gcc/
>>     * config/arm/arm-protos.h (tune_params): Add
>>     prefer_neon_for_64bits field.
>>     * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>>     (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>>     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>>     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>>     (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>>     (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>>     (arm_option_override): Handle -mneon-for-64bits new option.
>>     * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>>     (prefer_neon_for_64bits): Declare new variable.
>>     * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>>     avoid_neon_for_64bits and neon_for_64bits.
>>     (arch_enabled): Handle new arch types.
>>     (one_cmpldi2): Use new arch names.
>>     * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>>     (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>>     neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>>     of onlya8.
>
> This ChangeLog entry doesn't appear to mention the arm.opt change.
> Furthermore, the patch seems to be missing any .texi change to document
> the option; any new option needs documentation.  You are also missing
> testcases for the testsuite to verify that both enabled and disabled
> states of the option work properly.
>
Indeed, I forgot about the documentation; here is an updated patch.

Regarding the testcases, as this patch disables transformations
recently introduced, I would have appreciated it if testcases had been
associated with them in the first place... This requirement should be
enforced :-)

Tested with qemu on target arm-none-linux-gnueabi.


2012-11-30  Christophe Lyon  <christophe.lyon@linaro.org>

    gcc/
    * config/arm/arm-protos.h (tune_params): Add
    prefer_neon_for_64bits field.
    * config/arm/arm.c (prefer_neon_for_64bits): New variable.
    (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
    (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
    (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
    (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
    (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
    (arm_option_override): Handle -mneon-for-64bits new option.
    * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
    (prefer_neon_for_64bits): Declare new variable.
    * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
    avoid_neon_for_64bits and neon_for_64bits.
    (arch_enabled): Handle new arch types.
    (one_cmpldi2): Use new arch names.
    * config/arm/arm.opt (mneon-for-64bits): Add option.
    * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
    (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
    neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
    of onlya8.
    * doc/invoke.texi (-mneon-for-64bits): Document.

    gcc/testsuite/
    * gcc.target/arm/neon-for-64bits-1.c: New tests.
    * gcc.target/arm/neon-for-64bits-2.c: Likewise.

[-- Attachment #2: turn-off-64bits-neon.txt --]
[-- Type: text/plain, Size: 15742 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d942c5b..c92f055 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -247,6 +247,8 @@ struct tune_params
      performance. The first element covers Thumb state and the second one
      is for ARM state.  */
   bool logical_op_non_short_circuit[2];
+  /* Prefer Neon for 64-bit bitops.  */
+  bool prefer_neon_for_64bits;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 286a6c5..9efd215 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -816,6 +816,10 @@ int arm_arch_thumb2;
 int arm_arch_arm_hwdiv;
 int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+int prefer_neon_for_64bits = 0;
+
 /* In case of a PRE_INC, POST_INC, PRE_DEC, POST_DEC memory reference,
    we must report the mode of the memory reference from
    TARGET_PRINT_OPERAND to TARGET_PRINT_OPERAND_ADDRESS.  */
@@ -895,6 +899,7 @@ const struct tune_params arm_slowmul_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -908,6 +913,7 @@ const struct tune_params arm_fastmul_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -924,6 +930,7 @@ const struct tune_params arm_strongarm_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -937,6 +944,7 @@ const struct tune_params arm_xscale_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_9e_tune =
@@ -950,6 +958,7 @@ const struct tune_params arm_9e_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_v6t2_tune =
@@ -963,6 +972,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Generic Cortex tuning.  Use more specific tunings if appropriate.  */
@@ -977,6 +987,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a15_tune =
@@ -990,6 +1001,7 @@ const struct tune_params arm_cortex_a15_tune =
   arm_default_branch_cost,
   true,						/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
@@ -1006,6 +1018,7 @@ const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a9_tune =
@@ -1019,6 +1032,7 @@ const struct tune_params arm_cortex_a9_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
@@ -1034,6 +1048,7 @@ const struct tune_params arm_v6m_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -1047,6 +1062,7 @@ const struct tune_params arm_fa726te_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 
@@ -2077,6 +2093,12 @@ arm_option_override (void)
                            global_options.x_param_values,
                            global_options_set.x_param_values);
 
+  /* Use Neon to perform 64-bits operations rather than core
+     registers.  */
+  prefer_neon_for_64bits = current_tune->prefer_neon_for_64bits;
+  if (use_neon_for_64bits == 1)
+     prefer_neon_for_64bits = true;
+
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
                          global_options.x_param_values,
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index f520cc7..c71d85f 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -356,6 +356,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
 				 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
 
+/* Should NEON be used for 64-bits bitops.  */
+#define TARGET_PREFER_NEON_64BITS (prefer_neon_for_64bits)
+
 /* True iff the full BPABI is being used.  If TARGET_BPABI is true,
    then TARGET_AAPCS_BASED must be true -- but the converse does not
    hold.  TARGET_BPABI implies the use of the BPABI runtime library,
@@ -541,6 +544,10 @@ extern int arm_arch_arm_hwdiv;
 /* Nonzero if chip supports integer division instruction in Thumb mode.  */
 extern int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+extern int prefer_neon_for_64bits;
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ac507ef..afde613 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -202,7 +202,7 @@
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
 ; arm_arch6.  This attribute is used to compute attribute "enabled",
 ; use type "any" to enable an alternative in all cases.
-(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8,iwmmxt,iwmmxt2"
+(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,nota8,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2"
   (const_string "any"))
 
 (define_attr "arch_enabled" "no,yes"
@@ -241,18 +241,18 @@
 	      (eq_attr "tune" "cortexa8"))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "neon_onlya8")
-	      (eq_attr "tune" "cortexa8")
-	      (match_test "TARGET_NEON"))
+	 (and (eq_attr "arch" "avoid_neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (not (match_test "TARGET_PREFER_NEON_64BITS")))
 	 (const_string "yes")
 
 	 (and (eq_attr "arch" "nota8")
 	      (not (eq_attr "tune" "cortexa8")))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "neon_nota8")
-	      (not (eq_attr "tune" "cortexa8"))
-	      (match_test "TARGET_NEON"))
+	 (and (eq_attr "arch" "neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (match_test "TARGET_PREFER_NEON_64BITS"))
 	 (const_string "yes")
 
 	 (and (eq_attr "arch" "iwmmxt2")
@@ -4370,7 +4370,7 @@
   [(set_attr "length" "*,8,8,*")
    (set_attr "predicable" "no,yes,yes,no")
    (set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "arch" "neon_nota8,*,*,neon_onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_expand "one_cmplsi2"
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index fb12c55..83b6002 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -251,3 +251,7 @@ that may trigger Cortex-M3 errata.
 munaligned-access
 Target Report Var(unaligned_access) Init(2)
 Enable unaligned word and halfword accesses to packed data.
+
+mneon-for-64bits
+Target Report RejectNegative Var(use_neon_for_64bits) Init(0)
+Use Neon to perform 64-bits operations rather than core registers.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 2103580..8b0e877 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -617,7 +617,7 @@
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*")
    (set_attr "conds" "*,clob,clob,*,clob,clob,clob")
    (set_attr "length" "*,8,8,*,8,8,8")
-   (set_attr "arch" "nota8,*,*,onlya8,*,*,*")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits,*,*,*")]
 )
 
 (define_insn "*sub<mode>3_neon"
@@ -654,7 +654,7 @@
   [(set_attr "neon_type" "neon_int_2,*,*,*,neon_int_2")
    (set_attr "conds" "*,clob,clob,clob,*")
    (set_attr "length" "*,8,8,8,*")
-   (set_attr "arch" "nota8,*,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "*mul<mode>3_neon"
@@ -816,7 +816,7 @@
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 ;; The concrete forms of the Neon immediate-logic instructions are vbic and
@@ -861,7 +861,7 @@
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 (define_insn "orn<mode>3_neon"
@@ -957,7 +957,7 @@
    veor\t%P0, %P1, %P2"
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
    (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "one_cmpl<mode>2"
@@ -1279,7 +1279,7 @@
       }
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
@@ -1380,7 +1380,7 @@
 
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 51b6e85..3918b1d 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -514,7 +514,8 @@ Objective-C and Objective-C++ Dialects}.
 -mtp=@var{name} -mtls-dialect=@var{dialect} @gol
 -mword-relocations @gol
 -mfix-cortex-m3-ldrd @gol
--munaligned-access}
+-munaligned-access @gol
+-mneon-for-64bits}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
@@ -11521,6 +11522,11 @@ setting of this option.  If unaligned access is enabled then the
 preprocessor symbol @code{__ARM_FEATURE_UNALIGNED} will also be
 defined.
 
+@item -mneon-for-64bits
+@opindex mneon-for-64bits
+Enables using Neon to handle scalar 64-bits operations. This is
+disabled by default since the cost of moving data from core registers
+to Neon is high.
 @end table
 
 @node AVR Options
diff --git a/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c b/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c
new file mode 100644
index 0000000..a2a4103
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c
@@ -0,0 +1,54 @@
+/* Check that Neon is *not* used by default to handle 64-bits scalar
+   operations.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+typedef long long i64;
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef int i32;
+
+/* Unary operators */
+#define UNARY_OP(name, op) \
+  void unary_##name(u64 *a, u64 *b) { *a = op (*b + 0x1234567812345678ULL) ; }
+
+/* Binary operators */
+#define BINARY_OP(name, op) \
+  void binary_##name(u64 *a, u64 *b, u64 *c) { *a = *b op *c ; }
+
+/* Unsigned shift */
+#define SHIFT_U(name, op, amount) \
+  void ushift_##name(u64 *a, u64 *b, int c) { *a = *b op amount; }
+
+/* Signed shift */
+#define SHIFT_S(name, op, amount) \
+  void sshift_##name(i64 *a, i64 *b, int c) { *a = *b op amount; }
+
+UNARY_OP(not, ~)
+
+BINARY_OP(add, +)
+BINARY_OP(sub, -)
+BINARY_OP(and, &)
+BINARY_OP(or, |)
+BINARY_OP(xor, ^)
+
+SHIFT_U(right1, >>, 1)
+SHIFT_U(right2, >>, 2)
+SHIFT_U(right5, >>, 5)
+SHIFT_U(rightn, >>, c)
+
+SHIFT_S(right1, >>, 1)
+SHIFT_S(right2, >>, 2)
+SHIFT_S(right5, >>, 5)
+SHIFT_S(rightn, >>, c)
+
+/* { dg-final {scan-assembler-times "vmvn" 0} }  */
+/* { dg-final {scan-assembler-times "vadd" 0} }  */
+/* { dg-final {scan-assembler-times "vsub" 0} }  */
+/* { dg-final {scan-assembler-times "vand" 0} }  */
+/* { dg-final {scan-assembler-times "vorr" 0} }  */
+/* { dg-final {scan-assembler-times "veor" 0} }  */
+/* { dg-final {scan-assembler-times "vshr" 0} }  */
diff --git a/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c b/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c
new file mode 100644
index 0000000..035bfb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c
@@ -0,0 +1,57 @@
+/* Check that Neon is used to handle 64-bits scalar operations.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mneon-for-64bits" } */
+/* { dg-add-options arm_neon } */
+
+typedef long long i64;
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef int i32;
+
+/* Unary operators */
+#define UNARY_OP(name, op) \
+  void unary_##name(u64 *a, u64 *b) { *a = op (*b + 0x1234567812345678ULL) ; }
+
+/* Binary operators */
+#define BINARY_OP(name, op) \
+  void binary_##name(u64 *a, u64 *b, u64 *c) { *a = *b op *c ; }
+
+/* Unsigned shift */
+#define SHIFT_U(name, op, amount) \
+  void ushift_##name(u64 *a, u64 *b, int c) { *a = *b op amount; }
+
+/* Signed shift */
+#define SHIFT_S(name, op, amount) \
+  void sshift_##name(i64 *a, i64 *b, int c) { *a = *b op amount; }
+
+UNARY_OP(not, ~)
+
+BINARY_OP(add, +)
+BINARY_OP(sub, -)
+BINARY_OP(and, &)
+BINARY_OP(or, |)
+BINARY_OP(xor, ^)
+
+SHIFT_U(right1, >>, 1)
+SHIFT_U(right2, >>, 2)
+SHIFT_U(right5, >>, 5)
+SHIFT_U(rightn, >>, c)
+
+SHIFT_S(right1, >>, 1)
+SHIFT_S(right2, >>, 2)
+SHIFT_S(right5, >>, 5)
+SHIFT_S(rightn, >>, c)
+
+/* { dg-final {scan-assembler-times "vmvn" 1} }  */
+/* Two vadd: 1 in unary_not, 1 in binary_add */
+/* { dg-final {scan-assembler-times "vadd" 2} }  */
+/* { dg-final {scan-assembler-times "vsub" 1} }  */
+/* { dg-final {scan-assembler-times "vand" 1} }  */
+/* { dg-final {scan-assembler-times "vorr" 1} }  */
+/* { dg-final {scan-assembler-times "veor" 1} }  */
+/* 6 vshr for right shifts by constant, and variable right shift uses
+   vshl with a negative amount in register.  */
+/* { dg-final {scan-assembler-times "vshr" 6} }  */
+/* { dg-final {scan-assembler-times "vshl" 2} }  */


* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2012-11-30 16:38   ` Christophe Lyon
@ 2012-12-07  8:35     ` Christophe Lyon
  2012-12-14 16:58       ` Christophe Lyon
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2012-12-07  8:35 UTC (permalink / raw)
  To: gcc-patches

Ping?
http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02558.html

Thanks,

Christophe.

On 30 November 2012 17:34, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 29 November 2012 21:59, Joseph S. Myers <joseph@codesourcery.com> wrote:
>> On Thu, 29 Nov 2012, Christophe Lyon wrote:
>>
>>> 2012-11-28  Christophe Lyon  <christophe.lyon@linaro.org>
>>>
>>>     gcc/
>>>     * config/arm/arm-protos.h (tune_params): Add
>>>     prefer_neon_for_64bits field.
>>>     * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>>>     (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>>>     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>>>     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>>>     (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>>>     (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>>>     (arm_option_override): Handle -mneon-for-64bits new option.
>>>     * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>>>     (prefer_neon_for_64bits): Declare new variable.
>>>     * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>>>     avoid_neon_for_64bits and neon_for_64bits.
>>>     (arch_enabled): Handle new arch types.
>>>     (one_cmpldi2): Use new arch names.
>>>     * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>>>     (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>>>     neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>>>     of onlya8.
>>
>> This ChangeLog entry doesn't appear to mention the arm.opt change.
>> Furthermore, the patch seems to be missing any .texi change to document
>> the option; any new option needs documentation.  You are also missing
>> testcases for the testsuite to verify that both enabled and disabled
>> states of the option work properly.
>>
> Indeed, I forgot about the documentation; here is an updated patch.
>
> Regarding the testcases, as this patch disables transformations
> recently introduced, I would have appreciated if testcases had been
> associated with them in the 1st place.... This requirement should be
> enforced :-)
>
> Tested with qemu on target arm-none-linux-gnueabi.
>
>
> 2012-11-30  Christophe Lyon  <christophe.lyon@linaro.org>
>
>     gcc/
>     * config/arm/arm-protos.h (tune_params): Add
>     prefer_neon_for_64bits field.
>     * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>     (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>     (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>     (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>     (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>     (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>     (arm_option_override): Handle -mneon-for-64bits new option.
>     * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>     (prefer_neon_for_64bits): Declare new variable.
>     * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>     avoid_neon_for_64bits and neon_for_64bits.
>     (arch_enabled): Handle new arch types.
>     (one_cmpldi2): Use new arch names.
>     * config/arm/arm.opt (mneon-for-64bits): Add option.
>     * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>     (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>     neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>     of onlya8.
>     * doc/invoke.texi (-mneon-for-64bits): Document.
>
>     gcc/testsuite/
>     * gcc.target/arm/neon-for-64bits-1.c: New tests.
>     * gcc.target/arm/neon-for-64bits-2.c: Likewise.


* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2012-12-07  8:35     ` Christophe Lyon
@ 2012-12-14 16:58       ` Christophe Lyon
  0 siblings, 0 replies; 12+ messages in thread
From: Christophe Lyon @ 2012-12-14 16:58 UTC (permalink / raw)
  To: gcc-patches

Ping^2?

On 7 December 2012 09:34, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> Ping?
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg02558.html
>
> Thanks,
>
> Christophe.


* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2012-11-29 17:16 [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem Christophe Lyon
  2012-11-29 21:12 ` Joseph S. Myers
@ 2012-12-17 15:12 ` Richard Earnshaw
  2012-12-19 16:00   ` Christophe Lyon
  1 sibling, 1 reply; 12+ messages in thread
From: Richard Earnshaw @ 2012-12-17 15:12 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: gcc-patches

On 29/11/12 17:16, Christophe Lyon wrote:
> Hi,
>
> I have been working on a patch to avoid using Neon for 64 bits bitops
> when it's too expensive to move data between core and Neon registers.
>
> Benchmarking and validation look OK on the 4.7 branch (compiler
> configured for thumb and hard FP)
> - validation on cortex-a9 board OK
> - benchmarking shows 10.5% improvement on spec2k's crafty bench. On
> other benches we are between -0.5% and +0.5%.
>
> On trunk I have noticed a regression in gfortran when using modulo
> scheduling: sms-1.f90 now fails, but I suspect it's not because of
> this patch since forcing compilation for armv5t makes the same test
> fail with and without my patch.
>

Hmm, that's worrying.  Could you please make sure this is recorded in
bugzilla.  If this is a regression, please mark it as such.


> Specifically, I have observed that the loop:
>      862e:    3b01          subs    r3, #1
>      8630:    ef70 08a1     vadd.i64    d16, d16, d17
>      8634:    ec51 0b30     vmov    r0, r1, d16
>      8638:    e9e2 0102     strd    r0, r1, [r2, #8]!
>      863c:    d1f7          bne.n    862e <main+0x3e>
>
> is transformed into:
>      862e:    3901          subs    r1, #1
>      8630:    1912          adds    r2, r2, r4
>      8632:    eb43 0305     adc.w    r3, r3, r5
>      8636:    e9e0 2302     strd    r2, r3, [r0, #8]!
>      863a:    d1f8          bne.n    862e <main+0x3e>
> with my patch.
>
> This is wrong because adds/adc clobber the flags used to control the loop.
>
> The patch is:
> 2012-11-28  Christophe Lyon  <christophe.lyon@linaro.org>
>
>      gcc/
>      * config/arm/arm-protos.h (tune_params): Add
>      prefer_neon_for_64bits field.
>      * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>      (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>      (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>      (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>      (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>      (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>      (arm_option_override): Handle -mneon-for-64bits new option.
>      * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>      (prefer_neon_for_64bits): Declare new variable.
>      * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>      avoid_neon_for_64bits and neon_for_64bits.
>      (arch_enabled): Handle new arch types.
>      (one_cmpldi2): Use new arch names.
>      * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>      (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>      neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>      of onlya8.
>
> Is it OK for trunk?
>
>

Now that this optimization is disabled by default, the onlya8 code is 
completely redundant and should be purged, along with the insn 
alternatives that used it.

R.



* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2012-12-17 15:12 ` Richard Earnshaw
@ 2012-12-19 16:00   ` Christophe Lyon
  2013-01-08 16:24     ` Christophe Lyon
  2013-02-01 17:43     ` Ramana Radhakrishnan
  0 siblings, 2 replies; 12+ messages in thread
From: Christophe Lyon @ 2012-12-19 16:00 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2551 bytes --]

On 17 December 2012 16:12, Richard Earnshaw <rearnsha@arm.com> wrote:
> On 29/11/12 17:16, Christophe Lyon wrote:
>> On trunk I have noticed a regression in gfortran when using modulo
>> scheduling: sms-1.f90 now fails, but I suspect it's not because of
>> this patch since forcing compilation for armv5t makes the same test
>> fail with and without my patch.
>>
>
> Hmm, that's worrying.  Could you please make sure this is recorded in
> bugzilla.  If this is a regression, please mark it as such.
>
I was about to do so, but after bisecting it turns out that the
problem was introduced by
http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192969 and is very
likely to be another instance of PR55562, which has just been fixed
by http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01137.html.
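
For reference, the flag clobbering in the loop quoted earlier in this
thread can be modeled with a small C sketch. It is not part of the patch
and only tracks the Z flag; the function names and the MAX_ITERS safety
cap are made up for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Safety cap so that the broken loop still terminates in this model.  */
#define MAX_ITERS 1000u

/* Neon sequence: subs sets Z, vadd.i64/vmov/strd leave the flags alone,
   so bne really tests the loop counter.  Returns the iteration count.  */
static unsigned neon_loop_iters(uint32_t count, uint64_t acc, uint64_t step)
{
    unsigned iters = 0;
    int z;
    assert(count > 0);
    do {
        count -= 1;                 /* subs r3, #1 : Z = (count == 0) */
        z = (count == 0);
        acc += step;                /* vadd.i64    : flags untouched  */
        iters++;
    } while (!z && iters < MAX_ITERS);   /* bne.n */
    (void)acc;
    return iters;
}

/* Core-register sequence: adds overwrites Z with the result of the
   low-word addition, so bne no longer tests the loop counter.  */
static unsigned core_loop_iters(uint32_t count, uint64_t acc, uint64_t step)
{
    uint32_t lo = (uint32_t)acc, hi = (uint32_t)(acc >> 32);
    unsigned iters = 0;
    int z;
    assert(count > 0);
    do {
        uint32_t old_lo = lo;
        count -= 1;                 /* subs r1, #1 : Z = (count == 0) */
        z = (count == 0);
        lo += (uint32_t)step;       /* adds r2, r2, r4 : Z = (lo == 0) */
        z = (lo == 0);
        hi += (uint32_t)(step >> 32) + (lo < old_lo);  /* adc.w : no flag update */
        iters++;
    } while (!z && iters < MAX_ITERS);   /* bne.n */
    (void)hi;
    return iters;
}
```

With a trip count of 4, the first model runs 4 iterations, while the
second one exits only when the low word of the sum happens to be zero
(or, in this model, when it hits the cap).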


>
> Now that this optimization is disabled by default, the onlya8 code is
> completely redundant and should be purged, along with the insn alternatives
> that used it.
>
> R.
>
Here is a new version of my patch, with the cleanup you requested.

2012-12-18  Christophe Lyon  <christophe.lyon@linaro.org>

        gcc/
        * config/arm/arm-protos.h (tune_params): Add
        prefer_neon_for_64bits field.
        * config/arm/arm.c (prefer_neon_for_64bits): New variable.
        (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
        (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
        (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
        (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
        (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
        (arm_option_override): Handle -mneon-for-64bits new option.
        * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
        (prefer_neon_for_64bits): Declare new variable.
        * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
        avoid_neon_for_64bits and neon_for_64bits. Remove onlya8 and
        nota8.
        (arch_enabled): Handle new arch types. Remove support for onlya8
        and nota8.
        (one_cmpldi2): Use new arch names.
        * config/arm/arm.opt (mneon-for-64bits): Add option.
        * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
        (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
        neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
        of onlya8.
        * doc/invoke.texi (-mneon-for-64bits): Document.

        gcc/testsuite/
        * gcc.target/arm/neon-for-64bits-1.c: New tests.
        * gcc.target/arm/neon-for-64bits-2.c: Likewise.
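
As a reading aid for the attached patch, here is a minimal C sketch of
how the option, the tuning default and the two new "arch" values are
meant to interact. It only mirrors the arm_option_override hunk and the
two new arch_enabled clauses; struct tune_model, enum arch_alt and both
function names are hypothetical stand-ins, not GCC code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the new tune_params field.  */
struct tune_model { bool prefer_neon_for_64bits; };

/* Stand-ins for the two new values of the "arch" attribute.  */
enum arch_alt { ALT_NEON_FOR_64BITS, ALT_AVOID_NEON_FOR_64BITS };

/* Mirrors the arm_option_override hunk: start from the tuning default,
   then let -mneon-for-64bits (use_neon_for_64bits == 1) force it on.  */
static bool resolve_prefer_neon(const struct tune_model *tune,
                                int use_neon_for_64bits)
{
    assert(tune);
    bool prefer = tune->prefer_neon_for_64bits;
    if (use_neon_for_64bits == 1)
        prefer = true;
    return prefer;
}

/* Mirrors the two new arch_enabled clauses in arm.md: both require
   TARGET_NEON, and exactly one of them is enabled depending on
   TARGET_PREFER_NEON_64BITS.  */
static bool alt_enabled(enum arch_alt alt, bool target_neon, bool prefer_neon)
{
    switch (alt) {
    case ALT_NEON_FOR_64BITS:
        return target_neon && prefer_neon;
    case ALT_AVOID_NEON_FOR_64BITS:
        return target_neon && !prefer_neon;
    }
    return false;
}
```

Since every tuning in the patch defaults the field to false and the
option is RejectNegative, the only way to get the neon_for_64bits
alternatives with this version is to pass -mneon-for-64bits.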

[-- Attachment #2: turn-off-64bits-neon.txt --]
[-- Type: text/plain, Size: 15838 bytes --]

diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index d942c5b..c92f055 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -247,6 +247,8 @@ struct tune_params
      performance. The first element covers Thumb state and the second one
      is for ARM state.  */
   bool logical_op_non_short_circuit[2];
+  /* Prefer Neon for 64-bit bitops.  */
+  bool prefer_neon_for_64bits;
 };
 
 extern const struct tune_params *current_tune;
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 84ce56f..5e99436 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -816,6 +816,10 @@ int arm_arch_thumb2;
 int arm_arch_arm_hwdiv;
 int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+int prefer_neon_for_64bits = 0;
+
 /* In case of a PRE_INC, POST_INC, PRE_DEC, POST_DEC memory reference,
    we must report the mode of the memory reference from
    TARGET_PRINT_OPERAND to TARGET_PRINT_OPERAND_ADDRESS.  */
@@ -895,6 +899,7 @@ const struct tune_params arm_slowmul_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -908,6 +913,7 @@ const struct tune_params arm_fastmul_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -924,6 +930,7 @@ const struct tune_params arm_strongarm_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -937,6 +944,7 @@ const struct tune_params arm_xscale_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_9e_tune =
@@ -950,6 +958,7 @@ const struct tune_params arm_9e_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_v6t2_tune =
@@ -963,6 +972,7 @@ const struct tune_params arm_v6t2_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Generic Cortex tuning.  Use more specific tunings if appropriate.  */
@@ -977,6 +987,7 @@ const struct tune_params arm_cortex_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a15_tune =
@@ -990,6 +1001,7 @@ const struct tune_params arm_cortex_a15_tune =
   arm_default_branch_cost,
   true,						/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
@@ -1006,6 +1018,7 @@ const struct tune_params arm_cortex_a5_tune =
   arm_cortex_a5_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a9_tune =
@@ -1019,6 +1032,7 @@ const struct tune_params arm_cortex_a9_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
@@ -1034,6 +1048,7 @@ const struct tune_params arm_v6m_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -1047,6 +1062,7 @@ const struct tune_params arm_fa726te_tune =
   arm_default_branch_cost,
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 
@@ -2077,6 +2093,12 @@ arm_option_override (void)
                            global_options.x_param_values,
                            global_options_set.x_param_values);
 
+  /* Use Neon to perform 64-bits operations rather than core
+     registers.  */
+  prefer_neon_for_64bits = current_tune->prefer_neon_for_64bits;
+  if (use_neon_for_64bits == 1)
+     prefer_neon_for_64bits = true;
+
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
                          global_options.x_param_values,
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index d0f351d..bcaa890 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -356,6 +356,9 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
 				 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
 
+/* Should NEON be used for 64-bits bitops.  */
+#define TARGET_PREFER_NEON_64BITS (prefer_neon_for_64bits)
+
 /* True iff the full BPABI is being used.  If TARGET_BPABI is true,
    then TARGET_AAPCS_BASED must be true -- but the converse does not
    hold.  TARGET_BPABI implies the use of the BPABI runtime library,
@@ -541,6 +544,10 @@ extern int arm_arch_arm_hwdiv;
 /* Nonzero if chip supports integer division instruction in Thumb mode.  */
 extern int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+extern int prefer_neon_for_64bits;
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 649e901..e24c2b4 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -96,7 +96,7 @@
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
 ; arm_arch6.  This attribute is used to compute attribute "enabled",
 ; use type "any" to enable an alternative in all cases.
-(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8,iwmmxt,iwmmxt2"
+(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2"
   (const_string "any"))
 
 (define_attr "arch_enabled" "no,yes"
@@ -131,22 +131,14 @@
 	      (match_test "TARGET_32BIT && !arm_arch6"))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "onlya8")
-	      (eq_attr "tune" "cortexa8"))
+	 (and (eq_attr "arch" "avoid_neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (not (match_test "TARGET_PREFER_NEON_64BITS")))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "neon_onlya8")
-	      (eq_attr "tune" "cortexa8")
-	      (match_test "TARGET_NEON"))
-	 (const_string "yes")
-
-	 (and (eq_attr "arch" "nota8")
-	      (not (eq_attr "tune" "cortexa8")))
-	 (const_string "yes")
-
-	 (and (eq_attr "arch" "neon_nota8")
-	      (not (eq_attr "tune" "cortexa8"))
-	      (match_test "TARGET_NEON"))
+	 (and (eq_attr "arch" "neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (match_test "TARGET_PREFER_NEON_64BITS"))
 	 (const_string "yes")
 
 	 (and (eq_attr "arch" "iwmmxt2")
@@ -4326,7 +4318,7 @@
   [(set_attr "length" "*,8,8,*")
    (set_attr "predicable" "no,yes,yes,no")
    (set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "arch" "neon_nota8,*,*,neon_onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_expand "one_cmplsi2"
diff --git a/gcc/config/arm/arm.opt b/gcc/config/arm/arm.opt
index 61d2d2f..c4ede22 100644
--- a/gcc/config/arm/arm.opt
+++ b/gcc/config/arm/arm.opt
@@ -247,3 +247,7 @@ that may trigger Cortex-M3 errata.
 munaligned-access
 Target Report Var(unaligned_access) Init(2)
 Enable unaligned word and halfword accesses to packed data.
+
+mneon-for-64bits
+Target Report RejectNegative Var(use_neon_for_64bits) Init(0)
+Use Neon to perform 64-bits operations rather than core registers.
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index fc38269..6e13d1a 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -488,7 +488,7 @@
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*")
    (set_attr "conds" "*,clob,clob,*,clob,clob,clob")
    (set_attr "length" "*,8,8,*,8,8,8")
-   (set_attr "arch" "nota8,*,*,onlya8,*,*,*")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits,*,*,*")]
 )
 
 (define_insn "*sub<mode>3_neon"
@@ -525,7 +525,7 @@
   [(set_attr "neon_type" "neon_int_2,*,*,*,neon_int_2")
    (set_attr "conds" "*,clob,clob,clob,*")
    (set_attr "length" "*,8,8,8,*")
-   (set_attr "arch" "nota8,*,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "*mul<mode>3_neon"
@@ -700,7 +700,7 @@
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 ;; The concrete forms of the Neon immediate-logic instructions are vbic and
@@ -745,7 +745,7 @@
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 (define_insn "orn<mode>3_neon"
@@ -841,7 +841,7 @@
    veor\t%P0, %P1, %P2"
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
    (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "one_cmpl<mode>2"
@@ -1163,7 +1163,7 @@
       }
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
@@ -1264,7 +1264,7 @@
 
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 06ba770..01faecd 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -513,7 +513,8 @@ Objective-C and Objective-C++ Dialects}.
 -mtp=@var{name} -mtls-dialect=@var{dialect} @gol
 -mword-relocations @gol
 -mfix-cortex-m3-ldrd @gol
--munaligned-access}
+-munaligned-access @gol
+-mneon-for-64bits}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
@@ -11499,6 +11500,11 @@ setting of this option.  If unaligned access is enabled then the
 preprocessor symbol @code{__ARM_FEATURE_UNALIGNED} will also be
 defined.
 
+@item -mneon-for-64bits
+@opindex mneon-for-64bits
+Enables using Neon to handle scalar 64-bits operations. This is
+disabled by default since the cost of moving data from core registers
+to Neon is high.
 @end table
 
 @node AVR Options
diff --git a/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c b/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c
new file mode 100644
index 0000000..a2a4103
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c
@@ -0,0 +1,54 @@
+/* Check that Neon is *not* used by default to handle 64-bits scalar
+   operations.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+typedef long long i64;
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef int i32;
+
+/* Unary operators */
+#define UNARY_OP(name, op) \
+  void unary_##name(u64 *a, u64 *b) { *a = op (*b + 0x1234567812345678ULL) ; }
+
+/* Binary operators */
+#define BINARY_OP(name, op) \
+  void binary_##name(u64 *a, u64 *b, u64 *c) { *a = *b op *c ; }
+
+/* Unsigned shift */
+#define SHIFT_U(name, op, amount) \
+  void ushift_##name(u64 *a, u64 *b, int c) { *a = *b op amount; }
+
+/* Signed shift */
+#define SHIFT_S(name, op, amount) \
+  void sshift_##name(i64 *a, i64 *b, int c) { *a = *b op amount; }
+
+UNARY_OP(not, ~)
+
+BINARY_OP(add, +)
+BINARY_OP(sub, -)
+BINARY_OP(and, &)
+BINARY_OP(or, |)
+BINARY_OP(xor, ^)
+
+SHIFT_U(right1, >>, 1)
+SHIFT_U(right2, >>, 2)
+SHIFT_U(right5, >>, 5)
+SHIFT_U(rightn, >>, c)
+
+SHIFT_S(right1, >>, 1)
+SHIFT_S(right2, >>, 2)
+SHIFT_S(right5, >>, 5)
+SHIFT_S(rightn, >>, c)
+
+/* { dg-final {scan-assembler-times "vmvn" 0} }  */
+/* { dg-final {scan-assembler-times "vadd" 0} }  */
+/* { dg-final {scan-assembler-times "vsub" 0} }  */
+/* { dg-final {scan-assembler-times "vand" 0} }  */
+/* { dg-final {scan-assembler-times "vorr" 0} }  */
+/* { dg-final {scan-assembler-times "veor" 0} }  */
+/* { dg-final {scan-assembler-times "vshr" 0} }  */
diff --git a/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c b/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c
new file mode 100644
index 0000000..035bfb7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c
@@ -0,0 +1,57 @@
+/* Check that Neon is used to handle 64-bits scalar operations.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mneon-for-64bits" } */
+/* { dg-add-options arm_neon } */
+
+typedef long long i64;
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef int i32;
+
+/* Unary operators */
+#define UNARY_OP(name, op) \
+  void unary_##name(u64 *a, u64 *b) { *a = op (*b + 0x1234567812345678ULL) ; }
+
+/* Binary operators */
+#define BINARY_OP(name, op) \
+  void binary_##name(u64 *a, u64 *b, u64 *c) { *a = *b op *c ; }
+
+/* Unsigned shift */
+#define SHIFT_U(name, op, amount) \
+  void ushift_##name(u64 *a, u64 *b, int c) { *a = *b op amount; }
+
+/* Signed shift */
+#define SHIFT_S(name, op, amount) \
+  void sshift_##name(i64 *a, i64 *b, int c) { *a = *b op amount; }
+
+UNARY_OP(not, ~)
+
+BINARY_OP(add, +)
+BINARY_OP(sub, -)
+BINARY_OP(and, &)
+BINARY_OP(or, |)
+BINARY_OP(xor, ^)
+
+SHIFT_U(right1, >>, 1)
+SHIFT_U(right2, >>, 2)
+SHIFT_U(right5, >>, 5)
+SHIFT_U(rightn, >>, c)
+
+SHIFT_S(right1, >>, 1)
+SHIFT_S(right2, >>, 2)
+SHIFT_S(right5, >>, 5)
+SHIFT_S(rightn, >>, c)
+
+/* { dg-final {scan-assembler-times "vmvn" 1} }  */
+/* Two vadd: 1 in unary_not, 1 in binary_add */
+/* { dg-final {scan-assembler-times "vadd" 2} }  */
+/* { dg-final {scan-assembler-times "vsub" 1} }  */
+/* { dg-final {scan-assembler-times "vand" 1} }  */
+/* { dg-final {scan-assembler-times "vorr" 1} }  */
+/* { dg-final {scan-assembler-times "veor" 1} }  */
+/* 6 vshr for right shifts by constant, and variable right shift uses
+   vshl with a negative amount in register.  */
+/* { dg-final {scan-assembler-times "vshr" 6} }  */
+/* { dg-final {scan-assembler-times "vshl" 2} }  */
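
A note on the last two expectations of neon-for-64bits-2.c: the two
"rightn" functions shift by a register amount, and the test comment says
this is done with vshl and a negated amount, hence 2 vshl and only 6
vshr. The C sketch below models that behaviour of the register form of
VSHL on a single 64-bit lane. It is an illustration written for this
note, not code from the patch, and all function names are invented.

```c
#include <assert.h>
#include <stdint.h>

/* Model of VSHL.U64 (register): the shift amount is a signed byte;
   positive means shift left, negative means shift right.  */
static uint64_t vshl_u64_model(uint64_t value, int8_t shift)
{
    if (shift >= 64 || shift <= -64)
        return 0;
    if (shift >= 0)
        return value << shift;
    return value >> -shift;
}

/* Model of VSHL.S64 (register): same rule, but a right shift fills
   with copies of the sign bit.  Built from unsigned operations; the
   final conversion assumes two's complement, as on ARM.  */
static int64_t vshl_s64_model(int64_t value, int8_t shift)
{
    if (shift >= 64)
        return 0;
    if (shift <= -64)
        return value < 0 ? -1 : 0;
    if (shift >= 0)
        return (int64_t)((uint64_t)value << shift);
    uint64_t u = (uint64_t)value >> -shift;
    if (value < 0)
        u |= ~(UINT64_MAX >> -shift);   /* replicate the sign bit */
    return (int64_t)u;
}

/* What ushift_rightn / sshift_rightn compute, expressed through the
   model: negate the amount, then use the "left" shift.  */
static uint64_t ushift_rightn_model(uint64_t b, int c)
{
    assert(c >= 0 && c < 64);
    return vshl_u64_model(b, (int8_t)-c);
}

static int64_t sshift_rightn_model(int64_t b, int c)
{
    assert(c >= 0 && c < 64);
    return vshl_s64_model(b, (int8_t)-c);
}
```

For amounts in the range 0..63 the two wrappers agree with the plain C
">>" used in the test macros.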


* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2012-12-19 16:00   ` Christophe Lyon
@ 2013-01-08 16:24     ` Christophe Lyon
  2013-01-16 13:46       ` Christophe Lyon
  2013-02-01 17:43     ` Ramana Radhakrishnan
  1 sibling, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2013-01-08 16:24 UTC (permalink / raw)
  To: Richard Earnshaw; +Cc: gcc-patches

Ping?
http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01197.html

Thanks,

Christophe

On 19 December 2012 16:59, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> On 17 December 2012 16:12, Richard Earnshaw <rearnsha@arm.com> wrote:
>> On 29/11/12 17:16, Christophe Lyon wrote:
>>> On trunk I have noticed a regression in gfortran when using modulo
>>> scheduling: sms-1.f90 now fails, but I suspect it's not because of
>>> this patch since forcing compilation for armv5t makes the same test
>>> fail with and without my patch.
>>>
>>
>> Hmm, that's worrying.  Could you please make sure this is recorded in
>> bugzilla.  If this is a regression, please mark it as such.
>>
> I was about to do so, but after bisecting it turns out that the
> problem was introduced by
> http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192969 and is very
> likely to be another instance of PR55562, which has just been fixed
> by http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01137.html.
>
>
>>
>> Now that this optimization is disabled by default, the onlya8 code is
>> completely redundant and should be purged, along with the insn alternatives
>> that used it.
>>
>> R.
>>
> Here is a new version of my patch, with the cleanup you requested.
>
> 2012-12-18  Christophe Lyon  <christophe.lyon@linaro.org>
>
>         gcc/
>         * config/arm/arm-protos.h (tune_params): Add
>         prefer_neon_for_64bits field.
>         * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>         (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>         (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>         (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>         (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>         (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>         (arm_option_override): Handle -mneon-for-64bits new option.
>         * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>         (prefer_neon_for_64bits): Declare new variable.
>         * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>         avoid_neon_for_64bits and neon_for_64bits. Remove onlya8 and
>         nota8.
>         (arch_enabled): Handle new arch types. Remove support for onlya8
>         and nota8.
>         (one_cmpldi2): Use new arch names.
>         * config/arm/arm.opt (mneon-for-64bits): Add option.
>         * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>         (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>         neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>         of onlya8.
>         * doc/invoke.texi (-mneon-for-64bits): Document.
>
>         gcc/testsuite/
>         * gcc.target/arm/neon-for-64bits-1.c: New tests.
>         * gcc.target/arm/neon-for-64bits-2.c: Likewise.

* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2013-01-08 16:24     ` Christophe Lyon
@ 2013-01-16 13:46       ` Christophe Lyon
  2013-01-21 16:53         ` Christophe Lyon
  0 siblings, 1 reply; 12+ messages in thread
From: Christophe Lyon @ 2013-01-16 13:46 UTC (permalink / raw)
  To: gcc-patches

Ping^2 ?


On 8 January 2013 17:24, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> Ping?
> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01197.html
>
> Thanks,
>
> Christophe
>
> On 19 December 2012 16:59, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> On 17 December 2012 16:12, Richard Earnshaw <rearnsha@arm.com> wrote:
>>> On 29/11/12 17:16, Christophe Lyon wrote:
>>>> On trunk I have noticed a regression in gfortran when using modulo
>>>> scheduling: sms-1.f90 now fails, but I suspect it's not because of
>>>> this patch since forcing compilation for armv5t makes the same test
>>>> fail with and without my patch.
>>>>
>>>
>>> Hmm, that's worrying.  Could you please make sure this is recorded in
>>> bugzilla.  If this is a regression, please mark it as such.
>>>
>> I was about to do so, but after bisecting it turns out that the
>> problem was introduced by
>> http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192969 and is very
>> likely to be another instance of PR55562, which has just been fixed
>> by http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01137.html.
>>
>>
>>>
>>> Now that this optimization is disabled by default, the onlya8 code is
>>> completely redundant and should be purged, along with the insn alternatives
>>> that used it.
>>>
>>> R.
>>>
>> Here is a new version of my patch, with the cleanup you requested.
>>
>> 2012-12-18  Christophe Lyon  <christophe.lyon@linaro.org>
>>
>>         gcc/
>>         * config/arm/arm-protos.h (tune_params): Add
>>         prefer_neon_for_64bits field.
>>         * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>>         (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>>         (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>>         (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>>         (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>>         (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>>         (arm_option_override): Handle -mneon-for-64bits new option.
>>         * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>>         (prefer_neon_for_64bits): Declare new variable.
>>         * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>>         avoid_neon_for_64bits and neon_for_64bits. Remove onlya8 and
>>         nota8.
>>         (arch_enabled): Handle new arch types. Remove support for onlya8
>>         and nota8.
>>         (one_cmpldi2): Use new arch names.
>>         * config/arm/arm.opt (mneon-for-64bits): Add option.
>>         * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>>         (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>>         neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>>         of onlya8.
>>         * doc/invoke.texi (-mneon-for-64bits): Document.
>>
>>         gcc/testsuite/
>>         * gcc.target/arm/neon-for-64bits-1.c: New tests.
>>         * gcc.target/arm/neon-for-64bits-2.c: Likewise.

* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2013-01-16 13:46       ` Christophe Lyon
@ 2013-01-21 16:53         ` Christophe Lyon
  0 siblings, 0 replies; 12+ messages in thread
From: Christophe Lyon @ 2013-01-21 16:53 UTC (permalink / raw)
  To: gcc-patches

Ping?
http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01197.html

On 16 January 2013 14:46, Christophe Lyon <christophe.lyon@linaro.org> wrote:
> Ping^2 ?
>
>
> On 8 January 2013 17:24, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>> Ping?
>> http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01197.html
>>
>> Thanks,
>>
>> Christophe
>>
>> On 19 December 2012 16:59, Christophe Lyon <christophe.lyon@linaro.org> wrote:
>>> On 17 December 2012 16:12, Richard Earnshaw <rearnsha@arm.com> wrote:
>>>> On 29/11/12 17:16, Christophe Lyon wrote:
>>>>> On trunk I have noticed a regression in gfortran when using modulo
>>>>> scheduling: sms-1.f90 now fails, but I suspect it's not because of
>>>>> this patch since forcing compilation for armv5t makes the same test
>>>>> fail with and without my patch.
>>>>>
>>>>
>>>> Hmm, that's worrying.  Could you please make sure this is recorded in
>>>> bugzilla.  If this is a regression, please mark it as such.
>>>>
>>> I was about to do so, but after bisecting it turns out that the
>>> problem was introduced by
>>> http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=192969 and is very
>>> likely to be another instance of PR55562, which has just been fixed
>>> by http://gcc.gnu.org/ml/gcc-patches/2012-12/msg01137.html.
>>>
>>>
>>>>
>>>> Now that this optimization is disabled by default, the onlya8 code is
>>>> completely redundant and should be purged, along with the insn alternatives
>>>> that used it.
>>>>
>>>> R.
>>>>
>>> Here is a new version of my patch, with the cleanup you requested.
>>>
>>> 2012-12-18  Christophe Lyon  <christophe.lyon@linaro.org>
>>>
>>>         gcc/
>>>         * config/arm/arm-protos.h (tune_params): Add
>>>         prefer_neon_for_64bits field.
>>>         * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>>>         (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>>>         (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>>>         (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>>>         (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>>>         (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>>>         (arm_option_override): Handle -mneon-for-64bits new option.
>>>         * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>>>         (prefer_neon_for_64bits): Declare new variable.
>>>         * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>>>         avoid_neon_for_64bits and neon_for_64bits. Remove onlya8 and
>>>         nota8.
>>>         (arch_enabled): Handle new arch types. Remove support for onlya8
>>>         and nota8.
>>>         (one_cmpldi2): Use new arch names.
>>>         * config/arm/arm.opt (mneon-for-64bits): Add option.
>>>         * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>>>         (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>>>         neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>>>         of onlya8.
>>>         * doc/invoke.texi (-mneon-for-64bits): Document.
>>>
>>>         gcc/testsuite/
>>>         * gcc.target/arm/neon-for-64bits-1.c: New tests.
>>>         * gcc.target/arm/neon-for-64bits-2.c: Likewise.

* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2012-12-19 16:00   ` Christophe Lyon
  2013-01-08 16:24     ` Christophe Lyon
@ 2013-02-01 17:43     ` Ramana Radhakrishnan
  2013-03-21 14:42       ` Christophe Lyon
  1 sibling, 1 reply; 12+ messages in thread
From: Ramana Radhakrishnan @ 2013-02-01 17:43 UTC (permalink / raw)
  To: Christophe Lyon; +Cc: Richard Earnshaw, gcc-patches


>>
> Here is a new version of my patch, with the cleanup you requested.
>
> 2012-12-18  Christophe Lyon  <christophe.lyon@linaro.org>
>
>          gcc/
>          * config/arm/arm-protos.h (tune_params): Add
>          prefer_neon_for_64bits field.
>          * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>          (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>          (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>          (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>          (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>          (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>          (arm_option_override): Handle -mneon-for-64bits new option.
>          * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>          (prefer_neon_for_64bits): Declare new variable.
>          * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>          avoid_neon_for_64bits and neon_for_64bits. Remove onlya8 and
>          nota8.
>          (arch_enabled): Handle new arch types. Remove support for onlya8
>          and nota8.
>          (one_cmpldi2): Use new arch names.
>          * config/arm/arm.opt (mneon-for-64bits): Add option.
>          * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>          (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>          neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>          of onlya8.
>          * doc/invoke.texi (-mneon-for-64bits): Document.
>
>          gcc/testsuite/
>          * gcc.target/arm/neon-for-64bits-1.c: New tests.
>          * gcc.target/arm/neon-for-64bits-2.c: Likewise.
>



Ok for 4.9 stage1 now.

regards
Ramana

* Re: [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem
  2013-02-01 17:43     ` Ramana Radhakrishnan
@ 2013-03-21 14:42       ` Christophe Lyon
  0 siblings, 0 replies; 12+ messages in thread
From: Christophe Lyon @ 2013-03-21 14:42 UTC (permalink / raw)
  To: ramrad01; +Cc: Richard Earnshaw, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 3523 bytes --]

Here is what I have committed (svn 196876): a few updates were necessary.

Christophe.

2013-03-21  Christophe Lyon  <christophe.lyon@linaro.org>

        gcc/
        * config/arm/arm-protos.h (tune_params): Add
        prefer_neon_for_64bits field.
        * config/arm/arm.c (prefer_neon_for_64bits): New variable.
        (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
        (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
        (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
        (arm_cortex_a15_tune, arm_cortex_a5_tune): Ditto.
        (arm_cortex_a9_tune, arm_v6m_tune, arm_fa726te_tune): Ditto.
        (arm_option_override): Handle -mneon-for-64bits new option.
        * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
        (prefer_neon_for_64bits): Declare new variable.
        * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
        avoid_neon_for_64bits and neon_for_64bits. Remove onlya8 and
        nota8.
        (arch_enabled): Handle new arch types. Remove support for onlya8
        and nota8.
        (one_cmpldi2): Use new arch names.
        (zero_extend<mode>di2, extend<mode>di2): Ditto.
        * config/arm/arm.opt (mneon-for-64bits): Add option.
        * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
        (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
        neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
        of onlya8.
        * doc/invoke.texi (-mneon-for-64bits): Document.

        gcc/testsuite/
        * gcc.target/arm/neon-for-64bits-1.c: New tests.
        * gcc.target/arm/neon-for-64bits-2.c: Likewise.
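
For readers following the ChangeLog, the interaction between the per-CPU tuning default, the new -mneon-for-64bits option and the two renamed arch values can be sketched in plain C. This is an illustration only, under simplifying assumptions: struct tune_sketch, resolve_prefer_neon, alternative_enabled and the alt_kind enum are hypothetical stand-ins and not the GCC sources.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the field added to tune_params.  */
struct tune_sketch {
    bool prefer_neon_for_64bits;
};

/* Models the logic added to arm_option_override: start from the tuning
   default, and let -mneon-for-64bits (use_neon_for_64bits == 1) force
   the preference on.  The option cannot turn the preference off.  */
bool resolve_prefer_neon(const struct tune_sketch *tune,
                         int use_neon_for_64bits)
{
    bool prefer = tune->prefer_neon_for_64bits;
    if (use_neon_for_64bits == 1)
        prefer = true;
    return prefer;
}

/* Hypothetical names for the two new "arch" attribute values.  */
enum alt_kind { ALT_NEON_FOR_64BITS, ALT_AVOID_NEON_FOR_64BITS };

/* Models the two new arch_enabled clauses: both require TARGET_NEON and
   they differ only in the sense of the preference test.  */
bool alternative_enabled(enum alt_kind kind, bool target_neon,
                         bool prefer_neon)
{
    if (!target_neon)
        return false;
    return kind == ALT_NEON_FOR_64BITS ? prefer_neon : !prefer_neon;
}
```

With every tuning table defaulting the field to false, the sketch gives the avoid_neon_for_64bits alternatives by default and the neon_for_64bits alternatives only when the option is passed, which is what the two new tests check.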



On 1 February 2013 18:43, Ramana Radhakrishnan <ramrad01@arm.com> wrote:
>
>>>
>> Here is a new version of my patch, with the cleanup you requested.
>>
>> 2012-12-18  Christophe Lyon  <christophe.lyon@linaro.org>
>>
>>          gcc/
>>          * config/arm/arm-protos.h (tune_params): Add
>>          prefer_neon_for_64bits field.
>>          * config/arm/arm.c (prefer_neon_for_64bits): New variable.
>>          (arm_slowmul_tune): Default prefer_neon_for_64bits to false.
>>          (arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune): Ditto.
>>          (arm_9e_tune, arm_v6t2_tune, arm_cortex_tune): Ditto.
>>          (arm_cortex_a5_tune, arm_cortex_a15_tune): Ditto.
>>          (arm_cortex_a9_tune, arm_fa726te_tune): Ditto.
>>          (arm_option_override): Handle -mneon-for-64bits new option.
>>          * config/arm/arm.h (TARGET_PREFER_NEON_64BITS): New macro.
>>          (prefer_neon_for_64bits): Declare new variable.
>>          * config/arm/arm.md (arch): Rename neon_onlya8 and neon_nota8 to
>>          avoid_neon_for_64bits and neon_for_64bits. Remove onlya8 and
>>          nota8.
>>          (arch_enabled): Handle new arch types. Remove support for onlya8
>>          and nota8.
>>          (one_cmpldi2): Use new arch names.
>>          * config/arm/arm.opt (mneon-for-64bits): Add option.
>>          * config/arm/neon.md (adddi3_neon, subdi3_neon, iordi3_neon)
>>          (anddi3_neon, xordi3_neon, ashldi3_neon, <shift>di3_neon): Use
>>          neon_for_64bits instead of nota8 and avoid_neon_for_64bits instead
>>          of onlya8.
>>          * doc/invoke.texi (-mneon-for-64bits): Document.
>>
>>          gcc/testsuite/
>>          * gcc.target/arm/neon-for-64bits-1.c: New tests.
>>          * gcc.target/arm/neon-for-64bits-2.c: Likewise.
>>
>
>
>
> Ok for 4.9 stage1 now.
>
> regards
> Ramana
>

[-- Attachment #2: turn-off-64bits-neon.txt --]
[-- Type: text/plain, Size: 17197 bytes --]

Index: gcc/config/arm/arm-protos.h
===================================================================
--- gcc/config/arm/arm-protos.h	(revision 196875)
+++ gcc/config/arm/arm-protos.h	(revision 196876)
@@ -269,6 +269,8 @@ struct tune_params
   bool logical_op_non_short_circuit[2];
   /* Vectorizer costs.  */
   const struct cpu_vec_costs* vec_costs;
+  /* Prefer Neon for 64-bit bitops.  */
+  bool prefer_neon_for_64bits;
 };
 
 extern const struct tune_params *current_tune;
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	(revision 196875)
+++ gcc/config/arm/arm.c	(revision 196876)
@@ -839,6 +839,10 @@ int arm_arch_thumb2;
 int arm_arch_arm_hwdiv;
 int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+int prefer_neon_for_64bits = 0;
+
 /* In case of a PRE_INC, POST_INC, PRE_DEC, POST_DEC memory reference,
    we must report the mode of the memory reference from
    TARGET_PRINT_OPERAND to TARGET_PRINT_OPERAND_ADDRESS.  */
@@ -936,6 +940,7 @@ const struct tune_params arm_slowmul_tun
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fastmul_tune =
@@ -950,6 +955,7 @@ const struct tune_params arm_fastmul_tun
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* StrongARM has early execution of branches, so a sequence that is worth
@@ -967,6 +973,7 @@ const struct tune_params arm_strongarm_t
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_xscale_tune =
@@ -981,6 +988,7 @@ const struct tune_params arm_xscale_tune
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_9e_tune =
@@ -995,6 +1003,7 @@ const struct tune_params arm_9e_tune =
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_v6t2_tune =
@@ -1009,6 +1018,7 @@ const struct tune_params arm_v6t2_tune =
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Generic Cortex tuning.  Use more specific tunings if appropriate.  */
@@ -1024,6 +1034,7 @@ const struct tune_params arm_cortex_tune
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a15_tune =
@@ -1038,6 +1049,7 @@ const struct tune_params arm_cortex_a15_
   true,						/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* Branches can be dual-issued on Cortex-A5, so conditional execution is
@@ -1055,6 +1067,7 @@ const struct tune_params arm_cortex_a5_t
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_cortex_a9_tune =
@@ -1069,6 +1082,7 @@ const struct tune_params arm_cortex_a9_t
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than
@@ -1085,6 +1099,7 @@ const struct tune_params arm_v6m_tune =
   false,					/* Prefer LDRD/STRD.  */
   {false, false},				/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 const struct tune_params arm_fa726te_tune =
@@ -1099,6 +1114,7 @@ const struct tune_params arm_fa726te_tun
   false,					/* Prefer LDRD/STRD.  */
   {true, true},					/* Prefer non short circuit.  */
   &arm_default_vec_cost,                        /* Vectorizer costs.  */
+  false                                         /* Prefer Neon for 64-bits bitops.  */
 };
 
 
@@ -2129,6 +2145,12 @@ arm_option_override (void)
                            global_options.x_param_values,
                            global_options_set.x_param_values);
 
+  /* Use Neon to perform 64-bits operations rather than core
+     registers.  */
+  prefer_neon_for_64bits = current_tune->prefer_neon_for_64bits;
+  if (use_neon_for_64bits == 1)
+     prefer_neon_for_64bits = true;
+
   /* Use the alternative scheduling-pressure algorithm by default.  */
   maybe_set_param_value (PARAM_SCHED_PRESSURE_ALGORITHM, 2,
                          global_options.x_param_values,
Index: gcc/config/arm/arm.h
===================================================================
--- gcc/config/arm/arm.h	(revision 196875)
+++ gcc/config/arm/arm.h	(revision 196876)
@@ -354,6 +354,9 @@ extern void (*arm_lang_output_object_att
 #define TARGET_IDIV		((TARGET_ARM && arm_arch_arm_hwdiv) \
 				 || (TARGET_THUMB2 && arm_arch_thumb_hwdiv))
 
+/* Should NEON be used for 64-bits bitops.  */
+#define TARGET_PREFER_NEON_64BITS (prefer_neon_for_64bits)
+
 /* True iff the full BPABI is being used.  If TARGET_BPABI is true,
    then TARGET_AAPCS_BASED must be true -- but the converse does not
    hold.  TARGET_BPABI implies the use of the BPABI runtime library,
@@ -539,6 +542,10 @@ extern int arm_arch_arm_hwdiv;
 /* Nonzero if chip supports integer division instruction in Thumb mode.  */
 extern int arm_arch_thumb_hwdiv;
 
+/* Nonzero if we should use Neon to handle 64-bits operations rather
+   than core registers.  */
+extern int prefer_neon_for_64bits;
+
 #ifndef TARGET_DEFAULT
 #define TARGET_DEFAULT  (MASK_APCS_FRAME)
 #endif
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md	(revision 196875)
+++ gcc/config/arm/arm.md	(revision 196876)
@@ -94,7 +94,7 @@
 ; for ARM or Thumb-2 with arm_arch6, and nov6 for ARM without
 ; arm_arch6.  This attribute is used to compute attribute "enabled",
 ; use type "any" to enable an alternative in all cases.
-(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,onlya8,neon_onlya8,nota8,neon_nota8,iwmmxt,iwmmxt2"
+(define_attr "arch" "any,a,t,32,t1,t2,v6,nov6,neon_for_64bits,avoid_neon_for_64bits,iwmmxt,iwmmxt2"
   (const_string "any"))
 
 (define_attr "arch_enabled" "no,yes"
@@ -129,22 +129,14 @@
 	      (match_test "TARGET_32BIT && !arm_arch6"))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "onlya8")
-	      (eq_attr "tune" "cortexa8"))
+	 (and (eq_attr "arch" "avoid_neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (not (match_test "TARGET_PREFER_NEON_64BITS")))
 	 (const_string "yes")
 
-	 (and (eq_attr "arch" "neon_onlya8")
-	      (eq_attr "tune" "cortexa8")
-	      (match_test "TARGET_NEON"))
-	 (const_string "yes")
-
-	 (and (eq_attr "arch" "nota8")
-	      (not (eq_attr "tune" "cortexa8")))
-	 (const_string "yes")
-
-	 (and (eq_attr "arch" "neon_nota8")
-	      (not (eq_attr "tune" "cortexa8"))
-	      (match_test "TARGET_NEON"))
+	 (and (eq_attr "arch" "neon_for_64bits")
+	      (match_test "TARGET_NEON")
+	      (match_test "TARGET_PREFER_NEON_64BITS"))
 	 (const_string "yes")
 
 	 (and (eq_attr "arch" "iwmmxt2")
@@ -4330,7 +4322,7 @@
   [(set_attr "length" "*,8,8,*")
    (set_attr "predicable" "no,yes,yes,no")
    (set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
-   (set_attr "arch" "neon_nota8,*,*,neon_onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_expand "one_cmplsi2"
@@ -4498,7 +4490,7 @@
   "TARGET_32BIT <qhs_zextenddi_cond>"
   "#"
   [(set_attr "length" "8,4,8,8")
-   (set_attr "arch" "neon_nota8,*,*,neon_onlya8")
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")
    (set_attr "ce_count" "2")
    (set_attr "predicable" "yes")]
 )
@@ -4513,7 +4505,7 @@
    (set_attr "ce_count" "2")
    (set_attr "shift" "1")
    (set_attr "predicable" "yes")
-   (set_attr "arch" "neon_nota8,*,a,t,neon_onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,a,t,avoid_neon_for_64bits")]
 )
 
 ;; Splits for all extensions to DImode
Index: gcc/config/arm/arm.opt
===================================================================
--- gcc/config/arm/arm.opt	(revision 196875)
+++ gcc/config/arm/arm.opt	(revision 196876)
@@ -247,3 +247,7 @@ that may trigger Cortex-M3 errata.
 munaligned-access
 Target Report Var(unaligned_access) Init(2)
 Enable unaligned word and halfword accesses to packed data.
+
+mneon-for-64bits
+Target Report RejectNegative Var(use_neon_for_64bits) Init(0)
+Use Neon to perform 64-bits operations rather than core registers.
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 196875)
+++ gcc/config/arm/neon.md	(revision 196876)
@@ -487,7 +487,7 @@
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1,*,*,*")
    (set_attr "conds" "*,clob,clob,*,clob,clob,clob")
    (set_attr "length" "*,8,8,*,8,8,8")
-   (set_attr "arch" "nota8,*,*,onlya8,*,*,*")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits,*,*,*")]
 )
 
 (define_insn "*sub<mode>3_neon"
@@ -524,7 +524,7 @@
   [(set_attr "neon_type" "neon_int_2,*,*,*,neon_int_2")
    (set_attr "conds" "*,clob,clob,clob,*")
    (set_attr "length" "*,8,8,8,*")
-   (set_attr "arch" "nota8,*,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "*mul<mode>3_neon"
@@ -699,7 +699,7 @@
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 ;; The concrete forms of the Neon immediate-logic instructions are vbic and
@@ -744,7 +744,7 @@
 }
   [(set_attr "neon_type" "neon_int_1,neon_int_1,*,*,neon_int_1,neon_int_1")
    (set_attr "length" "*,*,8,8,*,*")
-   (set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")]
+   (set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")]
 )
 
 (define_insn "orn<mode>3_neon"
@@ -840,7 +840,7 @@
    veor\t%P0, %P1, %P2"
   [(set_attr "neon_type" "neon_int_1,*,*,neon_int_1")
    (set_attr "length" "*,8,8,*")
-   (set_attr "arch" "nota8,*,*,onlya8")]
+   (set_attr "arch" "neon_for_64bits,*,*,avoid_neon_for_64bits")]
 )
 
 (define_insn "one_cmpl<mode>2"
@@ -1162,7 +1162,7 @@
       }
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
@@ -1263,7 +1263,7 @@
 
     DONE;
   }"
-  [(set_attr "arch" "nota8,nota8,*,*,onlya8,onlya8")
+  [(set_attr "arch" "neon_for_64bits,neon_for_64bits,*,*,avoid_neon_for_64bits,avoid_neon_for_64bits")
    (set_attr "opt" "*,*,speed,speed,*,*")]
 )
 
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 196875)
+++ gcc/doc/invoke.texi	(revision 196876)
@@ -510,7 +510,8 @@ Objective-C and Objective-C++ Dialects}.
 -mtp=@var{name} -mtls-dialect=@var{dialect} @gol
 -mword-relocations @gol
 -mfix-cortex-m3-ldrd @gol
--munaligned-access}
+-munaligned-access @gol
+-mneon-for-64bits}
 
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
@@ -11530,6 +11531,11 @@ setting of this option.  If unaligned ac
 preprocessor symbol @code{__ARM_FEATURE_UNALIGNED} will also be
 defined.
 
+@item -mneon-for-64bits
+@opindex mneon-for-64bits
+Enables using Neon to handle scalar 64-bits operations. This is
+disabled by default since the cost of moving data from core registers
+to Neon is high.
 @end table
 
 @node AVR Options
Index: gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c
===================================================================
--- gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/neon-for-64bits-1.c	(revision 196876)
@@ -0,0 +1,54 @@
+/* Check that Neon is *not* used by default to handle 64-bits scalar
+   operations.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2" } */
+/* { dg-add-options arm_neon } */
+
+typedef long long i64;
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef int i32;
+
+/* Unary operators */
+#define UNARY_OP(name, op) \
+  void unary_##name(u64 *a, u64 *b) { *a = op (*b + 0x1234567812345678ULL) ; }
+
+/* Binary operators */
+#define BINARY_OP(name, op) \
+  void binary_##name(u64 *a, u64 *b, u64 *c) { *a = *b op *c ; }
+
+/* Unsigned shift */
+#define SHIFT_U(name, op, amount) \
+  void ushift_##name(u64 *a, u64 *b, int c) { *a = *b op amount; }
+
+/* Signed shift */
+#define SHIFT_S(name, op, amount) \
+  void sshift_##name(i64 *a, i64 *b, int c) { *a = *b op amount; }
+
+UNARY_OP(not, ~)
+
+BINARY_OP(add, +)
+BINARY_OP(sub, -)
+BINARY_OP(and, &)
+BINARY_OP(or, |)
+BINARY_OP(xor, ^)
+
+SHIFT_U(right1, >>, 1)
+SHIFT_U(right2, >>, 2)
+SHIFT_U(right5, >>, 5)
+SHIFT_U(rightn, >>, c)
+
+SHIFT_S(right1, >>, 1)
+SHIFT_S(right2, >>, 2)
+SHIFT_S(right5, >>, 5)
+SHIFT_S(rightn, >>, c)
+
+/* { dg-final {scan-assembler-times "vmvn" 0} }  */
+/* { dg-final {scan-assembler-times "vadd" 0} }  */
+/* { dg-final {scan-assembler-times "vsub" 0} }  */
+/* { dg-final {scan-assembler-times "vand" 0} }  */
+/* { dg-final {scan-assembler-times "vorr" 0} }  */
+/* { dg-final {scan-assembler-times "veor" 0} }  */
+/* { dg-final {scan-assembler-times "vshr" 0} }  */
Index: gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c
===================================================================
--- gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/arm/neon-for-64bits-2.c	(revision 196876)
@@ -0,0 +1,57 @@
+/* Check that Neon is used to handle 64-bits scalar operations.  */
+
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O2 -mneon-for-64bits" } */
+/* { dg-add-options arm_neon } */
+
+typedef long long i64;
+typedef unsigned long long u64;
+typedef unsigned int u32;
+typedef int i32;
+
+/* Unary operators */
+#define UNARY_OP(name, op) \
+  void unary_##name(u64 *a, u64 *b) { *a = op (*b + 0x1234567812345678ULL) ; }
+
+/* Binary operators */
+#define BINARY_OP(name, op) \
+  void binary_##name(u64 *a, u64 *b, u64 *c) { *a = *b op *c ; }
+
+/* Unsigned shift */
+#define SHIFT_U(name, op, amount) \
+  void ushift_##name(u64 *a, u64 *b, int c) { *a = *b op amount; }
+
+/* Signed shift */
+#define SHIFT_S(name, op, amount) \
+  void sshift_##name(i64 *a, i64 *b, int c) { *a = *b op amount; }
+
+UNARY_OP(not, ~)
+
+BINARY_OP(add, +)
+BINARY_OP(sub, -)
+BINARY_OP(and, &)
+BINARY_OP(or, |)
+BINARY_OP(xor, ^)
+
+SHIFT_U(right1, >>, 1)
+SHIFT_U(right2, >>, 2)
+SHIFT_U(right5, >>, 5)
+SHIFT_U(rightn, >>, c)
+
+SHIFT_S(right1, >>, 1)
+SHIFT_S(right2, >>, 2)
+SHIFT_S(right5, >>, 5)
+SHIFT_S(rightn, >>, c)
+
+/* { dg-final {scan-assembler-times "vmvn" 1} }  */
+/* Two vadd: 1 in unary_not, 1 in binary_add */
+/* { dg-final {scan-assembler-times "vadd" 2} }  */
+/* { dg-final {scan-assembler-times "vsub" 1} }  */
+/* { dg-final {scan-assembler-times "vand" 1} }  */
+/* { dg-final {scan-assembler-times "vorr" 1} }  */
+/* { dg-final {scan-assembler-times "veor" 1} }  */
+/* 6 vshr for right shifts by constant, and variable right shift uses
+   vshl with a negative amount in register.  */
+/* { dg-final {scan-assembler-times "vshr" 6} }  */
+/* { dg-final {scan-assembler-times "vshl" 2} }  */
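
As background for the default case checked by neon-for-64bits-1.c, a 64-bit addition kept in core registers is performed on two 32-bit halves with a carry between them. The sketch below models that in portable C; add64_via_core_regs is a hypothetical name, and the mapping of the two steps onto an adds/adc pair is an assumption about typical code generation, not something the tests themselves verify.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit add built from two 32-bit adds: low halves first, then high
   halves plus the carry out of the low add.  This mirrors what an
   add-with-flags followed by an add-with-carry would compute.  */
uint64_t add64_via_core_regs(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo = a_lo + b_lo;          /* low word, wraps modulo 2^32 */
    uint32_t carry = lo < a_lo;         /* 1 if the low add overflowed */
    uint32_t hi = a_hi + b_hi + carry;  /* high word plus carry */

    return ((uint64_t)hi << 32) | lo;
}
```

The result matches the native 64-bit addition for all inputs, including the wrap-around case, so no Neon register is needed for the operation itself.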

Thread overview: 12+ messages
2012-11-29 17:16 [ARM] Turning off 64bits ops in Neon and gfortran/modulo-scheduling problem Christophe Lyon
2012-11-29 21:12 ` Joseph S. Myers
2012-11-30 16:38   ` Christophe Lyon
2012-12-07  8:35     ` Christophe Lyon
2012-12-14 16:58       ` Christophe Lyon
2012-12-17 15:12 ` Richard Earnshaw
2012-12-19 16:00   ` Christophe Lyon
2013-01-08 16:24     ` Christophe Lyon
2013-01-16 13:46       ` Christophe Lyon
2013-01-21 16:53         ` Christophe Lyon
2013-02-01 17:43     ` Ramana Radhakrishnan
2013-03-21 14:42       ` Christophe Lyon
