[PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON
@ 2010-12-09 14:56 Julian Brown
  2010-12-09 15:13 ` Joseph S. Myers
  0 siblings, 1 reply; 7+ messages in thread
From: Julian Brown @ 2010-12-09 14:56 UTC (permalink / raw)
  To: gcc-patches; +Cc: paul, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 2502 bytes --]

(Sending to the right mailing list address this time, sorry for
duplicates!)

Hi,

This patch fixes a large number of execution failures which occur when
compiling with NEON and auto-vectorization enabled in big-endian mode,
particularly when -mvectorize-with-neon-quad is in use.

The basic issue (as discussed several times previously, e.g.
http://gcc.gnu.org/ml/gcc-patches/2010-06/msg00409.html) is that
NEON quad-word vectors are formed from pairs of double-word vectors,
and those pairs are "the wrong way round" in big-endian mode. This is
contrary to the middle-end's expectations, and several patterns
introduced fairly recently to the GCC NEON support fail to take this
"feature" into account.

So: in the interests of un-breaking code in big-endian mode, the
attached patch does the simplest thing possible: disabling the
troublesome patterns in big-endian mode. I experimented with other
approaches, i.e. trying to "hide" the permutations required by NEON in
the backend patterns:

  http://lists.linaro.org/pipermail/linaro-toolchain/2010-November/000536.html

But eventually came to the conclusion that it was impractical to do
that. (We're planning to revisit the "internal" ordering of vectors for
big-endian NEON in the vectorizer at least, anyway.)

I've tested this patch with custom multilib configurations
(big-endian/little-endian), with the following options:

  little-endian, -mfpu=neon -mfloat-abi=softfp

  little-endian, -mfpu=neon -mfloat-abi=softfp
     -mvectorize-with-neon-quad

  big-endian, -mfpu=neon -mfloat-abi=softfp

  big-endian, -mfpu=neon -mfloat-abi=softfp -mvectorize-with-neon-quad

Unfortunately for C only, since building C++ was broken at the time I
started testing. Some tests fail to vectorize post-patch in BE mode
(predictably, since fewer things end up vectorizable), but many
execution tests transition from FAIL to PASS.

OK to apply?

Thanks,

Julian

ChangeLog

    gcc/
    * config/arm/neon.md (vec_shr_<mode>, vec_shl_<mode>): Disable in
    big-endian mode.
    (reduc_splus_<mode>, reduc_uplus_<mode>, reduc_smin_<mode>)
    (reduc_smax_<mode>, reduc_umin_<mode>, reduc_umax_<mode>)
    (neon_vec_unpack<US>_lo_<mode>, neon_vec_unpack<US>_hi_<mode>)
    (vec_unpack<US>_hi_<mode>, vec_unpack<US>_lo_<mode>)
    (neon_vec_<US>mult_lo_<mode>, vec_widen_<US>mult_lo_<mode>)
    (neon_vec_<US>mult_hi_<mode>, vec_widen_<US>mult_hi_<mode>)
    (vec_pack_trunc_<mode>, neon_vec_pack_trunc_<mode>): Disable for Q
    registers in big-endian mode.

[-- Attachment #2: neon-quad-vector-big-endian-fixes-fsf-2.diff --]
[-- Type: text/x-patch, Size: 7304 bytes --]

Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 167434)
+++ gcc/config/arm/neon.md	(working copy)
@@ -1028,11 +1028,14 @@
 ;; shift-count granularity. That's good enough for the middle-end's current
 ;; needs.
 
+;; Note that it's not safe to perform such an operation in big-endian mode,
+;; due to element-ordering issues.
+
 (define_expand "vec_shr_<mode>"
   [(match_operand:VDQ 0 "s_register_operand" "")
    (match_operand:VDQ 1 "s_register_operand" "")
    (match_operand:SI 2 "const_multiple_of_8_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx zero_reg;
   HOST_WIDE_INT num_bits = INTVAL (operands[2]);
@@ -1060,7 +1063,7 @@
   [(match_operand:VDQ 0 "s_register_operand" "")
    (match_operand:VDQ 1 "s_register_operand" "")
    (match_operand:SI 2 "const_multiple_of_8_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx zero_reg;
   HOST_WIDE_INT num_bits = INTVAL (operands[2]);
@@ -1256,7 +1259,8 @@
 (define_expand "reduc_splus_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
+   && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1272,7 +1276,7 @@
   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
 	(unspec:V2DI [(match_operand:V2DI 1 "s_register_operand" "w")]
 		     UNSPEC_VPADD))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vadd.i64\t%e0, %e1, %f1"
   [(set_attr "neon_type" "neon_int_1")]
 )
@@ -1282,7 +1286,7 @@
 (define_expand "reduc_uplus_<mode>"
   [(match_operand:VDQI 0 "s_register_operand" "")
    (match_operand:VDQI 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && (<Is_d_reg> || !BYTES_BIG_ENDIAN)"
 {
   emit_insn (gen_reduc_splus_<mode> (operands[0], operands[1]));
   DONE;
@@ -1301,7 +1305,8 @@
 (define_expand "reduc_smin_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
+   && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1326,7 +1331,8 @@
 (define_expand "reduc_smax_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
+   && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1351,7 +1357,7 @@
 (define_expand "reduc_umin_<mode>"
   [(match_operand:VQI 0 "s_register_operand" "")
    (match_operand:VQI 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1376,7 +1382,7 @@
 (define_expand "reduc_umax_<mode>"
   [(match_operand:VQI 0 "s_register_operand" "")
    (match_operand:VQI 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -5238,7 +5244,7 @@
         (SE:<V_unpack> (vec_select:<V_HALF>
 			  (match_operand:VU 1 "register_operand" "w")
 			  (match_operand:VU 2 "vect_par_constant_low" ""))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmovl.<US><V_sz_elem> %q0, %e1"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5248,7 +5254,7 @@
         (SE:<V_unpack> (vec_select:<V_HALF>
 			  (match_operand:VU 1 "register_operand" "w")
 			  (match_operand:VU 2 "vect_par_constant_high" ""))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmovl.<US><V_sz_elem> %q0, %f1"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5256,7 +5262,7 @@
 (define_expand "vec_unpack<US>_hi_<mode>"
   [(match_operand:<V_unpack> 0 "register_operand" "")
    (SE:<V_unpack> (match_operand:VU 1 "register_operand"))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
   {
    rtvec v = rtvec_alloc (<V_mode_nunits>/2)  ;
    rtx t1;
@@ -5275,7 +5281,7 @@
 (define_expand "vec_unpack<US>_lo_<mode>"
   [(match_operand:<V_unpack> 0 "register_operand" "")
    (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
   {
    rtvec v = rtvec_alloc (<V_mode_nunits>/2)  ;
    rtx t1;
@@ -5298,7 +5304,7 @@
  		        (SE:<V_unpack> (vec_select:<V_HALF>
                            (match_operand:VU 3 "register_operand" "w") 
                            (match_dup 2)))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmull.<US><V_sz_elem> %q0, %e1, %e3"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5307,7 +5313,7 @@
   [(match_operand:<V_unpack> 0 "register_operand" "")
    (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
    (SE:<V_unpack> (match_operand:VU 2 "register_operand" ""))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  {
    rtvec v = rtvec_alloc (<V_mode_nunits>/2)  ;
    rtx t1;
@@ -5332,7 +5338,7 @@
 		       (SE:<V_unpack> (vec_select:<V_HALF>
 			    (match_operand:VU 3 "register_operand" "w") 
 			    (match_dup 2)))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmull.<US><V_sz_elem> %q0, %f1, %f3"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5341,7 +5347,7 @@
   [(match_operand:<V_unpack> 0 "register_operand" "")
    (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
    (SE:<V_unpack> (match_operand:VU 2 "register_operand" ""))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  {
    rtvec v = rtvec_alloc (<V_mode_nunits>/2)  ;
    rtx t1;
@@ -5435,6 +5441,10 @@
  }
 )
 
+; FIXME: These instruction patterns can't be used safely in big-endian mode
+; because the ordering of vector elements in Q registers is different from what
+; the semantics of the instructions require.
+
 (define_insn "vec_pack_trunc_<mode>"
  [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
        (vec_concat:<V_narrow_pack> 
@@ -5442,7 +5452,7 @@
 			(match_operand:VN 1 "register_operand" "w"))
 		(truncate:<V_narrow>
 			(match_operand:VN 2 "register_operand" "w"))))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovn.i<V_sz_elem>\t%e0, %q1\n\tvmovn.i<V_sz_elem>\t%f0, %q2"
  [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5451,7 +5461,7 @@
 (define_insn "neon_vec_pack_trunc_<mode>"
  [(set (match_operand:<V_narrow> 0 "register_operand" "=w")
        (truncate:<V_narrow> (match_operand:VN 1 "register_operand" "w")))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovn.i<V_sz_elem>\t%P0, %q1"
  [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5460,7 +5470,7 @@
  [(match_operand:<V_narrow_pack> 0 "register_operand" "")
   (match_operand:VSHFT 1 "register_operand" "")
   (match_operand:VSHFT 2 "register_operand")]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx tempreg = gen_reg_rtx (<V_DOUBLE>mode);
   

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON
  2010-12-09 14:56 [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON Julian Brown
@ 2010-12-09 15:13 ` Joseph S. Myers
  2010-12-09 15:19   ` Julian Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Joseph S. Myers @ 2010-12-09 15:13 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, paul, Richard Earnshaw

On Thu, 9 Dec 2010, Julian Brown wrote:

> Unfortunately for C only, since building C++ was broken at the time I
> started testing. Some tests fail to vectorize post-patch in BE mode
> (predictably, since fewer things end up vectorizable), but many
> execution tests transition from FAIL to PASS.

Do any tests (scan, not execution) now fail for big endian in ways that 
indicate some of the check_effective_target_vect_* functions in 
target-supports.exp should be updated to know that certain features are 
only supported for little endian?

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON
  2010-12-09 15:13 ` Joseph S. Myers
@ 2010-12-09 15:19   ` Julian Brown
  2011-01-12 18:02     ` Julian Brown
  0 siblings, 1 reply; 7+ messages in thread
From: Julian Brown @ 2010-12-09 15:19 UTC (permalink / raw)
  To: Joseph S. Myers; +Cc: gcc-patches, paul, Richard Earnshaw

On Thu, 9 Dec 2010 14:40:59 +0000 (UTC)
"Joseph S. Myers" <joseph@codesourcery.com> wrote:

> On Thu, 9 Dec 2010, Julian Brown wrote:
> 
> > Unfortunately for C only, since building C++ was broken at the time
> > I started testing. Some tests fail to vectorize post-patch in BE
> > mode (predictably, since fewer things end up vectorizable), but many
> > execution tests transition from FAIL to PASS.
> 
> Do any tests (scan, not execution) now fail for big endian in ways
> that indicate some of the check_effective_target_vect_* functions in 
> target-supports.exp should be updated to know that certain features
> are only supported for little endian?

Possibly, yes: I suspect a few of those may need updating, really. One
awkward bit is that I'm still allowing some of the operations for
double-word registers only -- I don't think the
check_effective_target_* tests will suffice to distinguish between
D-reg & Q-reg cases.

Julian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON
  2010-12-09 15:19   ` Julian Brown
@ 2011-01-12 18:02     ` Julian Brown
  2011-02-09 12:11       ` Julian Brown
  2011-06-30 13:23       ` Richard Earnshaw
  0 siblings, 2 replies; 7+ messages in thread
From: Julian Brown @ 2011-01-12 18:02 UTC (permalink / raw)
  To: gcc-patches, Joseph S. Myers; +Cc: paul, Richard Earnshaw

[-- Attachment #1: Type: text/plain, Size: 2378 bytes --]

On Thu, 9 Dec 2010 14:56:39 +0000
Julian Brown <julian@codesourcery.com> wrote:

> On Thu, 9 Dec 2010 14:40:59 +0000 (UTC)
> "Joseph S. Myers" <joseph@codesourcery.com> wrote:
> 
> > On Thu, 9 Dec 2010, Julian Brown wrote:
> > 
> > > Unfortunately for C only, since building C++ was broken at the
> > > time I started testing. Some tests fail to vectorize post-patch
> > > in BE mode (predictably, since fewer things end up vectorizable),
> > > but many execution tests transition from FAIL to PASS.
> > 
> > Do any tests (scan, not execution) now fail for big endian in ways
> > that indicate some of the check_effective_target_vect_* functions
> > in target-supports.exp should be updated to know that certain
> > features are only supported for little endian?
> 
> Possibly, yes: I suspect a few of those may need updating, really. One
> awkward bit is that I'm still allowing some of the operations for
> double-word registers only -- I don't think the
> check_effective_target_* tests will suffice to distinguish between
> D-reg & Q-reg cases.

This version of the patch tweaks target-supports.exp to say that
various operations are not available in big-endian mode (removing some
of the FAILs from the previous version -- though in big-endian mode
without -mvectorize-with-neon-quad, some tests have transitioned from
PASS to XPASS. I'm not sure that's worth worrying about).

The main part of the patch remains unchanged.

OK to apply?

Cheers,

Julian

ChangeLog

    gcc/
    * config/arm/neon.md (vec_shr_<mode>, vec_shl_<mode>): Disable in
    big-endian mode.
    (reduc_splus_<mode>, reduc_uplus_<mode>, reduc_smin_<mode>)
    (reduc_smax_<mode>, reduc_umin_<mode>, reduc_umax_<mode>)
    (neon_vec_unpack<US>_lo_<mode>, neon_vec_unpack<US>_hi_<mode>)
    (vec_unpack<US>_hi_<mode>, vec_unpack<US>_lo_<mode>)
    (neon_vec_<US>mult_lo_<mode>, vec_widen_<US>mult_lo_<mode>)
    (neon_vec_<US>mult_hi_<mode>, vec_widen_<US>mult_hi_<mode>)
    (vec_pack_trunc_<mode>, neon_vec_pack_trunc_<mode>): Disable for Q
    registers in big-endian mode.

    gcc/testsuite/
    * lib/target-supports.exp
    (check_effective_target_arm_little_endian): New.
    (check_effective_target_vect_pack_trunc): Use above.
    (check_effective_target_vect_unpack): Likewise.
    (check_effective_target_vect_element_align): Test
    check_effective_target_arm_vect_no_misalign for ARM.

[-- Attachment #2: neon-quad-vector-big-endian-fixes-fsf-4.diff --]
[-- Type: text/x-patch, Size: 9354 bytes --]

Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	(revision 167434)
+++ gcc/config/arm/neon.md	(working copy)
@@ -1028,11 +1028,14 @@
 ;; shift-count granularity. That's good enough for the middle-end's current
 ;; needs.
 
+;; Note that it's not safe to perform such an operation in big-endian mode,
+;; due to element-ordering issues.
+
 (define_expand "vec_shr_<mode>"
   [(match_operand:VDQ 0 "s_register_operand" "")
    (match_operand:VDQ 1 "s_register_operand" "")
    (match_operand:SI 2 "const_multiple_of_8_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx zero_reg;
   HOST_WIDE_INT num_bits = INTVAL (operands[2]);
@@ -1060,7 +1063,7 @@
   [(match_operand:VDQ 0 "s_register_operand" "")
    (match_operand:VDQ 1 "s_register_operand" "")
    (match_operand:SI 2 "const_multiple_of_8_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx zero_reg;
   HOST_WIDE_INT num_bits = INTVAL (operands[2]);
@@ -1256,7 +1259,8 @@
 (define_expand "reduc_splus_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
+   && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1272,7 +1276,7 @@
   [(set (match_operand:V2DI 0 "s_register_operand" "=w")
 	(unspec:V2DI [(match_operand:V2DI 1 "s_register_operand" "w")]
 		     UNSPEC_VPADD))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vadd.i64\t%e0, %e1, %f1"
   [(set_attr "neon_type" "neon_int_1")]
 )
@@ -1282,7 +1286,7 @@
 (define_expand "reduc_uplus_<mode>"
   [(match_operand:VDQI 0 "s_register_operand" "")
    (match_operand:VDQI 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && (<Is_d_reg> || !BYTES_BIG_ENDIAN)"
 {
   emit_insn (gen_reduc_splus_<mode> (operands[0], operands[1]));
   DONE;
@@ -1301,7 +1305,8 @@
 (define_expand "reduc_smin_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
+   && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1326,7 +1331,8 @@
 (define_expand "reduc_smax_<mode>"
   [(match_operand:VQ 0 "s_register_operand" "")
    (match_operand:VQ 1 "s_register_operand" "")]
-  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)"
+  "TARGET_NEON && (!<Is_float_mode> || flag_unsafe_math_optimizations)
+   && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1351,7 +1357,7 @@
 (define_expand "reduc_umin_<mode>"
   [(match_operand:VQI 0 "s_register_operand" "")
    (match_operand:VQI 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -1376,7 +1382,7 @@
 (define_expand "reduc_umax_<mode>"
   [(match_operand:VQI 0 "s_register_operand" "")
    (match_operand:VQI 1 "s_register_operand" "")]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx step1 = gen_reg_rtx (<V_HALF>mode);
   rtx res_d = gen_reg_rtx (<V_HALF>mode);
@@ -5238,7 +5244,7 @@
         (SE:<V_unpack> (vec_select:<V_HALF>
 			  (match_operand:VU 1 "register_operand" "w")
 			  (match_operand:VU 2 "vect_par_constant_low" ""))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmovl.<US><V_sz_elem> %q0, %e1"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5248,7 +5254,7 @@
         (SE:<V_unpack> (vec_select:<V_HALF>
 			  (match_operand:VU 1 "register_operand" "w")
 			  (match_operand:VU 2 "vect_par_constant_high" ""))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmovl.<US><V_sz_elem> %q0, %f1"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5256,7 +5262,7 @@
 (define_expand "vec_unpack<US>_hi_<mode>"
   [(match_operand:<V_unpack> 0 "register_operand" "")
    (SE:<V_unpack> (match_operand:VU 1 "register_operand"))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
   {
    rtvec v = rtvec_alloc (<V_mode_nunits>/2)  ;
    rtx t1;
@@ -5275,7 +5281,7 @@
 (define_expand "vec_unpack<US>_lo_<mode>"
   [(match_operand:<V_unpack> 0 "register_operand" "")
    (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
   {
    rtvec v = rtvec_alloc (<V_mode_nunits>/2)  ;
    rtx t1;
@@ -5298,7 +5304,7 @@
  		        (SE:<V_unpack> (vec_select:<V_HALF>
                            (match_operand:VU 3 "register_operand" "w") 
                            (match_dup 2)))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmull.<US><V_sz_elem> %q0, %e1, %e3"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5307,7 +5313,7 @@
   [(match_operand:<V_unpack> 0 "register_operand" "")
    (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
    (SE:<V_unpack> (match_operand:VU 2 "register_operand" ""))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  {
    rtvec v = rtvec_alloc (<V_mode_nunits>/2)  ;
    rtx t1;
@@ -5332,7 +5338,7 @@
 		       (SE:<V_unpack> (vec_select:<V_HALF>
 			    (match_operand:VU 3 "register_operand" "w") 
 			    (match_dup 2)))))]
-  "TARGET_NEON"
+  "TARGET_NEON && !BYTES_BIG_ENDIAN"
   "vmull.<US><V_sz_elem> %q0, %f1, %f3"
   [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5341,7 +5347,7 @@
   [(match_operand:<V_unpack> 0 "register_operand" "")
    (SE:<V_unpack> (match_operand:VU 1 "register_operand" ""))
    (SE:<V_unpack> (match_operand:VU 2 "register_operand" ""))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  {
    rtvec v = rtvec_alloc (<V_mode_nunits>/2)  ;
    rtx t1;
@@ -5435,6 +5441,10 @@
  }
 )
 
+; FIXME: These instruction patterns can't be used safely in big-endian mode
+; because the ordering of vector elements in Q registers is different from what
+; the semantics of the instructions require.
+
 (define_insn "vec_pack_trunc_<mode>"
  [(set (match_operand:<V_narrow_pack> 0 "register_operand" "=&w")
        (vec_concat:<V_narrow_pack> 
@@ -5442,7 +5452,7 @@
 			(match_operand:VN 1 "register_operand" "w"))
 		(truncate:<V_narrow>
 			(match_operand:VN 2 "register_operand" "w"))))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovn.i<V_sz_elem>\t%e0, %q1\n\tvmovn.i<V_sz_elem>\t%f0, %q2"
  [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5451,7 +5461,7 @@
 (define_insn "neon_vec_pack_trunc_<mode>"
  [(set (match_operand:<V_narrow> 0 "register_operand" "=w")
        (truncate:<V_narrow> (match_operand:VN 1 "register_operand" "w")))]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
  "vmovn.i<V_sz_elem>\t%P0, %q1"
  [(set_attr "neon_type" "neon_shift_1")]
 )
@@ -5460,7 +5470,7 @@
  [(match_operand:<V_narrow_pack> 0 "register_operand" "")
   (match_operand:VSHFT 1 "register_operand" "")
   (match_operand:VSHFT 2 "register_operand")]
- "TARGET_NEON"
+ "TARGET_NEON && !BYTES_BIG_ENDIAN"
 {
   rtx tempreg = gen_reg_rtx (<V_DOUBLE>mode);
   
Index: gcc/testsuite/lib/target-supports.exp
===================================================================
--- gcc/testsuite/lib/target-supports.exp	(revision 167434)
+++ gcc/testsuite/lib/target-supports.exp	(working copy)
@@ -1820,6 +1820,15 @@ proc check_effective_target_arm32 { } {
     }]
 }
 
+# Return 1 if this is a little-endian ARM target
+proc check_effective_target_arm_little_endian { } {
+    return [check_no_compiler_messages arm_little_endian assembly {
+	#if !defined(__arm__) || !defined(__ARMEL__)
+	#error FOO
+	#endif
+    }]
+}
+
 # Return 1 if this is an ARM target that only supports aligned vector accesses
 proc check_effective_target_arm_vect_no_misalign { } {
     return [check_no_compiler_messages arm_vect_no_misalign assembly {
@@ -2759,7 +2768,8 @@ proc check_effective_target_vect_pack_tr
              || [istarget i?86-*-*]
              || [istarget x86_64-*-*]
              || [istarget spu-*-*]
-             || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
+             || ([istarget arm*-*-*] && [check_effective_target_arm_neon]
+		 && [check_effective_target_arm_little_endian]) } {
             set et_vect_pack_trunc_saved 1
         }
     }
@@ -2784,7 +2794,8 @@ proc check_effective_target_vect_unpack 
              || [istarget x86_64-*-*] 
              || [istarget spu-*-*]
              || [istarget ia64-*-*]
-             || ([istarget arm*-*-*] && [check_effective_target_arm_neon]) } {
+             || ([istarget arm*-*-*] && [check_effective_target_arm_neon]
+		 && [check_effective_target_arm_little_endian]) } {
             set et_vect_unpack_saved 1
         }
     }
@@ -2970,7 +2981,8 @@ proc check_effective_target_vect_element
 	verbose "check_effective_target_vect_element_align: using cached result" 2
     } else {
 	set et_vect_element_align 0
-	if { [istarget arm*-*-*]
+	if { ([istarget arm*-*-*]
+	      && ![check_effective_target_arm_vect_no_misalign])
 	     || [check_effective_target_vect_hw_misalign] } {
 	   set et_vect_element_align 1
 	}

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON
  2011-01-12 18:02     ` Julian Brown
@ 2011-02-09 12:11       ` Julian Brown
  2011-04-05 15:40         ` Julian Brown
  2011-06-30 13:23       ` Richard Earnshaw
  1 sibling, 1 reply; 7+ messages in thread
From: Julian Brown @ 2011-02-09 12:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: paul, Richard Earnshaw

On Wed, 12 Jan 2011 17:38:22 +0000
Julian Brown <julian@codesourcery.com> wrote:

> This version of the patch tweaks target-supports.exp to say that
> various operations are not available in big-endian mode (removing some
> of the FAILs from the previous version -- though in big-endian mode
> without -mvectorize-with-neon-quad, some tests have transitioned from
> PASS to XPASS. I'm not sure that's worth worrying about).
> 
> The main part of the patch remains unchanged.

Ping?

Julian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON
  2011-02-09 12:11       ` Julian Brown
@ 2011-04-05 15:40         ` Julian Brown
  0 siblings, 0 replies; 7+ messages in thread
From: Julian Brown @ 2011-04-05 15:40 UTC (permalink / raw)
  To: gcc-patches; +Cc: paul, Richard Earnshaw, Ramana Radhakrishnan

On Wed, 9 Feb 2011 12:11:35 +0000
Julian Brown <julian@codesourcery.com> wrote:

> On Wed, 12 Jan 2011 17:38:22 +0000
> Julian Brown <julian@codesourcery.com> wrote:
> 
> > This version of the patch tweaks target-supports.exp to say that
> > various operations are not available in big-endian mode (removing
> > some of the FAILs from the previous version -- though in big-endian
> > mode without -mvectorize-with-neon-quad, some tests have
> > transitioned from PASS to XPASS. I'm not sure that's worth worrying
> > about).
> > 
> > The main part of the patch remains unchanged.
> 
> Ping?

Ping?

(Patch: http://gcc.gnu.org/ml/gcc-patches/2011-01/msg00768.html)

Thanks,

Julian

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON
  2011-01-12 18:02     ` Julian Brown
  2011-02-09 12:11       ` Julian Brown
@ 2011-06-30 13:23       ` Richard Earnshaw
  1 sibling, 0 replies; 7+ messages in thread
From: Richard Earnshaw @ 2011-06-30 13:23 UTC (permalink / raw)
  To: Julian Brown; +Cc: gcc-patches, Joseph S. Myers, paul

On 12/01/11 17:38, Julian Brown wrote:
> On Thu, 9 Dec 2010 14:56:39 +0000
> Julian Brown <julian@codesourcery.com> wrote:
> 
>> On Thu, 9 Dec 2010 14:40:59 +0000 (UTC)
>> "Joseph S. Myers" <joseph@codesourcery.com> wrote:
>>
>>> On Thu, 9 Dec 2010, Julian Brown wrote:
>>>
>>>> Unfortunately for C only, since building C++ was broken at the
>>>> time I started testing. Some tests fail to vectorize post-patch
>>>> in BE mode (predictably, since fewer things end up vectorizable),
>>>> but many execution tests transition from FAIL to PASS.
>>>
>>> Do any tests (scan, not execution) now fail for big endian in ways
>>> that indicate some of the check_effective_target_vect_* functions
>>> in target-supports.exp should be updated to know that certain
>>> features are only supported for little endian?
>>
>> Possibly, yes: I suspect a few of those may need updating, really. One
>> awkward bit is that I'm still allowing some of the operations for
>> double-word registers only -- I don't think the
>> check_effective_target_* tests will suffice to distinguish between
>> D-reg & Q-reg cases.
> 
> This version of the patch tweaks target-supports.exp to say that
> various operations are not available in big-endian mode (removing some
> of the FAILs from the previous version -- though in big-endian mode
> without -mvectorize-with-neon-quad, some tests have transitioned from
> PASS to XPASS. I'm not sure that's worth worrying about).
> 
> The main part of the patch remains unchanged.
> 
> OK to apply?
> 
> Cheers,
> 
> Julian
> 
> ChangeLog
> 
>     gcc/
>     * config/arm/neon.md (vec_shr_<mode>, vec_shl_<mode>): Disable in
>     big-endian mode.
>     (reduc_splus_<mode>, reduc_uplus_<mode>, reduc_smin_<mode>)
>     (reduc_smax_<mode>, reduc_umin_<mode>, reduc_umax_<mode>)
>     (neon_vec_unpack<US>_lo_<mode>, neon_vec_unpack<US>_hi_<mode>)
>     (vec_unpack<US>_hi_<mode>, vec_unpack<US>_lo_<mode>)
>     (neon_vec_<US>mult_lo_<mode>, vec_widen_<US>mult_lo_<mode>)
>     (neon_vec_<US>mult_hi_<mode>, vec_widen_<US>mult_hi_<mode>)
>     (vec_pack_trunc_<mode>, neon_vec_pack_trunc_<mode>): Disable for Q
>     registers in big-endian mode.
> 
>     gcc/testsuite/
>     * lib/target-supports.exp
>     (check_effective_target_arm_little_endian): New.
>     (check_effective_target_vect_pack_trunc): Use above.
>     (check_effective_target_vect_unpack): Likewise.
>     (check_effective_target_vect_element_align): Test
>     check_effective_target_arm_vect_no_misalign for ARM.
> 
> 

OK.

R.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2011-06-30 13:16 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-09 14:56 [PATCH, ARM] Avoid element-order-dependent operations for quad-word vectors in big-endian mode for NEON Julian Brown
2010-12-09 15:13 ` Joseph S. Myers
2010-12-09 15:19   ` Julian Brown
2011-01-12 18:02     ` Julian Brown
2011-02-09 12:11       ` Julian Brown
2011-04-05 15:40         ` Julian Brown
2011-06-30 13:23       ` Richard Earnshaw

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).