From: Michael Collison <michael.collison@linaro.org>
To: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
gcc Patches <gcc-patches@gcc.gnu.org>,
Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>
Subject: Re: [ARM] Use vector wide add for mixed-mode adds
Date: Mon, 15 Feb 2016 06:32:00 -0000
Message-ID: <56C170DD.1040303@linaro.org>
In-Reply-To: <56BA1356.4060506@foss.arm.com>
[-- Attachment #1: Type: text/plain, Size: 6465 bytes --]
Hi Kyrill,
I made the following changes based on your comments:
1. I rebased the patch so that it applies cleanly on trunk
2. Fixed the dg-add-options directives in my new test cases as requested
3. Fixed the GNU style issues identified by ./contrib/check_GNU_style.sh
The failure you are seeing on slp-reduc-3.c is a known failure. The test
case has an xfail with 'xfail { vect_widen_sum_hi_to_si_pattern' which I
added in my patch. Richard Biener resolved some of these issues with PR
68333, but 'slp-reduc-3.c' still fails. I will create a new PR.
I retested on the Linaro testing infrastructure with the latest trunk
and the only failure is 'slp-reduc-3.c'. Okay for GCC 7?
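As a rough illustration of the intended semantics (not part of the patch), the mixed-mode add can be modelled in scalar C. This is a minimal sketch that assumes an 8 x 16-bit input and a 4 x 32-bit accumulator; the names vaddw_s16_model and widen_ssum_v8hi_model are hypothetical and exist only for this example. It shows one widening add for the low half of the input followed by one for the high half, which is the shape the new widen_ssum expander emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a single vaddw.s16: add four
   sign-extended 16-bit lanes to four 32-bit accumulator lanes.  */
static void
vaddw_s16_model (int32_t acc[4], const int16_t half[4])
{
  assert (acc != NULL && half != NULL);
  for (size_t i = 0; i < 4; i++)
    acc[i] += (int32_t) half[i];
}

/* Hypothetical model of the widening sum of an 8 x 16-bit vector into
   a 4 x 32-bit accumulator: one widening add per half of the input.  */
static void
widen_ssum_v8hi_model (int32_t acc[4], const int16_t in[8])
{
  vaddw_s16_model (acc, &in[0]);	/* Low half: lanes 0..3.  */
  vaddw_s16_model (acc, &in[4]);	/* High half: lanes 4..7.  */
}
```

The accumulator is updated in place, which loosely corresponds to the "0" constraint tying operand 3 to operand 0 in the new define_insn patterns.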
2016-02-12 Michael Collison <michael.collison@linaro.org>
* config/arm/neon.md (widen_<us>sum<mode>): New patterns where
mode is VQI to improve mixed mode vectorization.
* config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3): New
define_insn to match low half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3): New
define_insn to match high half of signed vaddw.
* config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3): New
define_insn to match low half of unsigned vaddw.
* config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3): New
define_insn to match high half of unsigned vaddw.
* config/arm/arm.c (arm_simd_vect_par_cnst_half): New function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/arm-protos.h (arm_simd_vect_par_cnst_half): Prototype
for new function.
(arm_simd_check_vect_par_cnst_half_p): Likewise.
* config/arm/predicates.md (vect_par_constant_high): Support
big endian and simplify by calling
arm_simd_check_vect_par_cnst_half_p.
(vect_par_constant_low): Likewise.
* testsuite/gcc.target/arm/neon-vaddws16.c: New test.
* testsuite/gcc.target/arm/neon-vaddws32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
* testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
* testsuite/lib/target-supports.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
that ARM NEON supports vector widen sum of HImode to SImode.
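For readers who want the half-selection rule without the RTL machinery, here is a minimal standalone sketch of the index computation that arm_simd_vect_par_cnst_half performs. It is an assumption-laden model rather than the patch code: half_lane_base, half_lane_mask and half_lane_mask_matches are hypothetical names, plain int arrays stand in for the PARALLEL rtx, and endianness is passed as a parameter instead of being read from BYTES_BIG_ENDIAN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: first GCC lane number of the architectural
   high or low half of an NUNITS-lane vector.  */
static int
half_lane_base (int nunits, bool high, bool big_endian)
{
  assert (nunits > 0 && nunits % 2 == 0);
  int high_base = nunits / 2;
  int low_base = 0;

  /* On big-endian the architectural low half is GCC's high half.  */
  if (big_endian)
    return high ? low_base : high_base;
  return high ? high_base : low_base;
}

/* Write the NUNITS / 2 selected lane numbers to OUT and return how
   many were written.  */
static int
half_lane_mask (int nunits, bool high, bool big_endian, int *out)
{
  int base = half_lane_base (nunits, high, big_endian);
  for (int i = 0; i < nunits / 2; i++)
    out[i] = base + i;
  return nunits / 2;
}

/* Return true if OP (COUNT entries) is exactly the expected mask,
   mirroring the element-by-element comparison in the predicate.  */
static bool
half_lane_mask_matches (const int *op, int count, int nunits,
			bool high, bool big_endian)
{
  if (count != nunits / 2)
    return false;
  int base = half_lane_base (nunits, high, big_endian);
  for (int i = 0; i < count; i++)
    if (op[i] != base + i)
      return false;
  return true;
}
```

For a four-lane vector this yields { 0, 1 } for the little-endian low half and { 2, 3 } for the big-endian low half, in line with the diagram in the arm.c comment.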
On 02/09/2016 09:27 AM, Kyrill Tkachov wrote:
> Hi Michael,
>
> On 17/12/15 00:02, Michael Collison wrote:
>> Kyrill,
>>
>> I have attached a patch that addresses your comments. The only change I
>> would ask you to reconsider is renaming the function 'bool
>> aarch32_simd_check_vect_par_cnst_half'. This function was copied from
>> the aarch64 port and I thought it was important to match the naming
>> for maintenance purposes. I did rename the function to 'bool
>> arm_simd_check_vect_par_cnst_half_p'. I changed 'aarch32' to 'arm'
>> and added '_p' per your suggestions. Is this okay?
>>
>
> Ok, that's fine with me.
>
>> I implemented all your other change suggestions.
>>
>
> Thanks, sorry it took a long time to get back to this, I was busy with
> regression-fixing patches as we're
> in bug-fixing mode...
>
>> 2015-12-16 Michael Collison <michael.collison@linaro.org>
>>
>> * config/arm/neon.md (widen_<us>sum<mode>): New patterns where
>> mode is VQI to improve mixed mode vectorization.
>> * config/arm/neon.md (vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3):
>> New
>> define_insn to match low half of signed vaddw.
>> * config/arm/neon.md (vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3):
>> New
>> define_insn to match high half of signed vaddw.
>> * config/arm/neon.md (vec_sel_widen_usum_lo<VQI:mode><VW:mode>3):
>> New
>> define_insn to match low half of unsigned vaddw.
>> * config/arm/neon.md (vec_sel_widen_usum_hi<VQI:mode><VW:mode>3):
>> New
>> define_insn to match high half of unsigned vaddw.
>> * config/arm/arm.c (arm_simd_vect_par_cnst_half): New function.
>> (arm_simd_check_vect_par_cnst_half_p): Likewise.
>> * config/arm/arm-protos.h (arm_simd_vect_par_cnst_half): Prototype
>> for new function.
>> (arm_simd_check_vect_par_cnst_half_p): Likewise.
>> * config/arm/predicates.md (vect_par_constant_high): Support
>> big endian and simplify by calling
>> arm_simd_check_vect_par_cnst_half
>> (vect_par_constant_low): Likewise.
>> * testsuite/gcc.target/arm/neon-vaddws16.c: New test.
>> * testsuite/gcc.target/arm/neon-vaddws32.c: New test.
>> * testsuite/gcc.target/arm/neon-vaddwu16.c: New test.
>> * testsuite/gcc.target/arm/neon-vaddwu32.c: New test.
>> * testsuite/gcc.target/arm/neon-vaddwu8.c: New test.
>> * testsuite/lib/target-supports.exp
>> (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate
>> that arm neon support vector widen sum of HImode TO SImode.
>>
>
> I've tried this out and I have a few comments.
> The arm.c hunk doesn't apply to current trunk anymore due to context.
> Can you please rebase the patch?
> I've fixed it up manually in my tree so I can build it.
> With this patch I'm seeing two PASS->FAIL on arm-none-eabi:
> FAIL: gcc.dg/vect/slp-reduc-3.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vectorizing stmts using SLP" 1
> FAIL: gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vectorizing
> stmts using SLP" 1
> My compiler is configured with --with-float=hard --with-cpu=cortex-a9
> --with-fpu=neon --with-mode=thumb
> Can you please look into these? Maybe it's just the tests that need
> adjustment?
>
> Also, I'm seeing the new tests give an error:
> ERROR: gcc.target/arm/neon-vaddws16.c: Unrecognized option type:
> arm_neon_ok for " dg-add-options 3 arm_neon_ok "
> UNRESOLVED: gcc.target/arm/neon-vaddws16.c: Unrecognized option type:
> arm_neon_ok for " dg-add-options 3 arm_neon_ok "
>
> That's because the dg-add-options argument should be arm_neon rather
> than arm_neon_ok.
> Also, since the new tests are compile-only the effective target check
> should be arm_neon_ok rather than arm_neon_hw.
>
> I also see ./contrib/check_GNU_style.sh complaining about some minor
> style issues like trailing whitespace and
> blocks of whitespace that should be replaced with tabs.
>
> In any case, this patch is GCC 7 material at this point, so I think
> with the above issues resolved
> (and the FAILs investigated) this should be in good shape.
>
> Thanks,
> Kyrill
--
Michael Collison
Linaro Toolchain Working Group
michael.collison@linaro.org
[-- Attachment #2: tcwg_833_upstream_feb_12.patch --]
[-- Type: text/x-patch, Size: 14776 bytes --]
From f3d167389cce45ecbd62bb4b1da754ba629ce32f Mon Sep 17 00:00:00 2001
From: Michael Collison <michael.collison@linaro.org>
Date: Wed, 10 Feb 2016 22:13:26 -0700
Subject: [PATCH] patches for tcwg 833
Fix GNU style issues
GNU formatting changes pt. 2
GNU formatting changes pt. 3
GNU formatting changes pt. 4
Fix inadvertent change
Fix trailing whitespace
Fix another inadvertent change
Fix incorrect application of patch
Fix > 80 character line length issue
Fix trailing whitespace
Fix order of dg-options
---
gcc/config/arm/arm-protos.h | 4 +-
gcc/config/arm/arm.c | 76 ++++++++++++++++
gcc/config/arm/neon.md | 125 ++++++++++++++++++++++++++-
gcc/config/arm/predicates.md | 50 +----------
gcc/testsuite/gcc.target/arm/neon-vaddws16.c | 19 ++++
gcc/testsuite/gcc.target/arm/neon-vaddws32.c | 18 ++++
gcc/testsuite/gcc.target/arm/neon-vaddwu16.c | 18 ++++
gcc/testsuite/gcc.target/arm/neon-vaddwu32.c | 18 ++++
gcc/testsuite/gcc.target/arm/neon-vaddwu8.c | 19 ++++
gcc/testsuite/lib/target-supports.exp | 2 +
10 files changed, 296 insertions(+), 53 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/arm/neon-vaddws16.c
create mode 100644 gcc/testsuite/gcc.target/arm/neon-vaddws32.c
create mode 100644 gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
create mode 100644 gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
create mode 100644 gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0083673..d8179c4 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -50,7 +50,9 @@ extern tree arm_builtin_decl (unsigned code, bool initialize_p
ATTRIBUTE_UNUSED);
extern void arm_init_builtins (void);
extern void arm_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update);
-
+extern rtx arm_simd_vect_par_cnst_half (machine_mode mode, bool high);
+extern bool arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode,
+ bool high);
#ifdef RTX_CODE
extern bool arm_vector_mode_supported_p (machine_mode);
extern bool arm_small_register_classes_for_mode_p (machine_mode);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 82becef..7ac34bb 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -30239,4 +30239,80 @@ arm_sched_fusion_priority (rtx_insn *insn, int max_pri,
return;
}
+
+/* Construct and return a PARALLEL RTX vector with elements numbering the
+ lanes of either the high (HIGH == TRUE) or low (HIGH == FALSE) half of
+ the vector - from the perspective of the architecture. This does not
+ line up with GCC's perspective on lane numbers, so we end up with
+ different masks depending on our target endian-ness. The diagram
+ below may help. We must draw the distinction when building masks
+ which select one half of the vector. An instruction selecting
+ architectural low-lanes for a big-endian target, must be described using
+ a mask selecting GCC high-lanes.
+
+                 Big-Endian             Little-Endian
+
+GCC             0   1   2   3           3   2   1   0
+              | x | x | x | x |       | x | x | x | x |
+Architecture    3   2   1   0           3   2   1   0
+
+Low Mask:         { 2, 3 }                { 0, 1 }
+High Mask:        { 0, 1 }                { 2, 3 }
+*/
+
+rtx
+arm_simd_vect_par_cnst_half (machine_mode mode, bool high)
+{
+ int nunits = GET_MODE_NUNITS (mode);
+ rtvec v = rtvec_alloc (nunits / 2);
+ int high_base = nunits / 2;
+ int low_base = 0;
+ int base;
+ rtx t1;
+ int i;
+
+ if (BYTES_BIG_ENDIAN)
+ base = high ? low_base : high_base;
+ else
+ base = high ? high_base : low_base;
+
+ for (i = 0; i < nunits / 2; i++)
+ RTVEC_ELT (v, i) = GEN_INT (base + i);
+
+ t1 = gen_rtx_PARALLEL (mode, v);
+ return t1;
+}
+
+/* Check OP for validity as a PARALLEL RTX vector with elements
+ numbering the lanes of either the high (HIGH == TRUE) or low lanes,
+ from the perspective of the architecture. See the diagram above
+ arm_simd_vect_par_cnst_half for more details. */
+
+bool
+arm_simd_check_vect_par_cnst_half_p (rtx op, machine_mode mode,
+ bool high)
+{
+ rtx ideal = arm_simd_vect_par_cnst_half (mode, high);
+ HOST_WIDE_INT count_op = XVECLEN (op, 0);
+ HOST_WIDE_INT count_ideal = XVECLEN (ideal, 0);
+ int i = 0;
+
+ if (!VECTOR_MODE_P (mode))
+ return false;
+
+ if (count_op != count_ideal)
+ return false;
+
+ for (i = 0; i < count_ideal; i++)
+ {
+ rtx elt_op = XVECEXP (op, 0, i);
+ rtx elt_ideal = XVECEXP (ideal, 0, i);
+
+ if (!CONST_INT_P (elt_op)
+ || INTVAL (elt_ideal) != INTVAL (elt_op))
+ return false;
+ }
+ return true;
+}
+
#include "gt-arm.h"
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index f495d40..754d394 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -1204,16 +1204,133 @@
;; Widening operations
+(define_expand "widen_ssum<mode>3"
+ [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+ (plus:<V_double_width>
+ (sign_extend:<V_double_width>
+ (match_operand:VQI 1 "s_register_operand" ""))
+ (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+ "TARGET_NEON"
+ {
+ machine_mode mode = GET_MODE (operands[1]);
+ rtx p1, p2;
+
+ p1 = arm_simd_vect_par_cnst_half (mode, false);
+ p2 = arm_simd_vect_par_cnst_half (mode, true);
+
+ if (operands[0] != operands[2])
+ emit_move_insn (operands[0], operands[2]);
+
+ emit_insn (gen_vec_sel_widen_ssum_lo<mode><V_half>3 (operands[0],
+ operands[1],
+ p1,
+ operands[0]));
+ emit_insn (gen_vec_sel_widen_ssum_hi<mode><V_half>3 (operands[0],
+ operands[1],
+ p2,
+ operands[0]));
+ DONE;
+ }
+)
+
+(define_insn "vec_sel_widen_ssum_lo<VQI:mode><VW:mode>3"
+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+ (plus:<VW:V_widen>
+ (sign_extend:<VW:V_widen>
+ (vec_select:VW
+ (match_operand:VQI 1 "s_register_operand" "%w")
+ (match_operand:VQI 2 "vect_par_constant_low" "")))
+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+ "TARGET_NEON"
+{
+ return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %f1" :
+ "vaddw.<V_s_elem>\t%q0, %q3, %e1";
+}
+ [(set_attr "type" "neon_add_widen")])
+
+(define_insn "vec_sel_widen_ssum_hi<VQI:mode><VW:mode>3"
+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+ (plus:<VW:V_widen>
+ (sign_extend:<VW:V_widen>
+ (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+ (match_operand:VQI 2 "vect_par_constant_high" "")))
+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+ "TARGET_NEON"
+{
+ return BYTES_BIG_ENDIAN ? "vaddw.<V_s_elem>\t%q0, %q3, %e1" :
+ "vaddw.<V_s_elem>\t%q0, %q3, %f1";
+}
+ [(set_attr "type" "neon_add_widen")])
+
(define_insn "widen_ssum<mode>3"
[(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
- (plus:<V_widen> (sign_extend:<V_widen>
- (match_operand:VW 1 "s_register_operand" "%w"))
- (match_operand:<V_widen> 2 "s_register_operand" "w")))]
+ (plus:<V_widen>
+ (sign_extend:<V_widen>
+ (match_operand:VW 1 "s_register_operand" "%w"))
+ (match_operand:<V_widen> 2 "s_register_operand" "w")))]
"TARGET_NEON"
"vaddw.<V_s_elem>\t%q0, %q2, %P1"
[(set_attr "type" "neon_add_widen")]
)
+(define_expand "widen_usum<mode>3"
+ [(set (match_operand:<V_double_width> 0 "s_register_operand" "")
+ (plus:<V_double_width>
+ (zero_extend:<V_double_width>
+ (match_operand:VQI 1 "s_register_operand" ""))
+ (match_operand:<V_double_width> 2 "s_register_operand" "")))]
+ "TARGET_NEON"
+ {
+ machine_mode mode = GET_MODE (operands[1]);
+ rtx p1, p2;
+
+ p1 = arm_simd_vect_par_cnst_half (mode, false);
+ p2 = arm_simd_vect_par_cnst_half (mode, true);
+
+ if (operands[0] != operands[2])
+ emit_move_insn (operands[0], operands[2]);
+
+ emit_insn (gen_vec_sel_widen_usum_lo<mode><V_half>3 (operands[0],
+ operands[1],
+ p1,
+ operands[0]));
+ emit_insn (gen_vec_sel_widen_usum_hi<mode><V_half>3 (operands[0],
+ operands[1],
+ p2,
+ operands[0]));
+ DONE;
+ }
+)
+
+(define_insn "vec_sel_widen_usum_lo<VQI:mode><VW:mode>3"
+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+ (plus:<VW:V_widen>
+ (zero_extend:<VW:V_widen>
+ (vec_select:VW
+ (match_operand:VQI 1 "s_register_operand" "%w")
+ (match_operand:VQI 2 "vect_par_constant_low" "")))
+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+ "TARGET_NEON"
+{
+ return BYTES_BIG_ENDIAN ? "vaddw.<V_u_elem>\t%q0, %q3, %f1" :
+ "vaddw.<V_u_elem>\t%q0, %q3, %e1";
+}
+ [(set_attr "type" "neon_add_widen")])
+
+(define_insn "vec_sel_widen_usum_hi<VQI:mode><VW:mode>3"
+ [(set (match_operand:<VW:V_widen> 0 "s_register_operand" "=w")
+ (plus:<VW:V_widen>
+ (zero_extend:<VW:V_widen>
+ (vec_select:VW (match_operand:VQI 1 "s_register_operand" "%w")
+ (match_operand:VQI 2 "vect_par_constant_high" "")))
+ (match_operand:<VW:V_widen> 3 "s_register_operand" "0")))]
+ "TARGET_NEON"
+{
+ return BYTES_BIG_ENDIAN ? "vaddw.<V_u_elem>\t%q0, %q3, %e1" :
+ "vaddw.<V_u_elem>\t%q0, %q3, %f1";
+}
+ [(set_attr "type" "neon_add_widen")])
+
(define_insn "widen_usum<mode>3"
[(set (match_operand:<V_widen> 0 "s_register_operand" "=w")
(plus:<V_widen> (zero_extend:<V_widen>
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 41a6ea4..a21f675 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -605,59 +605,13 @@
(define_special_predicate "vect_par_constant_high"
(match_code "parallel")
{
- HOST_WIDE_INT count = XVECLEN (op, 0);
- int i;
- int base = GET_MODE_NUNITS (mode);
-
- if ((count < 1)
- || (count != base/2))
- return false;
-
- if (!VECTOR_MODE_P (mode))
- return false;
-
- for (i = 0; i < count; i++)
- {
- rtx elt = XVECEXP (op, 0, i);
- int val;
-
- if (!CONST_INT_P (elt))
- return false;
-
- val = INTVAL (elt);
- if (val != (base/2) + i)
- return false;
- }
- return true;
+ return arm_simd_check_vect_par_cnst_half_p (op, mode, true);
})
(define_special_predicate "vect_par_constant_low"
(match_code "parallel")
{
- HOST_WIDE_INT count = XVECLEN (op, 0);
- int i;
- int base = GET_MODE_NUNITS (mode);
-
- if ((count < 1)
- || (count != base/2))
- return false;
-
- if (!VECTOR_MODE_P (mode))
- return false;
-
- for (i = 0; i < count; i++)
- {
- rtx elt = XVECEXP (op, 0, i);
- int val;
-
- if (!CONST_INT_P (elt))
- return false;
-
- val = INTVAL (elt);
- if (val != i)
- return false;
- }
- return true;
+ return arm_simd_check_vect_par_cnst_half_p (op, mode, false);
})
(define_predicate "const_double_vcvt_power_of_two_reciprocal"
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
new file mode 100644
index 0000000..8281134
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+
+
+int
+t6 (int len, void * dummy, short * __restrict x)
+{
+ len = len & ~31;
+ int result = 0;
+ __asm volatile ("");
+ for (int i = 0; i < len; i++)
+ result += x[i];
+ return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws32.c b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
new file mode 100644
index 0000000..8c18691
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddws32.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+
+int
+t6 (int len, void * dummy, int * __restrict x)
+{
+ len = len & ~31;
+ long long result = 0;
+ __asm volatile ("");
+ for (int i = 0; i < len; i++)
+ result += x[i];
+ return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.s32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
new file mode 100644
index 0000000..580bb06
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu16.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+
+int
+t6 (int len, void * dummy, unsigned short * __restrict x)
+{
+ len = len & ~31;
+ unsigned int result = 0;
+ __asm volatile ("");
+ for (int i = 0; i < len; i++)
+ result += x[i];
+ return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u16" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
new file mode 100644
index 0000000..21b0633
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu32.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+
+int
+t6 (int len, void * dummy, unsigned int * __restrict x)
+{
+ len = len & ~31;
+ unsigned long long result = 0;
+ __asm volatile ("");
+ for (int i = 0; i < len; i++)
+ result += x[i];
+ return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u32" } } */
diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
new file mode 100644
index 0000000..d350ed5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/neon-vaddwu8.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target arm_neon_ok } */
+/* { dg-options "-O3" } */
+/* { dg-add-options arm_neon } */
+
+
+
+int
+t6 (int len, void * dummy, char * __restrict x)
+{
+ len = len & ~31;
+ unsigned short result = 0;
+ __asm volatile ("");
+ for (int i = 0; i < len; i++)
+ result += x[i];
+ return result;
+}
+
+/* { dg-final { scan-assembler "vaddw\.u8" } } */
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 645981a..01d72a5 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4313,6 +4313,8 @@ proc check_effective_target_vect_widen_sum_hi_to_si_pattern { } {
set et_vect_widen_sum_hi_to_si_pattern_saved 0
if { [istarget powerpc*-*-*]
|| [istarget aarch64*-*-*]
+ || ([istarget arm*-*-*] &&
+ [check_effective_target_arm_neon_ok])
|| [istarget ia64-*-*] } {
set et_vect_widen_sum_hi_to_si_pattern_saved 1
}
--
1.9.1