Kyrill, I have modified the patch to address your comments. I also modified check_effective_target_vect_widen_sum_hi_to_si_pattern in target-supports.exp to indicate that arm neon supports vector widen sum of HImode to SImode. This resolved several test suite failures. Successfully tested on arm-none-eabi, arm-none-linux-gnueabihf. I have four related execution failure tests on armeb-non-linux-gnueabihf with -flto only. gcc.dg/vect/vect-outer-4f.c -flto -ffat-lto-objects execution test gcc.dg/vect/vect-outer-4g.c -flto -ffat-lto-objects execution test gcc.dg/vect/vect-outer-4k.c -flto -ffat-lto-objects execution test gcc.dg/vect/vect-outer-4l.c -flto -ffat-lto-objects execution test I am debugging but have not tracked down the root cause yet. Feedback? 2015-07-22 Michael Collison * config/arm/neon.md (widen_sum): New patterns where mode is VQI to improve mixed mode vectorization. * config/arm/neon.md (vec_sel_widen_ssum_lo3): New define_insn to match low half of signed vaddw. * config/arm/neon.md (vec_sel_widen_ssum_hi3): New define_insn to match high half of signed vaddw. * config/arm/neon.md (vec_sel_widen_usum_lo3): New define_insn to match low half of unsigned vaddw. * config/arm/neon.md (vec_sel_widen_usum_hi3): New define_insn to match high half of unsigned vaddw. * testsuite/gcc.target/arm/neon-vaddws16.c: New test. * testsuite/gcc.target/arm/neon-vaddws32.c: New test. * testsuite/gcc.target/arm/neon-vaddwu16.c: New test. * testsuite/gcc.target/arm/neon-vaddwu32.c: New test. * testsuite/gcc.target/arm/neon-vaddwu8.c: New test. * testsuite/lib/target-supports.exp (check_effective_target_vect_widen_sum_hi_to_si_pattern): Indicate that arm neon support vector widen sum of HImode TO SImode. On 09/23/2015 01:49 AM, Kyrill Tkachov wrote: > Hi Michael, > > On 23/09/15 00:52, Michael Collison wrote: >> This is a modified version of the previous patch that removes the >> documentation and read-md.c fixes. These patches have been submitted >> separately and approved. >> >> This patch is designed to address code that was not being vectorized due >> to missing widening patterns in the ARM backend. Code such as: >> >> int t6(int len, void * dummy, short * __restrict x) >> { >> len = len & ~31; >> int result = 0; >> __asm volatile (""); >> for (int i = 0; i < len; i++) >> result += x[i]; >> return result; >> } >> >> Validated on arm-none-eabi, arm-none-linux-gnueabi, >> arm-none-linux-gnueabihf, and armeb-none-linux-gnueabihf. >> >> 2015-09-22 Michael Collison >> >> * config/arm/neon.md (widen_sum): New patterns >> where mode is VQI to improve mixed mode add vectorization. >> > > Please list all the new define_expands and define_insns > in the changelog. Also, please add an ChangeLog entry for > the testsuite additions. > > The approach looks ok to me with a few comments on some > parts of the patch itself. > > > +(define_insn "vec_sel_widen_ssum_hi3" > + [(set (match_operand: 0 "s_register_operand" "=w") > + (plus: (sign_extend: (vec_select:VW > (match_operand:VQI 1 "s_register_operand" "%w") > + (match_operand:VQI 2 > "vect_par_constant_high" ""))) > + (match_operand: 3 "s_register_operand" > "0")))] > + "TARGET_NEON" > + "vaddw.\t%q0, %q3, %f1" > + [(set_attr "type" "neon_add_widen") > + (set_attr "length" "8")] > +) > > > This is a single instruction, and it has a length of 4, so no need to > override the length attribute. > Same with the other define_insns in this patch. > > > diff --git a/gcc/testsuite/gcc.target/arm/neon-vaddws16.c > b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c > new file mode 100644 > index 0000000..ed10669 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/neon-vaddws16.c > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target arm_neon_hw } */ > > The arm_neon_hw check is usually used when you want to run the tests. > Since this is a compile-only tests you just need arm_neon_ok. > > +/* { dg-add-options arm_neon_ok } */ > +/* { dg-options "-O3" } */ > + > + > +int > +t6(int len, void * dummy, short * __restrict x) > +{ > + len = len & ~31; > + int result = 0; > + __asm volatile (""); > + for (int i = 0; i < len; i++) > + result += x[i]; > + return result; > +} > + > +/* { dg-final { scan-assembler "vaddw\.s16" } } */ > + > + > + > > Stray trailing newlines. Similar comments for the other testcases. > > Thanks, > Kyrill > -- Michael Collison Linaro Toolchain Working Group michael.collison@linaro.org