On 07/25/2018 07:08 PM, Sudakshina Das wrote:
> Hi Sam
>
> On 25/07/18 14:08, Sam Tebbs wrote:
>> On 07/23/2018 05:01 PM, Sudakshina Das wrote:
>>> Hi Sam
>>>
>>>
>>> On Monday 23 July 2018 11:39 AM, Sam Tebbs wrote:
>>>> Hi all,
>>>>
>>>> This patch extends the aarch64_get_lane_zero_extendsi instruction
>>>> definition to also cover DI mode. This prevents a redundant AND
>>>> instruction from being generated due to the pattern failing to be
>>>> matched.
>>>>
>>>> Example:
>>>>
>>>> typedef char v16qi __attribute__ ((vector_size (16)));
>>>>
>>>> unsigned long long
>>>> foo (v16qi a)
>>>> {
>>>>   return a[0];
>>>> }
>>>>
>>>> Previously generated:
>>>>
>>>> foo:
>>>>         umov    w0, v0.b[0]
>>>>         and     x0, x0, 255
>>>>         ret
>>>>
>>>> And now generates:
>>>>
>>>> foo:
>>>>         umov    w0, v0.b[0]
>>>>         ret
>>>>
>>>> Bootstrapped on aarch64-none-linux-gnu and tested on
>>>> aarch64-none-elf with no regressions.
>>>>
>>>> gcc/
>>>> 2018-07-23  Sam Tebbs
>>>>
>>>>         * config/aarch64/aarch64-simd.md
>>>>         (*aarch64_get_lane_zero_extendsi):
>>>>         Rename to...
>>>>         (*aarch64_get_lane_zero_extend): ... This.
>>>>         Use GPI iterator instead of SI mode.
>>>>
>>>> gcc/testsuite
>>>> 2018-07-23  Sam Tebbs
>>>>
>>>>         * gcc.target/aarch64/extract_zero_extend.c: New file
>>>>
>>> You will need an approval from a maintainer, but I would only add
>>> one request to this:
>>>
>>> diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
>>> index 89e38e6..15fb661 100644
>>> --- a/gcc/config/aarch64/aarch64-simd.md
>>> +++ b/gcc/config/aarch64/aarch64-simd.md
>>> @@ -3032,15 +3032,16 @@
>>>    [(set_attr "type" "neon_to_gp")]
>>>  )
>>>
>>> -(define_insn "*aarch64_get_lane_zero_extendsi"
>>> -  [(set (match_operand:SI 0 "register_operand" "=r")
>>> -    (zero_extend:SI
>>> +(define_insn "*aarch64_get_lane_zero_extend"
>>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>>> +    (zero_extend:GPI
>>>
>>> Since you are adding 4 new patterns with this change, could you add
>>> more cases in your test as well to make sure you have coverage for
>>> each of them.
>>>
>>> Thanks
>>> Sudi
>>
>> Hi Sudi,
>>
>> Thanks for the feedback. Here is an updated patch that adds more
>> testcases to cover the patterns generated by the different mode
>> combinations. The changelog and description from my original email
>> still apply.
>>
>
> Thanks for making the changes and adding more test cases. I do however
> see that you are only covering 2 out of 4 new
> *aarch64_get_lane_zero_extenddi<> patterns. The
> *aarch64_get_lane_zero_extendsi<> were already existing. I don't mind
> those tests. I would just ask you to add the other two new patterns
> as well. Also since the different versions of the instruction generate
> same instructions (like foo_16qi and foo_8qi both give out the same
> instruction), I would suggest using a -fdump-rtl-final (or any relevant
> rtl dump) with the dg-options and using a scan-rtl-dump to scan the
> pattern name. Something like:
> /* { dg-do compile } */
> /* { dg-options "-O3 -fdump-rtl-final" } */
> ...
> ...
> /* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */
>
> Thanks
> Sudi

Hi Sudi,

Thanks again. Here's an update that adds 4 more tests, so all 8 patterns
generated are now tested for! Below is the updated changelog:

gcc/
2018-07-26  Sam Tebbs

        * config/aarch64/aarch64-simd.md
        (*aarch64_get_lane_zero_extendsi):
        Rename to...
        (*aarch64_get_lane_zero_extend): ... This.
        Use GPI iterator instead of SI mode.

gcc/testsuite
2018-07-26  Sam Tebbs

        * gcc.target/aarch64/extract_zero_extend.c: New file

>
>>>
>>>      (vec_select:
>>>        (match_operand:VDQQH 1 "register_operand" "w")
>>>        (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
>>>    "TARGET_SIMD"
>>>    {
>>> -    operands[2] = aarch64_endian_lane_rtx (mode, INTVAL (operands[2]));
>>> +    operands[2] = aarch64_endian_lane_rtx (mode,
>>> +                                           INTVAL (operands[2]));
>>>      return "umov\\t%w0, %1.[%2]";
>>>    }
>>>    [(set_attr "type" "neon_to_gp")]
>>
>
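
For illustration, a test along the lines Sudi suggests might look roughly like the sketch below. It is not the contents of the attached extract_zero_extend.c. The function names, the EXTRACT helper macro, and the use of unsigned element types are hypothetical, and the eight pattern names are extrapolated from the single name quoted above, assuming the VDQQH iterator covers V8QI, V16QI, V4HI and V8HI and GPI covers SI and DI.

```c
/* { dg-do compile } */
/* { dg-options "-O3 -fdump-rtl-final" } */

/* Unsigned element types keep the lane read a zero-extension whatever
   the target's default char signedness is (an assumption of this
   sketch; the example in the original mail uses plain char).  */
typedef unsigned char v16qi __attribute__ ((vector_size (16)));
typedef unsigned char v8qi __attribute__ ((vector_size (8)));
typedef unsigned short v8hi __attribute__ ((vector_size (16)));
typedef unsigned short v4hi __attribute__ ((vector_size (8)));

/* Hypothetical helper: define one function that reads lane 0 of a
   VEC and widens it to RET.  */
#define EXTRACT(NAME, RET, VEC) \
  RET NAME (VEC a) { return a[0]; }

/* Zero-extension to 32 bits (SI).  */
EXTRACT (foo_si_16qi, unsigned int, v16qi)
EXTRACT (foo_si_8qi, unsigned int, v8qi)
EXTRACT (foo_si_8hi, unsigned int, v8hi)
EXTRACT (foo_si_4hi, unsigned int, v4hi)

/* Zero-extension to 64 bits (DI), the case the patch adds.  */
EXTRACT (foo_di_16qi, unsigned long long, v16qi)
EXTRACT (foo_di_8qi, unsigned long long, v8qi)
EXTRACT (foo_di_8hi, unsigned long long, v8hi)
EXTRACT (foo_di_4hi, unsigned long long, v4hi)

/* Pattern names below are assumed, not taken from the patch.  */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv16qi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8qi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv8hi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extendsiv4hi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv16qi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8qi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv8hi" "final" } } */
/* { dg-final { scan-rtl-dump "aarch64_get_lane_zero_extenddiv4hi" "final" } } */
```

Scanning the RTL dump for the pattern name, rather than the assembly for umov, is what lets the test tell apart patterns that emit the same instruction, as with the foo_16qi and foo_8qi case mentioned above.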