From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 10156 invoked by alias); 18 Sep 2014 01:26:21 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 10098 invoked by uid 48); 18 Sep 2014 01:26:15 -0000
From: "cbaylis at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/63277] ARM - NEON excessive use of vmov for vtbl2 / uint8x8x2 for shuffling data unnecessarily around
Date: Thu, 18 Sep 2014 01:26:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 5.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cbaylis at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: cc
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2014-09/txt/msg01793.txt.bz2

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63277

cbaylis at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cbaylis at gcc dot gnu.org

--- Comment #4 from cbaylis at gcc dot gnu.org ---
A much simplified test case based on arm_neon_excessive_vmov_wo_vcombine.c

$ arm-unknown-linux-gnueabihf-gcc -O2 -S -o - -mfpu=neon mini.c

#include <arm_neon.h>

void f(int8_t *p)
{
    int8x16_t v;
    int8x8_t v2;
    int8x8x2_t vx;

    v = vld1q_s8(p);
    v2 = vld1_s8(p);
    vx.val[0] = vget_low_s8(v);
    vx.val[1] = vget_high_s8(v);
    v2 = vtbl2_s8(vx, v2);
    vst1_s8(p, v2);
}

With -dp, the generated code is:

f:
        vld1.8  {d18-d19}, [r0]         @ 6     neon_vld1v16qi  [length = 4]
        vmov    d16, d18  @ v8qi        @ 10    *neon_movv8qi/1 [length = 4]
        vld1.8  {d20}, [r0]             @ 7     neon_vld1v8qi   [length = 4]
        vmov    d17, d19  @ v8qi        @ 11    *neon_movv8qi/1 [length = 4]
        vtbl.8  d16, {d16, d17}, d20    @ 12    neon_vtbl2v8qi  [length = 4]
        vst1.8  {d16}, [r0]             @ 13    neon_vst1v8qi   [length = 4]
        bx      lr                      @ 24    *thumb2_return  [length = 4]

By the time IRA runs, the insns which result in the moves look like this:

(insn 9 18 11 2 (set (subreg:V8QI (reg/v:TI 116 [ vx ]) 0)
        (subreg:V8QI (reg:V16QI 114 [ D.14019 ]) 0)) /tmp/mini.c:11 827 {*neon_movv8qi}

The registers 116 and 114 are allocated to different hard registers, as they
conflict. Presumably, the register allocator could be taught to treat this
subreg->subreg move as a copy and allow the same hard register to be
allocated.
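
For reference, here is a rough scalar model in portable C of what the test case computes. It is only a sketch for reasoning about the data flow, not GCC or NEON code. The names model_vtbl2_s8 and model_f are made up for this illustration. It assumes the usual vtbl2 behaviour: an index from 0 to 15 selects a byte from the 16-byte table, and any other index (read as an unsigned byte) yields 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of vtbl2_s8: idx[i] in 0..15 selects a byte
   from the 16-byte table; any other index (read as unsigned) gives 0. */
static void model_vtbl2_s8(const int8_t table[16], const int8_t idx[8],
                           int8_t out[8])
{
    for (size_t i = 0; i < 8; i++) {
        uint8_t k = (uint8_t)idx[i];
        out[i] = (k < 16) ? table[k] : 0;
    }
}

/* Hypothetical scalar model of f() from the test case above. */
static void model_f(int8_t *p)
{
    int8_t table[16], idx[8], out[8];

    assert(p != NULL);
    /* v = vld1q_s8(p); vx.val[0] and vx.val[1] are simply bytes 0..7
       and 8..15 of v, so the table is the same 16 bytes, unmodified. */
    memcpy(table, p, 16);
    memcpy(idx, p, 8);                 /* v2 = vld1_s8(p)       */
    model_vtbl2_s8(table, idx, out);   /* v2 = vtbl2_s8(vx, v2) */
    memcpy(p, out, 8);                 /* vst1_s8(p, v2)        */
}
```

In this model the table is a plain copy of the loaded 16 bytes, which matches the observation above that the two vmov instructions only copy the halves of one register pair into another.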