From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 2593 invoked by alias); 9 Dec 2012 22:16:19 -0000
Received: (qmail 2520 invoked by uid 48); 9 Dec 2012 22:15:57 -0000
From: "siarhei.siamashka at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/55634] New: ARM: gcc vector extensions: storing vector to unaligned memory location does not use VST1.8 NEON instruction
Date: Sun, 09 Dec 2012 22:16:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: siarhei.siamashka at gmail dot com
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields:
Message-ID:
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
Content-Type: text/plain; charset="UTF-8"
MIME-Version: 1.0
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2012-12/txt/msg00909.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55634

             Bug #: 55634
           Summary: ARM: gcc vector extensions: storing vector to unaligned
                    memory location does not use VST1.8 NEON instruction
    Classification: Unclassified
           Product: gcc
           Version: 4.7.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: target
        AssignedTo: unassigned@gcc.gnu.org
        ReportedBy: siarhei.siamashka@gmail.com


The following test program tries to use GCC vector extensions to add two
vectors together and store the result to an unaligned memory location in a
"portable" way with memcpy:

/***********************************************/
#include <string.h>

typedef unsigned int T __attribute__ ((vector_size (16)));

void foo (void *result, T *a, T *b)
{
  T tmp = *a + *b;
  memcpy (result, &tmp, sizeof(tmp));
}
/***********************************************/

Compiling with gcc 4.7.2:

$ arm-none-linux-gnueabi-gcc -O2 -mcpu=cortex-a8 -mfpu=neon -c test.c
$ objdump -d test.o

00000000 <foo>:
   0:   e52d4004    push    {r4}        ; (str r4, [sp, #-4]!)
   4:   ecd12b04    vldmia  r1, {d18-d19}
   8:   e24dd014    sub     sp, sp, #20
   c:   ecd20b04    vldmia  r2, {d16-d17}
  10:   e28dc010    add     ip, sp, #16
  14:   f26208e0    vadd.i32        q8, q9, q8
  18:   ed6c0b04    vstmdb  ip!, {d16-d17}
  1c:   e1a0c00d    mov     ip, sp
  20:   e1a04000    mov     r4, r0
  24:   e8bc000f    ldm     ip!, {r0, r1, r2, r3}
  28:   e5840000    str     r0, [r4]
  2c:   e5841004    str     r1, [r4, #4]
  30:   e5842008    str     r2, [r4, #8]
  34:   e584300c    str     r3, [r4, #12]
  38:   e28dd014    add     sp, sp, #20
  3c:   e8bd0010    pop     {r4}
  40:   e12fff1e    bx      lr

The same test program results in the following code if compiled for x86-64:

0000000000000000 <foo>:
   0:   66 0f 6f 06             movdqa (%rsi),%xmm0
   4:   66 0f fe 02             paddd  (%rdx),%xmm0
   8:   f3 0f 7f 07             movdqu %xmm0,(%rdi)
   c:   c3                      retq

So the x86-64 target is able to use the MOVDQU instruction. Hence the ARM
target should be able to use VST1.8 as well.
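
For anyone who wants to experiment with the unaligned store in this report,
the sketch below may be a useful starting point. It repeats foo from the test
program and adds a second variant that stores through a vector type declared
with 1-byte alignment. The names T_unaligned, foo_typed, unaligned_store_ok
and self_check are hypothetical and are not part of the original report. The
sketch only checks that both variants write the same bytes at an unaligned
address; it makes no claim about which instruction a given GCC version emits
for either variant on ARM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same vector type as in the test program from the report. */
typedef unsigned int T __attribute__ ((vector_size (16)));

/* Hypothetical helper type (an assumption, not from the report): the same
   vector, declared with 1-byte alignment and allowed to alias any object. */
typedef unsigned int T_unaligned
  __attribute__ ((vector_size (16), aligned (1), may_alias));

/* The store from the report: goes through memcpy. */
void foo (void *result, T *a, T *b)
{
  T tmp = *a + *b;
  memcpy (result, &tmp, sizeof (tmp));
}

/* Assumed alternative: store directly through the 1-byte-aligned type. */
void foo_typed (void *result, T *a, T *b)
{
  *(T_unaligned *) result = *a + *b;
}

/* Returns 1 if both variants write the expected bytes at buf + offset. */
int unaligned_store_ok (size_t offset)
{
  T a = { 1, 2, 3, 4 };
  T b = { 10, 20, 30, 40 };
  unsigned int expected[4] = { 11, 22, 33, 44 };
  unsigned char buf1[sizeof (T) + 16];
  unsigned char buf2[sizeof (T) + 16];

  if (offset > 16)
    return 0;   /* would run past the end of the buffers */

  foo (buf1 + offset, &a, &b);
  foo_typed (buf2 + offset, &a, &b);

  return memcmp (buf1 + offset, expected, sizeof (expected)) == 0
         && memcmp (buf2 + offset, expected, sizeof (expected)) == 0;
}

/* Checks a few offsets, including ones that are not 4-byte aligned. */
void self_check (void)
{
  size_t offset;
  for (offset = 0; offset < 4; offset++)
    assert (unaligned_store_ok (offset));
}
```

Calling self_check () from main and comparing the objdump -d output for foo
and foo_typed, using the same compiler options as above, would show whether
the two forms are compiled differently on a particular target.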