From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 4540 invoked by alias); 31 Dec 2013 14:50:23 -0000
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
Received: (qmail 4531 invoked by uid 48); 31 Dec 2013 14:50:20 -0000
From: "freddie at witherden dot org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/59650] New: Inefficient vector assignment code
Date: Tue, 31 Dec 2013 14:50:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: tree-optimization
X-Bugzilla-Version: 4.8.2
X-Bugzilla-Keywords:
X-Bugzilla-Severity: normal
X-Bugzilla-Who: freddie at witherden dot org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2013-12/txt/msg02497.txt.bz2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59650

            Bug ID: 59650
           Summary: Inefficient vector assignment code
           Product: gcc
           Version: 4.8.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: freddie at witherden dot org

Consider the following snippet:

typedef double v4d __attribute__((vector_size(32)));

v4d set1(double *v)
{
    v4d tmp = { v[0], v[1], v[2], v[3] };
    return tmp;
}

v4d set2(double *v)
{
    v4d tmp;
    tmp[0] = v[0];
    tmp[1] = v[1];
    tmp[2] = v[2];
    tmp[3] = v[3];
    return tmp;
}

If my understanding of the vector extensions is correct, they should both do
the same thing.

Compiling with GCC 4.8.2 with -O3 -march=native on a Sandy Bridge system
gives:

0000000000000000 <_Z4set1Pd>:
   0:   c5 fb 10 57 10          vmovsd 0x10(%rdi),%xmm2
   5:   c5 fb 10 1f             vmovsd (%rdi),%xmm3
   9:   c5 e9 16 47 18          vmovhpd 0x18(%rdi),%xmm2,%xmm0
   e:   c5 e1 16 4f 08          vmovhpd 0x8(%rdi),%xmm3,%xmm1
  13:   c4 e3 75 18 c0 01       vinsertf128 $0x1,%xmm0,%ymm1,%ymm0
  19:   c3                      retq
  1a:   66 0f 1f 44 00 00       nopw 0x0(%rax,%rax,1)

0000000000000020 <_Z4set2Pd>:
  20:   c5 fb 10 07             vmovsd (%rdi),%xmm0
  24:   c5 f9 28 c0             vmovapd %xmm0,%xmm0
  28:   c5 f9 28 c8             vmovapd %xmm0,%xmm1
  2c:   c5 f1 16 4f 08          vmovhpd 0x8(%rdi),%xmm1,%xmm1
  31:   c4 e3 7d 18 c1 00       vinsertf128 $0x0,%xmm1,%ymm0,%ymm0
  37:   c4 e3 7d 19 c1 01       vextractf128 $0x1,%ymm0,%xmm1
  3d:   c5 f1 12 4f 10          vmovlpd 0x10(%rdi),%xmm1,%xmm1
  42:   c4 e3 7d 18 c1 01       vinsertf128 $0x1,%xmm1,%ymm0,%ymm0
  48:   c4 e3 7d 19 c1 01       vextractf128 $0x1,%ymm0,%xmm1
  4e:   c5 f1 16 4f 18          vmovhpd 0x18(%rdi),%xmm1,%xmm1
  53:   c4 e3 7d 18 c1 01       vinsertf128 $0x1,%xmm1,%ymm0,%ymm0
  59:   c3                      retq

The two functions compile to different code. For set1, four separate moves
are issued (two vmovsd and two vmovhpd), whereas I was expecting two 128-bit
unaligned moves. The code for set2 also appears to be inefficient: it takes
eleven instructions before the retq against five for set1, includes a
redundant vmovapd %xmm0,%xmm0, and extracts and reinserts the upper 128-bit
lane (vextractf128/vinsertf128) around each of the last two element loads.
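
For anyone wanting to reproduce or compare, here is a minimal sketch that
puts the two functions from the report next to a third variant. The third
function (set3), the helpers same_lanes and check_equivalence, and the use
of const double * are my own additions for illustration and are not part of
the original report. set3 copies all 32 bytes with memcpy, which compilers
are typically free to lower to wide unaligned loads; whether a particular
GCC version actually does so is something to confirm from its assembly
output (for example with g++ -O3 -mavx -S), not something this sketch
establishes.

```cpp
#include <cassert>
#include <cstring>

typedef double v4d __attribute__((vector_size(32)));

// Brace initialisation, as in set1 from the report.
v4d set1(const double *v)
{
    v4d tmp = { v[0], v[1], v[2], v[3] };
    return tmp;
}

// Element-by-element assignment, as in set2 from the report.
v4d set2(const double *v)
{
    v4d tmp;
    tmp[0] = v[0];
    tmp[1] = v[1];
    tmp[2] = v[2];
    tmp[3] = v[3];
    return tmp;
}

// Hypothetical third variant (not in the report): copy the 32 bytes in one
// go and leave the choice of load instructions to the compiler.
v4d set3(const double *v)
{
    v4d tmp;
    std::memcpy(&tmp, v, sizeof tmp);
    return tmp;
}

// Illustrative helper: compare two vectors lane by lane.
bool same_lanes(const v4d &a, const v4d &b)
{
    for (int i = 0; i < 4; ++i)
        if (a[i] != b[i])
            return false;
    return true;
}

// Illustrative helper: all three variants should yield the same vector.
void check_equivalence(const double *v)
{
    assert(same_lanes(set1(v), set2(v)));
    assert(same_lanes(set1(v), set3(v)));
}
```

Calling check_equivalence on any array of four doubles only confirms that
the three variants agree on the result; the difference this report is about
is visible only in the generated assembly.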