From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 30535 invoked by alias); 24 Sep 2009 23:14:37 -0000
Received: (qmail 30491 invoked by uid 48); 24 Sep 2009 23:14:27 -0000
Date: Thu, 24 Sep 2009 23:14:00 -0000
Subject: [Bug tree-optimization/41464] New: vector loads are unnecessarily split into high and low loads
X-Bugzilla-Reason: CC
Message-ID:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "nmiell at comcast dot net"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2009-09/txt/msg02311.txt.bz2

gcc (GCC) 4.4.1 20090725 (Red Hat 4.4.1-2)

The testcase (built with -Wall -O3):

#include <math.h>

void MulPi(float * __attribute__((aligned(16))) i,
           float * __attribute__((aligned(16))) f, int n)
{
        for (int j = 0; j < n; j++)
                f[j] = (float) M_PI * i[j];
}

produces the following for the vectorized version of the loop:

.L7:
        movaps  %xmm1, %xmm0            # zero XMM0
        incl    %ecx
        movlps  (%rdi,%rax), %xmm0      # load the low half into XMM0
        movhps  8(%rdi,%rax), %xmm0     # load the high half into XMM0
        mulps   %xmm2, %xmm0            # multiply by pi
        movaps  %xmm0, (%rsi,%rax)      # store to memory
        addq    $16, %rax
        cmpl    %r8d, %ecx
        jb      .L7


-- 
           Summary: vector loads are unnecessarily split into high and low
                    loads
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: nmiell at comcast dot net
 GCC build triplet: x86_64-linux-gnu
  GCC host triplet: x86_64-linux-gnu
GCC target triplet: x86_64-linux-gnu


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
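
For comparison, here is a minimal sketch of another way to state the alignment of the pointed-to data. It rests on assumptions the report does not confirm. One possible reading of the testcase is that the aligned attribute, written after the `*`, applies to the pointer parameters themselves rather than to the floats they point to, which would leave the vectorizer unable to assume an aligned load. The sketch uses __builtin_assume_aligned, which exists in GCC 4.7 and later (and in Clang) but not in the 4.4.1 compiler reported above. The name MulPiAssumeAligned and its parameter names are hypothetical. Whether this yields a single movaps load depends on the compiler version and flags.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* M_PI is not part of ISO C; provide a fallback for strict modes. */
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Hypothetical variant of MulPi from the report (assumption: GCC >= 4.7
 * or Clang, since __builtin_assume_aligned is not available in 4.4.1). */
void MulPiAssumeAligned(const float *in, float *out, int n)
{
        /* Check the promise in debug builds: passing a misaligned
         * pointer to __builtin_assume_aligned is undefined behavior. */
        assert(((uintptr_t) in % 16) == 0);
        assert(((uintptr_t) out % 16) == 0);

        /* Tell the compiler the pointed-to data is 16-byte aligned. */
        const float *i = __builtin_assume_aligned(in, 16);
        float *f = __builtin_assume_aligned(out, 16);

        for (int j = 0; j < n; j++)
                f[j] = (float) M_PI * i[j];
}
```

Callers would need to pass buffers that really are 16-byte aligned, for example arrays declared with __attribute__((aligned(16))).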