From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 13439 invoked by alias); 21 Jul 2010 17:46:59 -0000
Received: (qmail 13376 invoked by uid 48); 21 Jul 2010 17:46:44 -0000
Date: Wed, 21 Jul 2010 17:46:00 -0000
Subject: [Bug tree-optimization/45021] New: Redundant prefetches for the vectorized loop
X-Bugzilla-Reason: CC
Message-ID:
Reply-To: gcc-bugzilla@gcc.gnu.org
To: gcc-bugs@gcc.gnu.org
From: "changpeng dot fang at amd dot com"
Mailing-List: contact gcc-bugs-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-bugs-owner@gcc.gnu.org
X-SW-Source: 2010-07/txt/msg02247.txt.bz2

For the following test case, prefetches are inserted for both the load
and the store of a[i] if the loop is vectorized:

float a[1024], b[1024];

void foo(int beta)
{
  int i;
  for(i=0; i<1024; i++)
    a[i] = a[i] + beta * b[i];
}

With gcc -O3 -fprefetch-loop-arrays -march=amdfam10 -S, a piece of the
assembly is:

        movaps      (%rcx), %xmm0
        addl        $4, %edi
        prefetcht0  (%rdx)
        prefetcht0  240(%rcx)
        prefetchw   (%rdx)
        leaq        64(%rax), %rsi
        mulps       %xmm1, %xmm0

Here prefetcht0 (%rdx) and prefetchw (%rdx) target the same address.

If we don't vectorize the loop, we only generate a prefetch for the load
of a[i], not for the store:

        addl        $16, %eax
        salq        $2, %rcx
        mulss       %xmm1, %xmm0
        prefetcht0  a+92(%rcx)
        prefetcht0  b+92(%rcx)
        movl        %esi, %ecx

-- 
           Summary: Redundant prefetches for the vectorized loop
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: changpeng dot fang at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45021
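
For comparison, the non-redundant pattern can be sketched by hand with
GCC's __builtin_prefetch. This is only an illustration, not what the
prefetch pass emits and not a proposed fix. The function name
foo_manual_prefetch, the 64-byte cache line (16 floats) and the prefetch
distance AHEAD are assumptions made for the example. The point is that a
single prefetch-for-write of a[] already covers both the load and the
store of a[i], so a separate read prefetch of the same line is not needed.

```c
#define N 1024
/* Assumption: 64-byte cache lines, i.e. 16 floats per line. */
#define FLOATS_PER_LINE 16
/* Assumption: prefetch distance in elements, chosen for illustration only. */
#define AHEAD 64

float a[N], b[N];

/* The loop from the report, unchanged apart from the N macro. */
void foo(int beta)
{
  int i;
  for (i = 0; i < N; i++)
    a[i] = a[i] + beta * b[i];
}

/* Hypothetical hand-prefetched variant: per cache line, issue one
   prefetch for write on a[] (serving both the load and the store)
   and one prefetch for read on b[]. */
void foo_manual_prefetch(int beta)
{
  int i, j;
  for (i = 0; i < N; i += FLOATS_PER_LINE)
    {
      if (i + AHEAD < N)
        {
          __builtin_prefetch(&a[i + AHEAD], 1, 3); /* rw = 1: write */
          __builtin_prefetch(&b[i + AHEAD], 0, 3); /* rw = 0: read  */
        }
      /* Process one cache line's worth of elements. */
      for (j = i; j < i + FLOATS_PER_LINE; j++)
        a[j] = a[j] + beta * b[j];
    }
}
```

Both functions compute the same result. Only the prefetch hints differ,
and hints do not change program semantics.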